โ† Back|CLOUD-DEVOPSโ€บSection 1/18
0 of 18 completed

Deploying an AI App

Intermediateโฑ 15 min read๐Ÿ“… Updated: 2026-02-17

Introduction

You've built an AI model — congratulations! 🎉 But if it only runs on your laptop, who can use it? It needs to be deployed to the world!


"It works on my machine" — that's only good enough in development. In production, the app has to run reliably, securely, and at scale.


In this article we'll walk through a complete guide to taking an AI app from zero to production — Docker build, cloud deploy, domain setup, all the way to monitoring! 🚀

Deployment Overview

An overview of the steps to deploy an AI app:


Step  What             Tools
1️⃣    App prepare      FastAPI/Flask, requirements.txt
2️⃣    Containerize     Docker, Dockerfile
3️⃣    Test locally     Docker Compose
4️⃣    Push image       Docker Hub / GCR / ECR
5️⃣    Deploy to cloud  Cloud Run / ECS / K8s
6️⃣    Domain setup     Custom domain, SSL
7️⃣    CI/CD pipeline   GitHub Actions
8️⃣    Monitoring       Logging, alerts

Let's look at each step in detail! 📋

Step 1: App Structure

Production-ready AI app structure:


code
ai-chatbot/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app
│   ├── model.py         # AI model loading
│   ├── schemas.py       # Request/Response models
│   └── config.py        # Settings
├── models/
│   └── model.onnx       # AI model file
├── tests/
│   ├── test_api.py
│   └── test_model.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── .dockerignore
└── README.md

Key points:

  • FastAPI > Flask for production (async, auto-docs, validation)
  • ONNX format — faster inference than raw PyTorch/TF
  • .env.example — document required environment variables
  • tests/ — run your tests before you deploy!

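As an illustration of what config.py might contain (the Settings fields and defaults below are our assumptions, not part of any specific project), settings can be read from the environment with the stdlib alone:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_path: str
    port: int


def load_settings() -> Settings:
    # Read from environment variables, with safe defaults for local dev.
    # In production these come from the platform (Cloud Run env vars, etc.)
    return Settings(
        model_path=os.environ.get("MODEL_PATH", "models/model.onnx"),
        port=int(os.environ.get("PORT", "8000")),
    )
```

Keeping all environment access in one module makes it obvious which variables the app depends on, and keeps `.env.example` honest.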
Step 2: Production Dockerfile

Optimized Dockerfile for AI apps:


dockerfile
# Multi-stage build — smaller final image
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim

WORKDIR /app
# Bring in only the installed packages from the builder stage
COPY --from=builder /install /usr/local
COPY . .

# Non-root user (security)
RUN useradd -m appuser
USER appuser

ENV PORT=8000

EXPOSE 8000

# Health check (uses Python's stdlib: curl is not installed in the slim image)
HEALTHCHECK --interval=30s --timeout=5s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Optimization tips:

  • Multi-stage build — reduces the final image size by 50-70%
  • --no-cache-dir — skips the pip cache, keeping the image smaller
  • Non-root user — security best practice
  • .dockerignore — exclude unnecessary files
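For reference, a minimal .dockerignore might look like this (the entries are common candidates, not a prescription; adjust to your repo):

```
# .dockerignore — keep these out of the build context
.git
.venv/
__pycache__/
*.pyc
.env
tests/
*.md
```

Excluding `.git` and virtualenvs alone often cuts build-context upload time dramatically, and keeping `.env` out prevents secrets from ever entering an image layer.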

Step 3: Local Testing

✅ Example

Test locally before you deploy!

bash
# Build
docker build -t ai-chatbot:v1 .

# Run
docker run -p 8000:8000 --env-file .env ai-chatbot:v1

# Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello AI!"}'

Docker Compose for full stack:

yaml
services:
  api:
    build: .
    ports: ["8000:8000"]
    env_file: .env
  redis:
    image: redis:alpine
    ports: ["6379:6379"]

Only deploy to the cloud once it works locally! ✅
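To automate that local smoke test, a small helper can poll the health endpoint until the container is up. This is a stdlib sketch; the function name and defaults are ours:

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container not up yet; retry after a short sleep
        time.sleep(interval)
    return False
```

Calling `wait_until_healthy("http://localhost:8000/health")` right after `docker run` gives CI scripts a clean go/no-go signal before running the curl tests above.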

Step 4-5: Cloud Deployment

Google Cloud Run — easiest option for AI apps:


bash
# 1. Google Cloud CLI install & login
gcloud auth login
gcloud config set project my-ai-project

# 2. Push to Container Registry
gcloud builds submit --tag gcr.io/my-ai-project/ai-chatbot:v1

# 3. Deploy to Cloud Run
gcloud run deploy ai-chatbot \
  --image gcr.io/my-ai-project/ai-chatbot:v1 \
  --platform managed \
  --region asia-south1 \
  --memory 2Gi \
  --cpu 2 \
  --min-instances 0 \
  --max-instances 10 \
  --set-env-vars "MODEL_PATH=models/model.onnx"

# 4. You'll get a URL!
# https://ai-chatbot-xyz-as.a.run.app

Cloud Run advantages:

  • Scale to zero — no traffic = no cost 💰
  • Auto-scaling — traffic based
  • HTTPS automatic
  • Pay per request

Production AI App Architecture

๐Ÿ—๏ธ Architecture Diagram
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚        PRODUCTION AI APP ARCHITECTURE             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                    โ”‚
โ”‚  ๐Ÿ‘ค Users                                          โ”‚
โ”‚    โ”‚                                               โ”‚
โ”‚    โ–ผ                                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”           โ”‚
โ”‚  โ”‚ CloudFlare   โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ Cloud Run /   โ”‚           โ”‚
โ”‚  โ”‚ CDN + DNS    โ”‚     โ”‚ Load Balancer โ”‚           โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜           โ”‚
โ”‚                               โ”‚                    โ”‚
โ”‚                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”‚
โ”‚                    โ–ผ          โ–ผ          โ–ผ         โ”‚
โ”‚              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”‚
โ”‚              โ”‚ API    โ”‚ โ”‚ API    โ”‚ โ”‚ API    โ”‚     โ”‚
โ”‚              โ”‚ Pod 1  โ”‚ โ”‚ Pod 2  โ”‚ โ”‚ Pod 3  โ”‚     โ”‚
โ”‚              โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜     โ”‚
โ”‚                  โ”‚          โ”‚          โ”‚           โ”‚
โ”‚              โ”Œโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”      โ”‚
โ”‚              โ”‚        Shared Services       โ”‚      โ”‚
โ”‚              โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค      โ”‚
โ”‚              โ”‚Redis โ”‚ Cloud   โ”‚ Cloud      โ”‚      โ”‚
โ”‚              โ”‚Cache โ”‚ Storage โ”‚ SQL/DB     โ”‚      โ”‚
โ”‚              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ”‚
โ”‚                                                    โ”‚
โ”‚  ๐Ÿ“Š Monitoring: Cloud Monitoring + Logging         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Step 6: Custom Domain & SSL

Professional URL setup:


Cloud Run custom domain:

bash
# Map domain
gcloud run domain-mappings create \
  --service ai-chatbot \
  --domain api.myaiapp.com \
  --region asia-south1

DNS setup (at your domain provider):

Type   Name  Value
CNAME  api   ghs.googlehosted.com
A      @     Cloud Run IP

SSL — Cloud Run provides HTTPS automatically. Free! 🔒


We also recommend Cloudflare:

  • Free SSL
  • DDoS protection
  • CDN caching
  • Analytics
  • An extra security layer for your AI app! 🛡️

Step 7: CI/CD with GitHub Actions

Auto-deploy on every push — GitHub Actions:


yaml
# .github/workflows/deploy.yml
name: Deploy AI App

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Tests
        run: |
          pip install -r requirements.txt
          pytest tests/

      - name: Auth to GCP
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Build & Push
        run: |
          gcloud builds submit \
            --tag gcr.io/${{ secrets.GCP_PROJECT }}/ai-chatbot:${{ github.sha }}

      - name: Deploy
        run: |
          gcloud run deploy ai-chatbot \
            --image gcr.io/${{ secrets.GCP_PROJECT }}/ai-chatbot:${{ github.sha }} \
            --region asia-south1

Flow: Code push → Tests run → Docker build → Deploy → Live! 🚀

Step 8: Monitoring Setup

💡 Tip

Deploying isn't the finish line — monitoring is a must!

Track these:

- 📊 Response time — is the API slow?

- ❌ Error rate — are 500 errors increasing?

- 📈 Request count — what does the traffic pattern look like?

- 💾 Memory usage — is the model leaking memory?

- 🧠 Model latency — how long does inference take?

Tools:

- Google Cloud Monitoring (built-in)

- Prometheus + Grafana (self-hosted)

- Sentry (error tracking)

- Better Stack (uptime monitoring)

Alert setup: Error rate > 5% → Slack notification

Response time > 2s → Email alert 🚨
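As a sketch of how such alert rules might be evaluated (the thresholds and function name are illustrative; in practice this logic lives in Cloud Monitoring alert policies, not your app), one monitoring window can be checked like this:

```python
def should_alert(total_requests: int, errors: int, avg_latency_s: float,
                 error_rate_threshold: float = 0.05,
                 latency_threshold_s: float = 2.0) -> list:
    """Return the list of alert reasons that fired for one monitoring window."""
    reasons = []
    # Error rate > 5% of requests in the window
    if total_requests > 0 and errors / total_requests > error_rate_threshold:
        reasons.append("error_rate")
    # Average response time over 2 seconds
    if avg_latency_s > latency_threshold_s:
        reasons.append("latency")
    return reasons
```

Expressing thresholds as data (rather than burying them in if-statements across the codebase) makes them easy to tune per environment.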

Model Versioning Strategy

AI apps get frequent model updates. Strategies:


Blue-Green Deployment 🔵🟢:

  • Old model (blue) keeps running
  • Deploy the new model (green)
  • Test green, then switch traffic
  • If there's a problem, roll back to blue

Canary Deployment 🐤:

  • Send 5% of traffic to the new model
  • Monitor it
  • If it looks good, slowly increase to 100%
  • If it looks bad, roll back

bash
# Cloud Run traffic split
gcloud run services update-traffic ai-chatbot \
  --to-revisions ai-chatbot-v2=10,ai-chatbot-v1=90
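Client-side, the same idea can be sketched as deterministic, sticky bucketing: hash the user ID into a bucket and route by weight. This is an illustrative helper, not Cloud Run's own mechanism (Cloud Run splits traffic at its load balancer, as in the command above):

```python
import hashlib


def pick_revision(user_id: str, weights: dict) -> str:
    """Deterministically route a user to a revision by hashed bucket.

    The same user always lands on the same revision (sticky canary),
    and the population splits roughly according to the weights.
    """
    total = sum(weights.values())
    # SHA-256 gives a stable, uniform hash across processes and restarts
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % total
    cumulative = 0
    for revision, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return revision
    raise ValueError("weights must be positive")  # unreachable for valid input
```

Stickiness matters for canaries: if a user bounced between model versions on every request, their experience (and your metrics) would be noisy.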

Model storage:

  • Google Cloud Storage — model files
  • MLflow Model Registry — version tracking
  • DVC — data version control

Environment Management

โš ๏ธ Warning

NEVER commit secrets to Git! ๐Ÿšซ

Do this:

code
# .env.example (commit this)
OPENAI_API_KEY=your-key-here
DATABASE_URL=postgresql://...
MODEL_PATH=models/model.onnx

# .env (DON'T commit — add to .gitignore)
OPENAI_API_KEY=sk-abc123...
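A common companion pattern is to fail fast at startup when a required variable is missing, so a misconfigured deploy dies immediately with a clear message instead of erroring mid-request. A minimal sketch (the function name is ours):

```python
import os


def require_env(name: str) -> str:
    """Return the value of a required environment variable, or fail fast."""
    value = os.environ.get(name)
    if not value:
        # Crashing at startup is deliberate: the platform marks the deploy
        # unhealthy immediately, rather than serving 500s at runtime.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env("OPENAI_API_KEY")` once at import time in config.py turns a silent misconfiguration into an obvious failed deploy.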

Secret management options:

- Google Secret Manager — best for GCP

- AWS Secrets Manager — best for AWS

- GitHub Secrets — for CI/CD

- Doppler — multi-platform secret management

If an API key leaks, hackers will be using it within minutes — and the monthly bill can run into lakhs! 😱

Performance Optimization

Tips to make a production AI app run fast:


1. Model Optimization 🧠

  • Use ONNX Runtime — 2-3x faster inference
  • Quantization (FP16/INT8) — halves the model size
  • TensorRT — NVIDIA GPU inference optimization
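To see why quantization halves the size, the arithmetic is just parameters × bits per parameter. A tiny helper (illustrative; real ONNX files add some graph overhead on top of the weights):

```python
def quantized_size_mb(num_params: int, bits: int) -> float:
    """Approximate weight-file size in MB for a parameter count and precision."""
    # bits / 8 = bytes per parameter; / 1e6 converts bytes to (decimal) MB
    return num_params * bits / 8 / 1e6
```

For a 66M-parameter model (roughly DistilBERT's size), FP32 works out to about 264 MB, FP16 to 132 MB, and INT8 to 66 MB — which is where the "half" and "quarter" size claims come from.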

2. Caching 💨

  • Redis cache — cache repeated queries
  • Response caching — same input = same output
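Response caching can be sketched as a small in-process TTL decorator (illustrative; production setups typically use Redis so the cache is shared across instances and survives restarts):

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds: float):
    """Cache results for identical positional args: same input, same output."""
    def decorator(fn):
        store = {}  # args tuple -> (timestamp, result)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh cached result; skip inference
            result = fn(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator
```

Wrapping the inference function with `@ttl_cache(60.0)` means a burst of identical prompts costs one model call instead of hundreds; the TTL bounds staleness when the model is updated.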

3. Async Processing ⚡

  • Use FastAPI async endpoints
  • Background workers for long tasks (Celery)
  • Queue system (Redis Queue, RabbitMQ)

4. Connection Pooling 🔌

  • Pool your database connections
  • Reuse HTTP clients (httpx)

Before vs After optimization:

Metric         Before       After
Response time  2.5s         0.3s
Throughput     50 req/s     500 req/s
Memory         4GB          1.5GB
Cost           ₹15,000/mo   ₹5,000/mo

Deploy Checklist

Check these before a production deploy:


Code ✅

  • [ ] All tests passing
  • [ ] No hardcoded secrets
  • [ ] Proper error handling
  • [ ] Logging added
  • [ ] Health endpoint (/health)

Docker 🐳

  • [ ] Multi-stage build
  • [ ] Non-root user
  • [ ] Proper .dockerignore
  • [ ] Image size optimized

Cloud ☁️

  • [ ] Resource limits set
  • [ ] Auto-scaling configured
  • [ ] SSL/HTTPS enabled
  • [ ] Custom domain mapped

Security 🔒

  • [ ] Secrets in Secret Manager
  • [ ] CORS configured
  • [ ] Rate limiting enabled
  • [ ] Proper input validation

Monitoring 📊

  • [ ] Logging set up
  • [ ] Alerts configured
  • [ ] Uptime monitoring
  • [ ] Error tracking (Sentry)

Follow this checklist and you're production ready! 🚀

✅ Key Takeaways

✅ App structure matters — FastAPI (async, auto-docs, validation), a proper project layout (app/, models/, tests/), pinned versions in requirements.txt, and a .env.example documenting required variables.


✅ Production Dockerfile — multi-stage build (50-70% smaller), slim base image, cached requirements layer, non-root user (security), health checks, proper signal handling.


✅ Local testing is critical — Docker Compose for the local stack, curl the endpoints, verify environment variables, and load test before deploying to the cloud.


✅ Cloud deployment — Cloud Run is the easiest: serverless auto-scaling, automatic HTTPS, custom domain mapping. Keep environment variables in Secret Manager and store models in GCS.


✅ CI/CD pipeline — GitHub Actions automates test → build → deploy on every push. Rolling back to the previous version is instant. A staging-first approach is recommended.


✅ Model versioning — blue-green deployment (old and new in parallel, then switch traffic) or canary (5% → 100% gradually). Rollback is instant if there are issues. Use MLflow or the Hugging Face registry.


✅ Monitoring is essential — logging (Cloud Logging), metrics (Cloud Monitoring), alerts (error rate, latency, quota), uptime monitoring, error tracking (Sentry). Proactive alerts beat reactive patches.


✅ Performance optimization — ONNX format (2-3x faster), quantization (smaller size), Redis caching (repeated queries), async processing (FastAPI), connection pooling. A 10x throughput improvement is realistic.

๐Ÿ ๐ŸŽฎ Mini Challenge

Challenge: Deploy a Complete AI App to the Cloud (End-to-End)


From beginner to hero — build the full deployment pipeline! 🚀


Step 1: Prepare the AI Model 🤖

python
# Model download + optimize
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ONNX export (quantized — smaller, faster)
# torch.onnx.export(...)

Step 2: Create the FastAPI App 🐍

python
from fastapi import FastAPI

app = FastAPI()

@app.post("/predict")
async def predict(text: str):
    # Inference logic — `model` is loaded elsewhere (e.g. app/model.py)
    result = model.predict(text)
    return {"prediction": result}

@app.get("/health")
def health():
    return {"status": "ok"}

Step 3: Docker Container 🐳

bash
# Dockerfile, build, test locally
docker build -t ai-app:latest .
docker run -p 8000:8000 ai-app:latest

Step 4: Deploy to the Cloud (Choose One) ☁️

bash
# OPTION A: Google Cloud Run
gcloud run deploy ai-app --source . --platform managed

# OPTION B: AWS Lambda
# zip + upload function

# OPTION C: Azure App Service
# azure webapp deployment

Step 5: Custom Domain + SSL 🔒

bash
# Domain register (GoDaddy, Google Domains)
# Cloud provider โ†’ domain configure
# SSL certificate automatic (free)

Step 6: Monitoring + Alerts 📊

bash
# Logging setup (Cloud Logging)
# Metrics dashboard (Grafana)
# Alerts: high latency, errors, quota usage

Step 7: Promote & Share 🎉

bash
# Public URL copy
# Friends, LinkedIn, Twitter share
# Real production app! 🌟

Completion Time: 3-4 hours

Real Skill: End-to-end AI deployment

Career Impact: High ⭐⭐⭐

💼 Interview Questions

Q1: How do you manage model versioning in production?

A: Use container tags (v1.0, v2.0) and a model registry (MLflow, Hugging Face). Blue-green deployment: run the old and new versions in parallel, then switch. Rollback is instant (redeploy the previous tag). Monitoring: track the model version and performance metrics per version.


Q2: How do you set up A/B testing for a production model?

A: Load balancer: 50% of traffic to model A, 50% to model B. Collect metrics (accuracy, latency, user feedback) and verify statistical significance before declaring a winner. Canary: shift traffic 5% → 25% → 100%. Feature flags: no redeployment, just a toggle (instant rollback).


Q3: How do you reduce GPU inference cost?

A: Quantization (8-bit, 4-bit). Distillation (smaller model). Caching (repeated identical inputs). Batch inference (accumulate requests). Spot/preemptible instances. Serverless (pay-per-use, no idle cost). With model optimization, 10x cheaper is possible.


Q4: A production AI app is failing — what are your troubleshooting steps?

A: (1) Check logs (error messages). (2) Check metrics (CPU, memory, latency). (3) Check recent changes (what changed?). (4) Roll back to the previous version (if a recent deploy caused it). (5) Health-check dependencies (database, external APIs). (6) Check disk space and quotas. (7) Check for model drift (has the data distribution changed?).


Q5: How do you guarantee zero-downtime deployment?

A: Rolling update (new pods start and pass health checks before old ones stop). Blue-green (switch instantly between old and new environments). Canary (gradual traffic shift). The load balancer sends traffic only to healthy pods; health checks verify a pod is ready before it receives traffic. Graceful shutdown: in-flight requests complete before a pod stops.

Frequently Asked Questions

โ“ AI app deploy panna best platform edhu?
Beginners ku Google Cloud Run or Railway best. Medium scale ku AWS ECS or GKE. Large scale ku Kubernetes (EKS/GKE). GPU venum na AWS SageMaker or GCP Vertex AI.
โ“ Deploy panna evlo cost aagum?
Simple AI API: free to โ‚น2000/month. GPU inference: โ‚น5000-20000/month. Free tiers use pannina initial la cost zero.
โ“ Model file too large โ€” epdhi deploy pannradhu?
Model files ah cloud storage la (S3/GCS) store pannunga. App start aagum bodhu download pannum. Or model registry (MLflow, HuggingFace) use pannunga.
โ“ HTTPS setup epdhi pannradhu?
Cloud Run, Vercel maari platforms auto HTTPS kudukum. Custom setup ku Let's Encrypt free SSL certificate use pannunga with Nginx reverse proxy.
🧐 Knowledge Check

When deploying an AI app, what is the best way to handle the model file?