
Deploy AI app

Intermediate · 15 min read · 📅 Updated: 2026-02-17

Introduction

You've built an AI model — congratulations! 🎉 But if it only runs on your laptop, who can use it? It needs to be deployed to the world!


"It works on my machine" — that's fine during development. In production, it has to run reliably, securely, and at scale.


In this article we'll walk through a complete guide to taking an AI app from zero to production — Docker build, cloud deploy, domain setup, all the way to monitoring! 🚀

Deployment Overview

An overview of the steps to deploy an AI app:


Step | What | Tools
1️⃣ | Prepare the app | FastAPI/Flask, requirements.txt
2️⃣ | Containerize | Docker, Dockerfile
3️⃣ | Test locally | Docker Compose
4️⃣ | Push the image | Docker Hub / GCR / ECR
5️⃣ | Deploy to cloud | Cloud Run / ECS / K8s
6️⃣ | Domain setup | Custom domain, SSL
7️⃣ | CI/CD pipeline | GitHub Actions
8️⃣ | Monitoring | Logging, alerts

Let's look at each step in detail! 📋

Step 1: App Structure

Production-ready AI app structure:


code
ai-chatbot/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app
│   ├── model.py         # AI model loading
│   ├── schemas.py       # Request/Response models
│   └── config.py        # Settings
├── models/
│   └── model.onnx       # AI model file
├── tests/
│   ├── test_api.py
│   └── test_model.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── .dockerignore
└── README.md

Key points:

  • FastAPI > Flask for production (async, auto-docs, validation)
  • ONNX format — faster inference than raw PyTorch/TF
  • .env.example — document required environment variables
  • tests/ — test before you deploy!
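As an illustration of what app/schemas.py holds, here is a minimal sketch of request/response models. A real FastAPI app would use Pydantic BaseModel classes; plain dataclasses are used here so the sketch runs with only the standard library, and the field names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative stand-ins for app/schemas.py. A real FastAPI app would
# use Pydantic BaseModel classes; dataclasses keep this sketch
# standard-library only. Field names are hypothetical.

@dataclass
class PredictRequest:
    text: str
    max_length: int = 128

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("text must not be empty")

@dataclass
class PredictResponse:
    prediction: str
    latency_ms: float

req = PredictRequest(text="Hello AI!")
resp = PredictResponse(prediction="greeting", latency_ms=12.5)
```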

Step 2: Production Dockerfile

Optimized Dockerfile for AI apps:


dockerfile
# Multi-stage build — smaller final image
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

FROM python:3.11-slim

WORKDIR /app

# Non-root user (security) — created first so the packages copied
# below can live in its home directory
RUN useradd -m appuser

# The builder installed with `pip install --user` into /root/.local;
# copy that into appuser's home so it stays readable after dropping root
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY . .

USER appuser

ENV PATH=/home/appuser/.local/bin:$PATH
ENV PORT=8000

EXPOSE 8000

# Health check — uses Python because curl is not installed in the slim image
HEALTHCHECK --interval=30s --timeout=5s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Optimization tips:

  • Multi-stage build — final image is 50-70% smaller
  • --no-cache-dir — skips the pip cache, smaller image
  • Non-root user — security best practice
  • .dockerignore — exclude unnecessary files
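As a starting point, a .dockerignore might contain entries like these (illustrative; keep what applies to your repo):

```
# .dockerignore (illustrative entries; keep what applies to your repo)
.git
.env
__pycache__/
*.pyc
.venv/
tests/
*.md
```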

Step 3: Local Testing

Example

Test locally before you deploy!

bash
# Build
docker build -t ai-chatbot:v1 .

# Run
docker run -p 8000:8000 --env-file .env ai-chatbot:v1

# Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello AI!"}'

Docker Compose for full stack:

yaml
services:
  api:
    build: .
    ports: ["8000:8000"]
    env_file: .env
  redis:
    image: redis:alpine
    ports: ["6379:6379"]

Only deploy to the cloud once it works locally! ✅

Step 4-5: Cloud Deployment

Google Cloud Run — easiest option for AI apps:


bash
# 1. Google Cloud CLI install & login
gcloud auth login
gcloud config set project my-ai-project

# 2. Push to the Container Registry
gcloud builds submit --tag gcr.io/my-ai-project/ai-chatbot:v1

# 3. Deploy to Cloud Run
gcloud run deploy ai-chatbot \
  --image gcr.io/my-ai-project/ai-chatbot:v1 \
  --platform managed \
  --region asia-south1 \
  --memory 2Gi \
  --cpu 2 \
  --min-instances 0 \
  --max-instances 10 \
  --set-env-vars "MODEL_PATH=models/model.onnx"

# 4. You get a URL!
# https://ai-chatbot-xyz-as.a.run.app

Cloud Run advantages:

  • Scale to zero — no traffic = no cost 💰
  • Auto-scaling — traffic based
  • HTTPS automatic
  • Pay per request

Production AI App Architecture

🏗️ Architecture Diagram
┌──────────────────────────────────────────────────┐
│        PRODUCTION AI APP ARCHITECTURE             │
├──────────────────────────────────────────────────┤
│                                                    │
│  👤 Users                                          │
│    │                                               │
│    ▼                                               │
│  ┌──────────────┐     ┌───────────────┐           │
│  │ CloudFlare   │────▶│ Cloud Run /   │           │
│  │ CDN + DNS    │     │ Load Balancer │           │
│  └──────────────┘     └───────┬───────┘           │
│                               │                    │
│                    ┌──────────┼──────────┐         │
│                    ▼          ▼          ▼         │
│              ┌────────┐ ┌────────┐ ┌────────┐     │
│              │ API    │ │ API    │ │ API    │     │
│              │ Pod 1  │ │ Pod 2  │ │ Pod 3  │     │
│              └───┬────┘ └───┬────┘ └───┬────┘     │
│                  │          │          │           │
│              ┌───┴──────────┴──────────┴───┐      │
│              │        Shared Services       │      │
│              ├──────┬──────────┬────────────┤      │
│              │Redis │ Cloud   │ Cloud      │      │
│              │Cache │ Storage │ SQL/DB     │      │
│              └──────┴──────────┴────────────┘      │
│                                                    │
│  📊 Monitoring: Cloud Monitoring + Logging         │
└──────────────────────────────────────────────────┘

Step 6: Custom Domain & SSL

Professional URL setup:


Cloud Run custom domain:

bash
# Map domain
gcloud run domain-mappings create \
  --service ai-chatbot \
  --domain api.myaiapp.com \
  --region asia-south1

DNS setup (at your domain provider):

Type | Name | Value
CNAME | api | ghs.googlehosted.com
A | @ | Cloud Run IP

SSL — Cloud Run provides automatic HTTPS! Free! 🔒


We recommend Cloudflare:

  • Free SSL
  • DDoS protection
  • CDN caching
  • Analytics
  • An extra security layer for your AI app! 🛡️

Step 7: CI/CD with GitHub Actions

Auto-deploy on every push — with GitHub Actions:


yaml
# .github/workflows/deploy.yml
name: Deploy AI App

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Tests
        run: |
          pip install -r requirements.txt
          pytest tests/

      - name: Auth to GCP
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Build & Push
        run: |
          gcloud builds submit \
            --tag gcr.io/${{ secrets.GCP_PROJECT }}/ai-chatbot:${{ github.sha }}

      - name: Deploy
        run: |
          gcloud run deploy ai-chatbot \
            --image gcr.io/${{ secrets.GCP_PROJECT }}/ai-chatbot:${{ github.sha }} \
            --region asia-south1

Flow: Code push → Tests run → Docker build → Deploy → Live! 🚀

Step 8: Monitoring Setup

💡 Tip

Deploying isn't the finish line — monitoring is a must!

Track these:

- 📊 Response time — is the API slow?

- ❌ Error rate — are 500 errors increasing?

- 📈 Request count — what does the traffic pattern look like?

- 💾 Memory usage — is the model leaking memory?

- 🧠 Model latency — how long does inference take?

Tools:

- Google Cloud Monitoring (built-in)

- Prometheus + Grafana (self-hosted)

- Sentry (error tracking)

- Better Stack (uptime monitoring)

Alert setup: Error rate > 5% → Slack notification

Response time > 2s → Email alert 🚨
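The alert rules above can be sketched as a small evaluation function. This is illustrative only: in production these thresholds live in your monitoring system's alert rules (Cloud Monitoring, Grafana), not in application code, and the function name is hypothetical.

```python
# Illustrative alert evaluation matching the thresholds above.
# In production these rules live in Cloud Monitoring or Grafana,
# not in application code.

def check_alerts(total_requests, errors, p95_latency_s,
                 error_rate_threshold=0.05, latency_threshold_s=2.0):
    alerts = []
    if total_requests and errors / total_requests > error_rate_threshold:
        alerts.append("error-rate")   # e.g. notify Slack
    if p95_latency_s > latency_threshold_s:
        alerts.append("latency")      # e.g. send an email
    return alerts

# 80 errors out of 1000 requests = 8% error rate, latency OK
triggered = check_alerts(1000, 80, 0.4)
```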

Model Versioning Strategy

AI apps get frequent model updates. Strategies:


Blue-Green Deployment 🔵🟢:

  • The old model (blue) keeps running
  • Deploy the new model (green) alongside it
  • Test green, then switch traffic over
  • If there is a problem, roll back to blue

Canary Deployment 🐤:

  • Send 5% of traffic to the new model
  • Monitor it
  • If it looks good, slowly increase to 100%
  • If not, roll back

bash
# Cloud Run traffic split
gcloud run services update-traffic ai-chatbot \
  --to-revisions ai-chatbot-v2=10,ai-chatbot-v1=90
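To see what a weighted split means in practice, here is a small simulation of 90/10 routing. It is illustrative only: the real split happens inside Cloud Run's load balancer, not in your code.

```python
import random

# Simulates a 90/10 revision split like the gcloud command above.
# Real traffic splitting happens in Cloud Run's load balancer.

def route(weights, rng):
    revisions = list(weights)
    return rng.choices(revisions, weights=[weights[r] for r in revisions])[0]

rng = random.Random(42)            # fixed seed for a reproducible demo
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[route({"v1": 90, "v2": 10}, rng)] += 1
# roughly 9 out of every 10 requests land on v1
```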

Model storage:

  • Google Cloud Storage — model files
  • MLflow Model Registry — version tracking
  • DVC — data version control

Environment Management

⚠️ Warning

NEVER commit secrets to Git! 🚫

Do this:

code
# .env.example (commit this)
OPENAI_API_KEY=your-key-here
DATABASE_URL=postgresql://...
MODEL_PATH=models/model.onnx

# .env (DON'T commit — add to .gitignore)
OPENAI_API_KEY=sk-abc123...

Secret management options:

- Google Secret Manager — best for GCP

- AWS Secrets Manager — best for AWS

- GitHub Secrets — CI/CD ku

- Doppler — multi-platform secret management

If an API key leaks, hackers will start using it within minutes — and the monthly bill can run into lakhs! 😱
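For illustration, a .env file is just KEY=VALUE pairs. Here is a minimal parser sketch; in practice use python-dotenv or your platform's secret manager rather than rolling your own.

```python
# Minimal .env parser for illustration only. In practice use
# python-dotenv or the platform's secret manager instead of
# rolling your own.

def load_env(text):
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue                      # skip blanks and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

example = """
# .env.example
OPENAI_API_KEY=your-key-here
MODEL_PATH=models/model.onnx
"""
cfg = load_env(example)
```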

Performance Optimization

Tips to make a production AI app run fast:


1. Model Optimization 🧠

  • Use ONNX Runtime — 2-3x faster inference
  • Quantization (FP16/INT8) — halves the model size
  • TensorRT — NVIDIA GPU inference optimization
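The size claim is simple arithmetic: model size scales with bytes per parameter (FP32 = 4, FP16 = 2, INT8 = 1). A quick sketch using DistilBERT's roughly 66M parameters as an example; real savings vary because some layers usually stay in higher precision.

```python
# Back-of-envelope model size under quantization: size scales with
# bytes per parameter (FP32 = 4, FP16 = 2, INT8 = 1). Real savings
# vary because some layers usually stay in higher precision.

def model_size_mb(params_millions, bytes_per_param):
    # millions of params * bytes per param = megabytes
    return params_millions * bytes_per_param

params = 66                        # DistilBERT: roughly 66M parameters
fp32 = model_size_mb(params, 4)    # 264 MB
fp16 = model_size_mb(params, 2)    # 132 MB, half
int8 = model_size_mb(params, 1)    #  66 MB, a quarter
```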

2. Caching 💨

  • Redis cache — cache repeated queries
  • Response caching — same input = same output
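The "same input = same output" idea can be sketched in-process with functools.lru_cache; a Redis cache works the same way, but survives restarts and is shared across replicas. The function here is a placeholder, not real inference.

```python
from functools import lru_cache

# In-process stand-in for the Redis response cache described above:
# an identical input returns the cached result instead of re-running
# inference. Redis adds persistence and sharing across replicas.

calls = {"count": 0}

@lru_cache(maxsize=1024)
def cached_predict(text):
    calls["count"] += 1        # pretend this is expensive inference
    return text.upper()        # placeholder for real model output

cached_predict("hello ai")
cached_predict("hello ai")     # cache hit: inference runs only once
```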

3. Async Processing

  • Use FastAPI async endpoints
  • Background workers (Celery) for long tasks
  • Queue system (Redis Queue, RabbitMQ)

4. Connection Pooling 🔌

  • Pool database connections
  • Reuse HTTP clients (httpx)
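A toy pool sketch to show the idea; real apps get this from SQLAlchemy's connection pool or a shared httpx.Client, and the class here is purely illustrative.

```python
import queue

# Toy connection pool to illustrate the idea: connections are created
# once and reused instead of opened per request. Real apps get this
# from SQLAlchemy's pool or a shared httpx.Client.

class Pool:
    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")   # stand-in for a real connection

    def acquire(self):
        return self._free.get()

    def release(self, conn):
        self._free.put(conn)

pool = Pool(size=2)
conn = pool.acquire()
# ... run a query ...
pool.release(conn)                        # back in the pool for reuse
```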

Before vs After optimization:

Metric | Before | After
Response time | 2.5s | 0.3s
Throughput | 50 req/s | 500 req/s
Memory | 4GB | 1.5GB
Cost | ₹15000/mo | ₹5000/mo

Deploy Checklist

Before deploying to production, check:


Code

  • [ ] All tests passing
  • [ ] No hardcoded secrets
  • [ ] Error handling proper
  • [ ] Logging added
  • [ ] Health endpoint (/health)

Docker 🐳

  • [ ] Multi-stage build
  • [ ] Non-root user
  • [ ] .dockerignore proper
  • [ ] Image size optimized

Cloud ☁️

  • [ ] Resource limits set
  • [ ] Auto-scaling configured
  • [ ] SSL/HTTPS enabled
  • [ ] Custom domain mapped

Security 🔒

  • [ ] Secrets in Secret Manager
  • [ ] CORS configured
  • [ ] Rate limiting enabled
  • [ ] Input validation proper
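Rate limiting from the checklist can be sketched as a token bucket. This is illustrative only; in practice use middleware (for example, slowapi for FastAPI) or enforce limits at the gateway, and the numbers here are made up.

```python
import time

# Minimal token-bucket rate limiter to illustrate the checklist item.
# In practice use middleware (e.g. slowapi for FastAPI) or enforce
# limits at the gateway. Numbers here are illustrative.

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate                  # tokens refilled per second
        self.capacity = capacity          # burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=0.1, capacity=2)    # allows a 2-request burst
results = [bucket.allow() for _ in range(4)]  # burst passes, rest throttled
```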

Monitoring 📊

  • [ ] Logging setup
  • [ ] Alerts configured
  • [ ] Uptime monitoring
  • [ ] Error tracking (Sentry)

Follow this checklist and you're production ready! 🚀

Key Takeaways

App Structure Matters — FastAPI (async, auto-docs, validation), proper project layout (app/, models/, tests/), requirements.txt versioning, .env.example documentation


Production Dockerfile — Multi-stage build (50-70% smaller image), slim base image, cached requirements layer, non-root user (security), health checks, proper signal handling


Local Testing Critical — Docker Compose for a local stack. Curl-test the endpoints. Verify environment variables. Load test before deploying to the cloud


Cloud Deployment — Cloud Run (easiest), serverless (auto-scale), automatic HTTPS, custom domain mapping. Keep environment variables in Secret Manager and store models in GCS


CI/CD Pipeline — GitHub Actions: test → build → deploy automation. Every push is tested, built, and deployed. Rollback to the previous version is instant. A staging-first approach is recommended


Model Versioning — Blue-green deployment (old + new parallel, switch traffic). Canary deployment (5% → 100% gradual). Rollback instant if issues. MLflow or HuggingFace registry


Monitoring Essential — Logging (Cloud Logging), metrics (Cloud Monitoring), alerts (error rate, latency, quota). Uptime monitoring, error tracking (Sentry). Proactive alerts not reactive patches


Performance Optimization — ONNX format (2-3x faster), quantization (size small), Redis caching (repeated queries), async processing (FastAPI), connection pooling. 10x throughput improvement realistic

🏁 🎮 Mini Challenge

Challenge: Deploy Complete AI App to Cloud (End-to-End)


From beginner to hero — build the full deployment pipeline! 🚀


Step 1: AI Model Prepare Pannunga 🤖

python
# Download and optimize the model
from transformers import AutoModel, AutoTokenizer
import torch.onnx

model_name = "distilbert-base-uncased"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ONNX export (quantized — smaller, faster)
# torch.onnx.export(...)

Step 2: FastAPI App Create 🐍

python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"])  # tighten in production

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
async def predict(req: PredictRequest):
    # Inference logic: `model` is loaded at startup (see Step 1)
    result = model.predict(req.text)
    return {"prediction": result}

@app.get("/health")
def health():
    return {"status": "ok"}

Step 3: Docker Container 🐳

bash
# Dockerfile, build, test locally
docker build -t ai-app:latest .
docker run -p 8000:8000 ai-app:latest

Step 4: Cloud Deploy (Choose One) ☁️

bash
# OPTION A: Google Cloud Run
gcloud run deploy ai-app --source . --platform managed

# OPTION B: AWS Lambda
# zip + upload function

# OPTION C: Azure App Service
# azure webapp deployment

Step 5: Custom Domain + SSL 🔒

bash
# Domain register (GoDaddy, Google Domains)
# Cloud provider → domain configure
# SSL certificate automatic (free)

Step 6: Monitoring + Alerts 📊

bash
# Logging setup (Cloud Logging)
# Metrics dashboard (Grafana)
# Alerts: high latency, errors, quota usage

Step 7: Promote & Share 🎉

bash
# Public URL copy
# Friends, LinkedIn, Twitter share
# Real production app! 🌟

Completion Time: 3-4 hours

Real Skill: End-to-end AI deployment

Career Impact: High ⭐⭐⭐

💼 Interview Questions

Q1: How do you manage model versioning in production?

A: Container tag use (v1.0, v2.0). Model registry (MLflow, Hugging Face). Blue-green deployment (old version, new version parallel run, switch). Rollback instant (previous tag deploy). Monitoring: model version track, performance metrics per version.


Q2: How do you set up A/B testing for a production model?

A: Load balancer: 50% traffic model A, 50% model B. Metrics collect (accuracy, latency, user feedback). Statistical significance verify (winner decide). Canary: 5% → 25% → 100% traffic shift. Feature flags: no deployment, just toggle (instant rollback).


Q3: How do you reduce GPU inference cost?

A: Quantization (8-bit, 4-bit). Distillation (smaller model). Caching (same input repeated). Batch inference (accumulate requests). Spot instances (preemptible). Serverless (pay-per-use, no idle cost). Model optimization: 10x cheaper possible.


Q4: A production AI app fails — what are your troubleshooting steps?

A: (1) Logs check (error message). (2) Metrics check (CPU, memory, latency). (3) Recent changes check (what changed?). (4) Rollback previous version (if recent deploy issue). (5) Health check (dependency — database, API). (6) Disk space, quota check. (7) Model drift (data distribution change?).


Q5: How do you guarantee zero-downtime deployment?

A: Rolling update (new pods start and pass health checks before old ones stop). Blue-green (switch instantly between old and new environments). Canary (gradual traffic shift). Load balancer sends traffic only to healthy pods. Health check: verify a pod is ready before routing to it. Graceful shutdown: complete in-flight requests before stopping.

Frequently Asked Questions

Which platform is best for deploying an AI app?
For beginners, Google Cloud Run or Railway. For medium scale, AWS ECS or GKE. For large scale, Kubernetes (EKS/GKE). If you need a GPU, AWS SageMaker or GCP Vertex AI.
How much does deployment cost?
A simple AI API: free to ₹2000/month. GPU inference: ₹5000-20000/month. Using free tiers, the initial cost can be zero.
The model file is too large — how do I deploy it?
Store model files in cloud storage (S3/GCS) and download them when the app starts. Or use a model registry (MLflow, HuggingFace).
How do I set up HTTPS?
Platforms like Cloud Run and Vercel provide HTTPS automatically. For a custom setup, use a free Let's Encrypt SSL certificate with an Nginx reverse proxy.