Deploy AI app
Introduction
You built an AI model — congratulations! 🎉 But if it only runs on your laptop, who benefits? It needs to be deployed to the world!
"It works on my machine" is fine during development. In production, the app has to run reliably, securely, and at scale.
This article is a complete guide to taking an AI app from zero to production — Docker builds, cloud deployment, domain setup, all the way to monitoring! 🚀
Deployment Overview
An overview of the steps to deploy an AI app:
| Step | What | Tools |
|---|---|---|
| 1️⃣ | App prepare | FastAPI/Flask, requirements.txt |
| 2️⃣ | Containerize | Docker, Dockerfile |
| 3️⃣ | Test locally | Docker Compose |
| 4️⃣ | Push image | Docker Hub / GCR / ECR |
| 5️⃣ | Deploy to cloud | Cloud Run / ECS / K8s |
| 6️⃣ | Domain setup | Custom domain, SSL |
| 7️⃣ | CI/CD pipeline | GitHub Actions |
| 8️⃣ | Monitoring | Logging, alerts |
Let's go through each step in detail! 📋
Step 1: App Structure
Production-ready AI app structure:
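A typical layout might look like this (file and folder names are illustrative):

```
ai-app/
├── app/
│   ├── main.py          # FastAPI entry point
│   ├── model.py         # model loading + inference
│   └── schemas.py       # Pydantic request/response models
├── models/
│   └── model.onnx       # exported model file
├── tests/
│   └── test_api.py
├── requirements.txt     # pinned dependency versions
├── Dockerfile
├── .dockerignore
└── .env.example         # documents required environment variables
```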
Key points:
- FastAPI > Flask for production (async, auto-docs, validation)
- ONNX format — faster inference than raw PyTorch/TF
- .env.example — document required environment variables
- tests/ — test before you deploy!
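Putting those points together, a minimal `app/main.py` sketch might look like this. The model path, the ONNX input name `"input"`, and the request schema are illustrative assumptions — the input name depends on how your model was exported:

```python
# app/main.py — minimal sketch; model path and input schema are illustrative
from contextlib import asynccontextmanager

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel


class PredictRequest(BaseModel):
    features: list[float]


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the ONNX model once at startup, not on every request
    app.state.session = ort.InferenceSession("models/model.onnx")
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict")
def predict(req: PredictRequest):
    x = np.array([req.features], dtype=np.float32)
    (output,) = app.state.session.run(None, {"input": x})
    return {"prediction": output.tolist()}
```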
Step 2: Production Dockerfile
Optimized Dockerfile for AI apps:
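One possible multi-stage Dockerfile along these lines (base image versions, paths, and the uvicorn command are illustrative):

```dockerfile
# Stage 1: build wheels in a throwaway image
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: slim runtime image — build tools never reach production
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels

COPY app/ app/
COPY models/ models/

# Non-root user — security best practice
RUN useradd --create-home appuser
USER appuser

EXPOSE 8080
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```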
Optimization tips:
- Multi-stage build — cuts the final image size by 50-70%
- --no-cache-dir — skips the pip cache, keeping the image smaller
- Non-root user — security best practice
- .dockerignore — exclude unnecessary files
Step 3: Local Testing
Test locally before you deploy!
Docker Compose for full stack:
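A local stack sketch along these lines (service names and ports are illustrative):

```yaml
# docker-compose.yml — local stack sketch
services:
  api:
    build: .
    ports:
      - "8080:8080"
    env_file: .env
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```

Run `docker compose up --build`, then hit `curl http://localhost:8080/health` to verify the container responds.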
Deploy to the cloud only after it works locally! ✅
Step 4-5: Cloud Deployment
Google Cloud Run — easiest option for AI apps:
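A deployment might look roughly like this (the project ID, region, service name, and memory limit are placeholders — adjust for your model's footprint):

```shell
# Build and push the image (PROJECT_ID is a placeholder)
gcloud builds submit --tag gcr.io/PROJECT_ID/ai-app

# Deploy to Cloud Run
gcloud run deploy ai-app \
  --image gcr.io/PROJECT_ID/ai-app \
  --region asia-south1 \
  --memory 2Gi \
  --allow-unauthenticated
```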
Cloud Run advantages:
- Scale to zero — no traffic = no cost 💰
- Auto-scaling — traffic based
- HTTPS automatic
- Pay per request
Production AI App Architecture
```
┌──────────────────────────────────────────────────┐
│         PRODUCTION AI APP ARCHITECTURE           │
├──────────────────────────────────────────────────┤
│                                                  │
│  👤 Users                                        │
│      │                                           │
│      ▼                                           │
│  ┌──────────────┐      ┌───────────────┐         │
│  │  CloudFlare  │─────▶│  Cloud Run /  │         │
│  │  CDN + DNS   │      │ Load Balancer │         │
│  └──────────────┘      └───────┬───────┘         │
│                                │                 │
│                     ┌──────────┼──────────┐      │
│                     ▼          ▼          ▼      │
│                ┌────────┐ ┌────────┐ ┌────────┐  │
│                │  API   │ │  API   │ │  API   │  │
│                │ Pod 1  │ │ Pod 2  │ │ Pod 3  │  │
│                └───┬────┘ └───┬────┘ └───┬────┘  │
│                    │          │          │       │
│                ┌───┴──────────┴──────────┴───┐   │
│                │       Shared Services       │   │
│                ├──────┬──────────┬───────────┤   │
│                │Redis │  Cloud   │  Cloud    │   │
│                │Cache │ Storage  │  SQL/DB   │   │
│                └──────┴──────────┴───────────┘   │
│                                                  │
│  📊 Monitoring: Cloud Monitoring + Logging       │
└──────────────────────────────────────────────────┘
```
Step 6: Custom Domain & SSL
Professional URL setup:
Cloud Run custom domain:
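Mapping a domain might look like this (service name, domain, and region are placeholders):

```shell
# Map a custom domain to the Cloud Run service
gcloud beta run domain-mappings create \
  --service ai-app \
  --domain api.example.com \
  --region asia-south1
```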
DNS setup (at your domain provider):
| Type | Name | Value |
|---|---|---|
| CNAME | api | ghs.googlehosted.com |
| A | @ | Cloud Run IP |
SSL — Cloud Run provides automatic HTTPS, for free! 🔒
We also recommend putting Cloudflare in front:
- Free SSL
- DDoS protection
- CDN caching
- Analytics
- An extra security layer for your AI app! 🛡️
Step 7: CI/CD with GitHub Actions
Auto-deploy on every push with GitHub Actions:
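A workflow sketch along these lines (the secret names `GCP_SA_KEY` and `GCP_PROJECT` are illustrative — set your own in the repository settings):

```yaml
# .github/workflows/deploy.yml — sketch; secret names are illustrative
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt && pytest   # tests gate the deploy
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: google-github-actions/setup-gcloud@v2
      - run: |
          gcloud builds submit --tag gcr.io/${{ secrets.GCP_PROJECT }}/ai-app
          gcloud run deploy ai-app \
            --image gcr.io/${{ secrets.GCP_PROJECT }}/ai-app \
            --region asia-south1
```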
Flow: Code push → Tests run → Docker build → Deploy → Live! 🚀
Step 8: Monitoring Setup
Deployment isn't the finish line — monitoring is a must!
Track these:
- 📊 Response time — is the API slow?
- ❌ Error rate — are 500 errors increasing?
- 📈 Request count — what's the traffic pattern?
- 💾 Memory usage — is the model leaking memory?
- 🧠 Model latency — how long does inference take?
Tools:
- Google Cloud Monitoring (built-in)
- Prometheus + Grafana (self-hosted)
- Sentry (error tracking)
- Better Stack (uptime monitoring)
Alert setup: Error rate > 5% → Slack notification
Response time > 2s → Email alert 🚨
Model Versioning Strategy
AI apps ship model updates frequently. Pick a strategy:
Blue-Green Deployment 🔵🟢:
- The old model (blue) keeps running
- Deploy the new model (green) alongside it
- Test green, then switch traffic over
- If there's a problem, roll back to blue
Canary Deployment 🐤:
- Send 5% of traffic to the new model
- Monitor it closely
- If it looks good, gradually increase to 100%
- If it looks bad, roll back
Model storage:
- Google Cloud Storage — model files
- MLflow Model Registry — version tracking
- DVC — data version control
Environment Management
NEVER commit secrets to Git! 🚫
Do this:
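Instead, read secrets from the environment at runtime and fail fast when a required one is missing — a minimal sketch (the variable names are illustrative):

```python
# config.py sketch — secrets come from the environment, never from source code
import os
from typing import Optional


def get_setting(name: str, default: Optional[str] = None) -> str:
    """Read a setting from the environment; fail fast if required and missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# For local dev only — in production, Secret Manager injects API_KEY
os.environ.setdefault("API_KEY", "demo-key-for-local-dev")

api_key = get_setting("API_KEY")                      # required, no default
debug = get_setting("DEBUG", "false").lower() == "true"  # optional with default
```

In production, the same code picks up values injected by Secret Manager or the CI/CD environment, so nothing secret ever lands in Git.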
Secret management options:
- Google Secret Manager — best for GCP
- AWS Secrets Manager — best for AWS
- GitHub Secrets — for CI/CD
- Doppler — multi-platform secret management
If an API key leaks, attackers will start using it within minutes — and the monthly bill can run into lakhs! 😱
Performance Optimization
Tips to make a production AI app run fast:
1. Model Optimization 🧠
- Use ONNX Runtime — 2-3x faster
- Quantization (FP16/INT8) — halves the model size
- TensorRT — NVIDIA GPU inference optimization
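For example, exporting a PyTorch model to ONNX might look like this sketch (the model, input shape, and tensor names are illustrative stand-ins):

```python
# Sketch: export a PyTorch model to ONNX for faster inference
import torch

model = torch.nn.Linear(10, 2)   # stand-in for your trained model
model.eval()

dummy_input = torch.randn(1, 10)  # example input with the model's shape
torch.onnx.export(
    model,
    dummy_input,
    "models/model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)
```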
2. Caching 💨
- Redis cache — cache repeated queries
- Response caching — same input = same output
3. Async Processing ⚡
- Use FastAPI async endpoints
- Background workers (Celery) for long-running tasks
- Queue system (Redis Queue, RabbitMQ)
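The queue idea in miniature, with the stdlib standing in for Celery + Redis Queue — the endpoint enqueues and returns immediately while a worker does the slow part:

```python
# Background worker sketch: stdlib queue/threading stand in for Celery + Redis Queue
import queue
import threading

tasks = queue.Queue()
results = []


def worker() -> None:
    while True:
        item = tasks.get()
        if item is None:          # sentinel value: shut the worker down
            tasks.task_done()
            break
        results.append(item * item)  # stand-in for a long-running job
        tasks.task_done()


t = threading.Thread(target=worker, daemon=True)
t.start()

for n in [1, 2, 3]:
    tasks.put(n)   # an API endpoint would enqueue here and return immediately
tasks.put(None)    # signal shutdown
tasks.join()       # wait until the worker has drained the queue
```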
4. Connection Pooling 🔌
- Pool database connections
- HTTP client reuse (httpx)
Before vs After optimization:
| Metric | Before | After |
|---|---|---|
| Response time | 2.5s | 0.3s |
| Throughput | 50 req/s | 500 req/s |
| Memory | 4GB | 1.5GB |
| Cost | ₹15000/mo | ₹5000/mo |
Deploy Checklist
Check these before a production deploy:
Code ✅
- [ ] All tests passing
- [ ] No hardcoded secrets
- [ ] Error handling proper
- [ ] Logging added
- [ ] Health endpoint (/health)
Docker 🐳
- [ ] Multi-stage build
- [ ] Non-root user
- [ ] .dockerignore proper
- [ ] Image size optimized
Cloud ☁️
- [ ] Resource limits set
- [ ] Auto-scaling configured
- [ ] SSL/HTTPS enabled
- [ ] Custom domain mapped
Security 🔒
- [ ] Secrets in Secret Manager
- [ ] CORS configured
- [ ] Rate limiting enabled
- [ ] Input validation proper
Monitoring 📊
- [ ] Logging setup
- [ ] Alerts configured
- [ ] Uptime monitoring
- [ ] Error tracking (Sentry)
Follow this checklist and you're production ready! 🚀
✅ Key Takeaways
✅ App Structure Matters — FastAPI (async, auto-docs, validation), proper project layout (app/, models/, tests/), requirements.txt versioning, .env.example documentation
✅ Production Dockerfile — Multi-stage build (50-70% size), slim base image, requirements cached layer, non-root user (security), health checks, proper signal handling
✅ Local Testing Critical — Docker Compose local stack. Curl test endpoints. Verify environment variables. Load test before cloud deploy
✅ Cloud Deployment — Cloud Run (easiest), serverless (auto-scale), HTTPS automatic, custom domain mapping. Environment variables Secret Manager, models GCS store
✅ CI/CD Pipeline — GitHub Actions: test → build → deploy automation. Every push tested, built, deployed. Rollback previous version instant. Staging first approach recommended
✅ Model Versioning — Blue-green deployment (old + new parallel, switch traffic). Canary deployment (5% → 100% gradual). Rollback instant if issues. MLflow or HuggingFace registry
✅ Monitoring Essential — Logging (Cloud Logging), metrics (Cloud Monitoring), alerts (error rate, latency, quota). Uptime monitoring, error tracking (Sentry). Proactive alerts not reactive patches
✅ Performance Optimization — ONNX format (2-3x faster), quantization (size small), Redis caching (repeated queries), async processing (FastAPI), connection pooling. 10x throughput improvement realistic
🏁 🎮 Mini Challenge
Challenge: Deploy Complete AI App to Cloud (End-to-End)
From beginner to hero — build the full deployment pipeline! 🚀
Step 1: Prepare the AI Model 🤖
Step 2: Create the FastAPI App 🐍
Step 3: Docker Container 🐳
Step 4: Cloud Deploy (Choose One) ☁️
Step 5: Custom Domain + SSL 🔒
Step 6: Monitoring + Alerts 📊
Step 7: Promote & Share 🎉
Completion Time: 3-4 hours
Real Skill: End-to-end AI deployment
Career Impact: High ⭐⭐⭐
💼 Interview Questions
Q1: How do you manage model versioning in production?
A: Container tag use (v1.0, v2.0). Model registry (MLflow, Hugging Face). Blue-green deployment (old version, new version parallel run, switch). Rollback instant (previous tag deploy). Monitoring: model version track, performance metrics per version.
Q2: How do you set up A/B testing for a production model?
A: Load balancer: 50% traffic model A, 50% model B. Metrics collect (accuracy, latency, user feedback). Statistical significance verify (winner decide). Canary: 5% → 25% → 100% traffic shift. Feature flags: no deployment, just toggle (instant rollback).
Q3: Cost optimization — how do you reduce GPU inference costs?
A: Quantization (8-bit, 4-bit). Distillation (smaller model). Caching (same input repeated). Batch inference (accumulate requests). Spot instances (preemptible). Serverless (pay-per-use, no idle cost). Model optimization: 10x cheaper possible.
Q4: A production AI app fails — what are your troubleshooting steps?
A: (1) Logs check (error message). (2) Metrics check (CPU, memory, latency). (3) Recent changes check (what changed?). (4) Rollback previous version (if recent deploy issue). (5) Health check (dependency — database, API). (6) Disk space, quota check. (7) Model drift (data distribution change?).
Q5: How do you guarantee zero-downtime deployment?
A: Rolling update (new pods start and pass health checks before old pods stop). Blue-green (switch instantly between old and new environments). Canary (gradual traffic shift). The load balancer sends traffic only to healthy pods. Health checks verify a pod is ready before it receives traffic. Graceful shutdown: let in-flight requests complete before stopping.
Frequently Asked Questions
What's the best way to handle model files when deploying an AI app?