Monitoring AI apps
Introduction
Your AI model is deployed to production. Users are using it. Everything looks fine... but is it? 🤔
Scary truth: AI models degrade silently. A traditional app crashes and throws an error. But an AI model just keeps serving wrong predictions — no error, no crash. Users get a bad experience and you never find out!
Real example: Zillow's AI home-pricing model silently drifted — reportedly costing the company around $500 million! 😱
Monitoring = your AI app's eyes and ears. In this article we'll go hands-on with AI-specific monitoring, a Prometheus + Grafana setup, and model drift detection! 📊
Three Pillars of Observability
Observability = understanding a system's internal state from its external outputs.
1. Metrics 📊 — Numbers over time
- CPU usage: 75%
- Request latency: 120ms
- Model accuracy: 94.2%
- Predictions per second: 500
2. Logs 📝 — Event records
3. Traces 🔍 — Request journey tracking
| Pillar | Question It Answers | Tool |
|---|---|---|
| Metrics | "How much? How fast?" | Prometheus |
| Logs | "What happened?" | ELK Stack, Loki |
| Traces | "Where did time go?" | Jaeger, Zipkin |
You need all three! Metrics trigger the alert, logs reveal the root cause, and traces pinpoint the bottleneck. 🎯
AI-Specific Metrics — What to Track
Normal infra metrics PLUS these AI-specific ones:
🎯 Model Performance Metrics:
| Metric | What | Alert When |
|---|---|---|
| Accuracy | Prediction correctness | Drops >5% |
| Latency (p50/p95/p99) | Inference time | p99 > 2s |
| Throughput | Predictions/second | Drops >20% |
| Error rate | Failed predictions | > 1% |
| Confidence scores | Model certainty | Avg drops below 0.7 |
📊 Data Quality Metrics:
| Metric | What | Alert When |
|---|---|---|
| Feature drift | Input distribution change | Significant shift |
| Missing values | Null/NaN in inputs | > 5% |
| Data volume | Requests per hour | Unusual spike/drop |
| Schema violations | Unexpected input format | Any occurrence |
🔄 Model Drift Metrics:
| Metric | What | Alert When |
|---|---|---|
| PSI (Population Stability Index) | Distribution shift | PSI > 0.2 |
| KL Divergence | Statistical distance | Significant increase |
| Prediction distribution | Output pattern change | Unexpected shift |
Pro tip: show real-time accuracy on your dashboard — it's the most important metric for AI apps! 📈
Prometheus + Grafana Setup
Step 1: Expose metrics from your Python app
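A minimal sketch using the official `prometheus_client` library — the metric names (`model_predictions_total`, `model_inference_seconds`, and so on) are illustrative choices, not prescribed anywhere:

```python
import random

from prometheus_client import Counter, Gauge, Histogram, generate_latest

# Illustrative metric names -- adapt them to your app
PREDICTIONS = Counter("model_predictions_total", "Total predictions served",
                      ["model_version"])
LATENCY = Histogram("model_inference_seconds", "Inference latency in seconds")
CONFIDENCE = Gauge("model_confidence_last", "Confidence of the last prediction")

@LATENCY.time()  # records how long each call takes
def predict(text: str) -> float:
    confidence = random.uniform(0.5, 1.0)  # stand-in for a real model call
    PREDICTIONS.labels(model_version="v1").inc()
    CONFIDENCE.set(confidence)
    return confidence

predict("sample input")

# To serve these at http://localhost:8001/metrics, add:
#   from prometheus_client import start_http_server
#   start_http_server(8001)
exposition = generate_latest().decode()
print("model_predictions_total" in exposition)  # → True
```

`generate_latest()` renders every registered metric in the Prometheus text exposition format, which is exactly what the scraper reads.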
Step 2: Add a Prometheus scrape config, then create a Grafana dashboard! 📊
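A matching `prometheus.yml` scrape config might look like this (the job name and port are assumptions — point the target at wherever your app serves `/metrics`):

```yaml
scrape_configs:
  - job_name: "ai-api"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8001"]   # where the app exposes /metrics
```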
Model Drift — The Silent Killer
Model drift = the #1 enemy of AI apps! A model that hit 95% accuracy at training time can drop to 70% within 3 months — no errors, no crashes, just bad predictions. 😰
Types of drift:
1. Data Drift (Input distribution changes)
- Training: Mostly English text
- Production: suddenly mixed Tamil + Hindi text starts arriving
- The model gets confused!
2. Concept Drift (Relationship changes)
- Training: "Work from home" = negative sentiment (pre-COVID)
- Production: "Work from home" = positive sentiment (post-COVID)
- Same input, different meaning!
3. Prediction Drift (Output distribution changes)
- Training: 50% positive, 50% negative predictions
- Production: 90% positive — something wrong!
Detection code:
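Here's a minimal PSI computation sketch using numpy (Evidently AI wraps this kind of check for you; the 10-bin split is a common convention, not a rule):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index: how far `actual` drifted from `expected`."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clamp production values into the baseline's range so no mass is lost
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty buckets to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, 10_000)   # training-time feature values
drifted = rng.normal(0.8, 1.0, 10_000)    # production values, mean shifted

print(f"no drift: PSI = {psi(baseline, baseline):.3f}")  # ~0.000
print(f"drifted:  PSI = {psi(baseline, drifted):.3f}")   # > 0.2 -> alert!
```

Run this weekly (cron or a scheduled job) against a stored baseline sample, and fire an alert whenever PSI crosses 0.2.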
Rule: a weekly drift check is mandatory for production AI! 🔄
AI Monitoring Dashboard Design
Effective AI monitoring dashboard:
Row 1 — Health Overview 🟢
- Model version (current)
- Uptime percentage
- Total predictions today
- Current error rate
Row 2 — Performance Metrics 📈
- Accuracy trend (last 7 days)
- Latency p50/p95/p99 chart
- Throughput (requests/sec)
- Confidence score distribution
Row 3 — Data Quality 📊
- Feature drift indicators
- Missing value percentage
- Input volume trend
- Schema violation count
Row 4 — Infrastructure 🖥️
- CPU/Memory usage
- GPU utilization (for inference)
- Disk space
- Network I/O
Grafana dashboard JSON:
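For reference, a heavily trimmed, illustrative dashboard fragment — a real Grafana export carries many more fields, and the PromQL expressions assume metric names like `model_predictions_total`:

```json
{
  "title": "AI Model Monitoring",
  "panels": [
    {
      "title": "Prediction Rate",
      "type": "timeseries",
      "targets": [
        { "expr": "rate(model_predictions_total[5m])" }
      ]
    },
    {
      "title": "Inference Latency p99",
      "type": "timeseries",
      "targets": [
        { "expr": "histogram_quantile(0.99, rate(model_inference_seconds_bucket[5m]))" }
      ]
    }
  ]
}
```

In practice it's easier to build panels in the Grafana UI and export the JSON than to hand-write it.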
Pro tip: follow the RED method — Rate, Errors, Duration. Track these for every service! 🎯
Smart Alerting Strategy
Alert fatigue is the biggest monitoring mistake! Fire 1,000 alerts and no one cares about any of them. Set up smart alerting instead:
🔴 Critical (PagerDuty — Wake someone up):
- Model accuracy drops >10% in 1 hour
- Error rate >5%
- All instances down
- Inference latency >5s sustained
🟡 Warning (Slack notification):
- Accuracy drops >5% in 24 hours
- Drift detected (PSI > 0.2)
- Latency p99 >2s
- Disk usage >80%
- Confidence avg drops below 0.6
🔵 Info (Dashboard only):
- New model deployed
- Retrain job completed
- Traffic pattern change
Prometheus alerting rules:
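A sketch of what these could look like in an `alert_rules.yml` (metric names, thresholds, and the runbook URL are placeholders):

```yaml
groups:
  - name: ai-model-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(model_errors_total[5m]) / rate(model_predictions_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prediction error rate above 5%"
          runbook_url: "https://wiki.example.com/runbooks/high-error-rate"
      - alert: SlowInference
        expr: histogram_quantile(0.99, rate(model_inference_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 inference latency above 2s"
```

The `for:` clause is what keeps transient blips from paging anyone — the condition must hold for the whole window before the alert fires.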
Golden rule: every alert needs a runbook — step-by-step instructions for what to do when the alert fires! 📋
Structured Logging for AI
Structured logging is a MUST for AI apps:
What to log for AI:
✅ Request ID (trace across services)
✅ Model version (which model served)
✅ Input metadata (size, type — NOT actual data!)
✅ Latency breakdown (preprocess, inference, postprocess)
✅ Confidence score
✅ Prediction result
❌ NEVER log PII (names, emails)
❌ NEVER log full input data (privacy + storage cost)
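The checklist above can be sketched with just the stdlib `logging` module (a library like structlog gives you the same result with less boilerplate; the field names here are illustrative):

```python
import json
import logging
import sys
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("ai-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Log prediction metadata only -- never the raw input or PII
logger.info("prediction_served", extra={"fields": {
    "request_id": str(uuid.uuid4()),
    "model_version": "v1",
    "input_tokens": 128,
    "inference_ms": 42.7,
    "confidence": 0.91,
    "prediction": "positive",
}})
```

One JSON object per line is exactly the shape log aggregators parse best.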
Log aggregation: Loki + Grafana (free) or ELK Stack. Query example:
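An illustrative LogQL query for Loki, assuming JSON log lines that carry a `confidence` field (adapt the label and field names to your setup):

```
{app="ai-api"} | json | confidence < 0.6
```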
↑ Filter for low-confidence predictions to spot potential issues early! 🔍
AI Monitoring Tools Comparison
ML-specific monitoring tools:
| Tool | Type | Cost | Best For |
|---|---|---|---|
| **Prometheus + Grafana** | General metrics | Free | Infrastructure + custom |
| **Evidently AI** | ML monitoring | Free/Open | Drift detection |
| **WhyLabs** | ML observability | Freemium | Full ML monitoring |
| **MLflow** | Experiment tracking | Free | Model versioning |
| **Arize AI** | ML observability | Paid | Enterprise |
| **Neptune.ai** | Experiment tracking | Freemium | Research teams |
| **Datadog** | Full stack | Paid | Enterprise |
| **New Relic** | APM | Freemium | Application perf |
Recommended stack (Free):
- 📊 Prometheus + Grafana — Metrics & dashboards
- 📝 Loki — Log aggregation
- 🔍 Jaeger — Distributed tracing
- 🤖 Evidently AI — ML drift detection
- 📦 MLflow — Model tracking
Total cost: $0! Full enterprise-grade monitoring for free! 🎉
AI Monitoring Architecture
```
            AI APP MONITORING ARCHITECTURE

📱 Users
   │ requests
   ▼
┌──────────────┐                   ┌────────────┐
│   AI API     │──── Metrics ─────▶│ Prometheus │
│  (FastAPI)   │                   └─────┬──────┘
│  /predict    │                         ▼
└──────┬───────┘                   ┌────────────┐
       │                           │  Grafana   │
       ▼                           │ Dashboards │
┌──────────────┐                   └─────┬──────┘
│ Model Server │                         ▼
└──────┬───────┘                   ┌────────────┐
       │                           │Alertmanager│
       ▼                           └──┬─────┬───┘
┌──────────────┐              Slack ◀─┘     └─▶ PagerDuty
│  Prediction  │
│  Store (DB)  │
└──────┬───────┘
       ▼
┌───────────────┐     ┌─────────────┐
│ Evidently AI  │────▶│ Drift Alert │
│ (Drift Check) │     │ + Retrain   │
│  Cron: Daily  │     │   Trigger   │
└───────────────┘     └─────────────┘

┌─────────────┐       ┌────────────┐
│    Loki     │◀──────│    Logs    │
│ (Log Aggr.) │       │ (Structlog)│
└──────┬──────┘       └────────────┘
       └─────────▶ Grafana Explore
```
Quick Setup — Docker Compose Stack
Spin up the full monitoring stack with one command!
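A minimal sketch of such a stack (image tags and ports are illustrative; the `prometheus.yml` mount assumes a config file sitting next to the compose file):

```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
```

Start it with `docker compose up -d`, then open Grafana at http://localhost:3000.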
A full monitoring stack ready in 5 minutes! Import the pre-built Grafana dashboards and you're done! 🚀
Prompt: Design Monitoring System
Summary
Key takeaways:
✅ Three pillars = Metrics + Logs + Traces — you need all three
✅ AI-specific metrics = Accuracy, drift, confidence — beyond CPU/memory
✅ Model drift = Silent killer — detect with Evidently AI
✅ Prometheus + Grafana = Free, powerful monitoring stack
✅ Smart alerting = Severity levels, runbooks, no alert fatigue
✅ Structured logging = JSON logs, request IDs, model versions
Action item: add the Prometheus client to your AI project and expose 3 custom metrics (latency, prediction count, confidence). Then build a dashboard for them in Grafana! 📊
Next article: Scalable AI Architecture — designing for millions of users! 🏗️
🏁 🎮 Mini Challenge
Challenge: Setup Monitoring Dashboard (Prometheus + Grafana)
A real monitoring setup — track your AI app's performance! 📊
Step 1: Python Flask App with Metrics 🐍
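One possible starting point for Step 1, assuming Flask and `prometheus_client` are installed (the route and metric names are mine, not prescribed by the challenge):

```python
import random
import time

from flask import Flask, jsonify
from prometheus_client import (Counter, Histogram, generate_latest,
                               CONTENT_TYPE_LATEST)

app = Flask(__name__)

REQUESTS = Counter("app_requests_total", "Total /predict requests", ["status"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

@app.route("/predict")
def predict():
    start = time.perf_counter()
    confidence = random.uniform(0.5, 1.0)  # stand-in for a real model call
    REQUESTS.labels(status="ok").inc()
    LATENCY.observe(time.perf_counter() - start)
    return jsonify({"prediction": "positive", "confidence": round(confidence, 3)})

@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

# Run with: flask --app app run --port 5000
```

Hit `/predict` a few times, then check `/metrics` — you should see your counters climbing.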
Step 2: Docker Container 🐳
Step 3: Prometheus Configuration 🔍
Step 4: Start Prometheus 🚀
Step 5: Grafana Dashboard 📈
Step 6: Alerts Setup 🚨
Step 7: Load Test & Monitor 📊
Completion Time: 2-3 hours
Tools: Prometheus, Grafana, Flask
Production-ready monitoring ⭐
💼 Interview Questions
Q1: RED vs USE metrics — what's the difference, and which matters for AI apps?
A: RED = Request rate, Error rate, Duration. USE = Utilization, Saturation, Errors. RED is for API/service endpoints (perfect for inference); USE is for infrastructure (CPU, memory, disk). For AI apps, both matter: RED tracks inference quality, USE tracks resource health.
Q2: Alert fatigue — how do you handle too many false alerts?
A: Tune thresholds carefully (through testing). Use composite alerts (multiple conditions must hold), time-based thresholds (different limits during peak hours), severity levels (critical vs warning), and an on-call rotation. Attach a runbook to every alert (when it fires, how do I fix it?). Best practice: alert on business metrics (customer impact) and avoid low-level infrastructure alert spam.
Q3: Why is distributed tracing needed in large systems?
A: In microservices, a request crosses multiple services, so a single log line isn't enough — you need to trace the entire journey. Tools: Jaeger, Zipkin. Each span records a service name, latency, and status. Traces let you identify bottlenecks (which service is slow) and debug errors (which service failed). In AI systems: a model-inference span, a database span, a cache span — so you can pinpoint the slow component.
Q4: How do you monitor model performance and detect model drift?
A: Establish a baseline accuracy for the production model. Monitor prediction confidence and actual vs. predicted outcomes (when labels become available later). An accuracy drop signals model drift (the data distribution has changed). Solution: retrain the model, A/B test the new version, and roll it out as a canary. Critical for long-running AI systems.
Q5: What is the performance impact of your logging strategy?
A: Synchronous logging is slow; asynchronous logging (a background thread) is better. Sampling: don't log 100% of requests — a 10% random sample balances cost and visibility. Structured logging: JSON (easy to parse). Log levels: DEBUG (dev only), INFO (production), ERROR (critical). Too much logging means storage cost and noise; the right balance gives visibility without overhead.
Frequently Asked Questions
What is model drift?