
AI + DevOps integration

Advanced · ⏱ 15 min read · 📅 Updated: 2026-02-22

Introduction

You've added an AI feature to your monolith app — login service, payment service, AND AI recommendations all on one server. One day you update the AI model — the entire app crashes! 💥


This is the monolith + AI problem. AI features are resource-hungry, independently scalable, and frequently updated. With a microservices architecture, each service can live and die independently!


In this article we cover AI-powered microservices architecture — design, communication, deployment, and real-world patterns! 🧩✨

When to Use Microservices for AI

Not every project needs microservices — here's when they make sense:


| Signal | Monolith OK ✅ | Microservices Needed 🧩 |
|---|---|---|
| **Team Size** | < 5 developers | > 5 developers |
| **AI Models** | 1-2 simple models | 3+ complex models |
| **Scale** | < 10K requests/day | > 100K requests/day |
| **Deploy Frequency** | Weekly | Daily/multiple per day |
| **GPU Needs** | No GPU | GPU required |
| **Model Updates** | Monthly | Weekly/daily |
| **Fault Tolerance** | Some downtime OK | Zero downtime required |

Migration Path:

```
Stage 1: Monolith (everything together)
    ↓
Stage 2: Modular Monolith (AI in separate module)
    ↓
Stage 3: AI as separate service (2 services)
    ↓
Stage 4: Full microservices (5+ services)
```

Rule: Don't start with microservices — grow into them! Premature microservices = premature complexity! 🎯
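One way to make the Stage 2 → Stage 3 jump painless is to put the AI feature behind an interface, so swapping the in-process module for a remote service is a one-line change. A sketch (all names and URLs here are illustrative, not from a real codebase):

```javascript
// Stage 2: in-process implementation inside the modular monolith
const localRecommender = {
  async recommend(userId, limit) {
    // ...run the model in-process (stubbed here)...
    return Array.from({ length: limit }, (_, i) => ({ id: `item-${i}`, userId }));
  },
};

// Stage 3: same interface, but backed by a separate AI service
function remoteRecommender(baseUrl) {
  return {
    async recommend(userId, limit) {
      const res = await fetch(`${baseUrl}/recommend`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ userId, limit }),
      });
      return (await res.json()).items;
    },
  };
}

// The rest of the app depends only on the interface, so the swap is one line:
let recommender = localRecommender;            // Stage 2
// recommender = remoteRecommender('http://recommendation-svc:3001'); // Stage 3
```

Because callers never know which implementation is behind `recommender`, the extraction can be rolled back just as easily.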

AI Service Decomposition

πŸ—οΈ Architecture Diagram
AI app ah microservices ah decompose panradhu:

```
┌─────────────────────────────────────────────────────┐
│                   API GATEWAY                       │
│  [Kong/Nginx] — Auth, Rate Limit, Routing           │
└───┬────────┬────────┬────────┬────────┬─────────────┘
    │        │        │        │        │
    ▼        ▼        ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ User │ │Search│ │Recom │ │Chat  │ │Noti- │
│ Svc  │ │ Svc  │ │ Svc  │ │ Svc  │ │fica- │
│      │ │      │ │      │ │(LLM) │ │tion  │
│ CRUD │ │Vector│ │ ML   │ │      │ │ Svc  │
│      │ │ DB   │ │Model │ │Stream│ │      │
└──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘
   │        │        │        │        │
   ▼        ▼        ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Postgr│ │Pine- │ │Redis │ │Anthro│ │Redis │
│  es  │ │cone  │ │Cache │ │pic   │ │Queue │
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘
```

**Service Boundaries (AI-Specific):**
1. **User Service** — Auth, profiles (CPU, low resources)
2. **Search Service** — Vector search, semantic search (GPU optional, vector DB)
3. **Recommendation Service** — ML model inference (GPU, high memory)
4. **Chat Service** — LLM integration, streaming (API calls, WebSocket)
5. **Notification Service** — Async, queue-based (CPU, low resources)

Each service owns its **own database** — no shared DB! 🗄️

Inter-Service Communication

Microservices communication patterns for AI:


1. Synchronous (REST/gRPC) — Real-time responses

```javascript
// User Service calls Recommendation Service
// REST
const res = await fetch('http://recommendation-svc:3001/recommend', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ userId: '123', limit: 10 }),
});
const recommendations = await res.json();

// gRPC (faster for internal services)
const client = new RecommendationClient('recommendation-svc:50051');
const response = await client.getRecommendations({ userId: '123', limit: 10 });
```

2. Asynchronous (Message Queue) — Heavy AI tasks

```javascript
// Producer: API Service
await rabbitMQ.publish('ai-tasks', {
  id: 'section-1',
  type: 'generate-summary',
  documentId: 'doc-456',
  callbackUrl: 'http://api-svc:3000/webhook/summary',
});

// Consumer: AI Service (GPU worker)
rabbitMQ.consume('ai-tasks', async (message) => {
  const summary = await aiModel.summarize(message.documentId);
  await fetch(message.callbackUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ documentId: message.documentId, summary }),
  });
});
```

3. Event-Driven (Pub/Sub) — Reactive updates

```javascript
// When a user action happens — multiple services react
eventBus.publish('user.purchased', { userId: '123', productId: 'abc' });

// Recommendation Service listens
eventBus.subscribe('user.purchased', async (event) => {
  await updateUserPreferences(event.userId, event.productId);
  await retrainModel(event.userId);  // Personalization update
});

// Notification Service also listens
eventBus.subscribe('user.purchased', async (event) => {
  await sendRecommendationEmail(event.userId);
});
```

| Pattern | Latency | Coupling | Best For |
|---|---|---|---|
| **REST** | Low | Tight | Simple CRUD |
| **gRPC** | Very Low | Tight | Internal AI calls |
| **Message Queue** | High | Loose | Heavy AI tasks |
| **Event Bus** | Medium | Very Loose | Reactive updates |
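The `eventBus` object used in the pub/sub example above could be a minimal in-process implementation; a sketch (a real deployment would back this with Kafka, NATS, or Redis pub/sub instead of in-memory handlers):

```javascript
// Minimal in-process pub/sub event bus (illustrative sketch)
class EventBus {
  constructor() {
    this.handlers = new Map();  // topic -> array of handler functions
  }

  subscribe(topic, handler) {
    if (!this.handlers.has(topic)) this.handlers.set(topic, []);
    this.handlers.get(topic).push(handler);
  }

  async publish(topic, event) {
    const subs = this.handlers.get(topic) || [];
    // allSettled: one failing subscriber must not block the others
    return Promise.allSettled(subs.map((handler) => handler(event)));
  }
}

const eventBus = new EventBus();
```

Because subscribers are decoupled from the publisher, new reactions (analytics, retraining triggers) can be added without touching the purchase flow.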

API Gateway for AI Services

Smart API gateway design for AI microservices:


```javascript
// Kong/Custom API Gateway config
const routes = {
  // Route based on request type
  '/api/chat': {
    service: 'chat-service',
    rateLimit: { free: 20, pro: 200 },
    timeout: 60000,        // LLM responses take time
    streaming: true,       // SSE support
  },
  '/api/recommend': {
    service: 'recommendation-service',
    rateLimit: { free: 100, pro: 1000 },
    timeout: 5000,
    cache: { ttl: 1800 },  // Cache 30 min
  },
  '/api/search': {
    service: 'search-service',
    rateLimit: { free: 50, pro: 500 },
    timeout: 3000,
    cache: { ttl: 300 },   // Cache 5 min
  },
  '/api/users': {
    service: 'user-service',
    rateLimit: { free: 200, pro: 2000 },
    timeout: 2000,
  },
};

// Smart routing with fallback
async function routeRequest(req) {
  const route = routes[req.path];

  try {
    const response = await callService(route.service, req);
    return response;
  } catch (error) {
    // Circuit breaker — fall back when the AI service is down
    if (error.code === 'SERVICE_UNAVAILABLE') {
      return getFallbackResponse(req.path);
    }
    throw error;
  }
}
```

Gateway Responsibilities:

  • 🔐 Authentication & Authorization
  • 🚦 Rate Limiting (per tier, per endpoint)
  • 📊 Request Logging & Metrics
  • 🔄 Circuit Breaker & Fallback
  • 📦 Response Caching
  • 🌊 Streaming Support (SSE/WebSocket) for LLMs
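Of those responsibilities, per-tier rate limiting is the one most often hand-rolled. A minimal token-bucket sketch, assuming limits are requests per minute in the same `{ free: 20, pro: 200 }` style as the config above (names and numbers are illustrative):

```javascript
// Token bucket: capacity = burst size, refilled continuously at a steady rate
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.lastRefill = Date.now();
  }

  allow(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;   // request allowed
    }
    return false;    // over limit — gateway should respond 429
  }
}

// One bucket per (user, tier); limits are requests per minute
const limits = { free: 20, pro: 200 };
const buckets = new Map();

function checkRateLimit(userId, tier) {
  const key = `${userId}:${tier}`;
  if (!buckets.has(key)) {
    buckets.set(key, new TokenBucket(limits[tier], limits[tier] / 60));
  }
  return buckets.get(key).allow();
}
```

A token bucket allows short bursts up to capacity while enforcing the average rate, which suits bursty LLM chat traffic better than a fixed window.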

Circuit Breaker Pattern for AI

AI services will fail — a circuit breaker protects you:


```javascript
class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000; // 30s
    this.state = 'CLOSED';   // CLOSED → OPEN → HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit OPEN — using fallback');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      console.log('🔴 Circuit OPEN! AI service failing.');
    }
  }
}

// Usage
const aiBreaker = new CircuitBreaker({ failureThreshold: 3 });

app.post('/api/recommend', async (req, res) => {
  try {
    const result = await aiBreaker.call(() =>
      recommendationService.getRecommendations(req.body)
    );
    res.json(result);
  } catch (error) {
    // Fallback: return popular items instead of AI recommendations
    const popular = await getPopularItems();
    res.json({ items: popular, source: 'fallback' });
  }
});
```

States:

  • 🟢 CLOSED — Normal operation
  • 🔴 OPEN — AI service down, use fallback
  • 🟡 HALF_OPEN — Testing if service recovered

Even if AI services crash, the user experience doesn't break! 🛡️

Docker Setup for AI Microservices

Docker Compose for AI microservices:


```yaml
# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build: ./gateway
    ports: ['3000:3000']
    depends_on: [user-svc, recommendation-svc, chat-svc]

  user-svc:
    build: ./services/user
    environment:
      DATABASE_URL: postgres://db:5432/users

  recommendation-svc:
    build: ./services/recommendation
    environment:
      MODEL_PATH: /models/recommender.onnx
      REDIS_URL: redis://cache:6379
    volumes:
      - ./models:/models        # Mount model files
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]  # GPU access!

  chat-svc:
    build: ./services/chat
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      REDIS_URL: redis://cache:6379

  search-svc:
    build: ./services/search
    environment:
      PINECONE_API_KEY: ${PINECONE_API_KEY}

  # Infrastructure
  db:
    image: postgres:16
    volumes: ['pgdata:/var/lib/postgresql/data']

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

  queue:
    image: rabbitmq:3-management
    ports: ['15672:15672']  # Management UI

volumes:
  pgdata:
```

AI Service Dockerfile:

```dockerfile
# services/recommendation/Dockerfile
FROM python:3.11-slim

# Install ML dependencies
RUN pip install torch onnxruntime fastapi uvicorn redis

COPY . /app
WORKDIR /app

# Pre-download model
RUN python download_model.py

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

`docker compose up` — the whole AI platform starts! 🐳

Service Mesh for AI Traffic

💡 Tip

Using an Istio/Linkerd service mesh makes AI traffic easy to manage:

```yaml
# Istio VirtualService — AI traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: recommendation-svc
spec:
  hosts: [recommendation-svc]
  http:
  - route:
    - destination:
        host: recommendation-svc
        subset: v1-current-model
      weight: 90              # 90% old model
    - destination:
        host: recommendation-svc
        subset: v2-new-model
      weight: 10              # 10% new model (canary!)
```
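The `v1-current-model` / `v2-new-model` subsets referenced by the VirtualService must be defined in a companion DestinationRule. A sketch, assuming each model version's pods carry a matching `version` label:

```yaml
# Istio DestinationRule — defines the subsets the VirtualService routes to
# (assumes the deployments label their pods version: v1 / version: v2)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: recommendation-svc
spec:
  host: recommendation-svc
  subsets:
  - name: v1-current-model
    labels:
      version: v1
  - name: v2-new-model
    labels:
      version: v2
```

Promoting the canary then means shifting the VirtualService weights, not redeploying pods.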

Service Mesh Benefits for AI:

- 🔄 Traffic splitting — A/B test models easily

- 🔒 mTLS — Secure inter-service communication

- 📊 Observability — Automatic metrics, tracing

- 🔁 Retry/Timeout — Automatic retries for AI failures

- 💀 Fault injection — Test AI service failures

Complex setup — but worth it for large-scale AI systems! 🎯

Data Consistency Across AI Services

Maintaining data consistency across microservices is challenging:


Saga Pattern for AI Workflows:

```javascript
// Example: AI Content Generation Saga
class ContentGenerationSaga {
  async execute(request) {
    const steps = [];

    try {
      // Step 1: Create content record
      const content = await userService.createContent(request);
      steps.push({ service: 'user', action: 'delete', id: content.id });

      // Step 2: Generate AI content
      const aiContent = await chatService.generate(request.prompt);
      steps.push({ service: 'chat', action: 'cleanup', id: aiContent.id });

      // Step 3: Run safety check
      const safetyResult = await moderationService.check(aiContent.text);
      if (!safetyResult.safe) {
        throw new Error('Content failed safety check');
      }

      // Step 4: Store and index
      await searchService.index(content.id, aiContent.text);

      return { success: true, content: aiContent };

    } catch (error) {
      // Compensating transactions — rollback!
      for (const step of steps.reverse()) {
        await this.compensate(step);
      }
      return { success: false, error: error.message };
    }
  }

  async compensate(step) {
    console.log(`🔄 Rolling back: ${step.service}.${step.action}`);
    // Each service has its own rollback logic
  }
}
```

Event Sourcing for AI audit trail:

```javascript
// Track every AI decision
eventStore.append('ai-decisions', {
  id: 'section-2',
  type: 'RECOMMENDATION_GENERATED',
  userId: '123',
  model: 'recommender-v2',
  input: features,
  output: recommendations,
  confidence: 0.92,
  timestamp: Date.now(),
});
```

AI decisions should be auditable — event sourcing helps! 📋
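A minimal append-only `eventStore` like the one called above can be sketched in a few lines (illustrative only — a production system would use EventStoreDB, Kafka, or an append-only database table):

```javascript
// Minimal append-only event store (illustrative sketch)
class EventStore {
  constructor() {
    this.streams = new Map();  // stream name -> ordered event list
  }

  append(stream, event) {
    if (!this.streams.has(stream)) this.streams.set(stream, []);
    const log = this.streams.get(stream);
    const record = { ...event, seq: log.length };  // monotonically increasing sequence
    log.push(record);  // events are only ever appended, never mutated or deleted
    return record.seq;
  }

  read(stream, fromSeq = 0) {
    return (this.streams.get(stream) || []).filter((e) => e.seq >= fromSeq);
  }
}

const eventStore = new EventStore();
```

Replaying `read('ai-decisions')` reconstructs exactly what the system decided and when, which is the audit trail the section is after.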

Testing AI Microservices

Test each service independently:


```javascript
// Contract test — services agree on API shape
// recommendation-svc contract
const contract = {
  request: {
    method: 'POST',
    path: '/recommend',
    body: { userId: 'string', limit: 'number' },
  },
  response: {
    status: 200,
    body: {
      items: [{ id: 'string', score: 'number', title: 'string' }],
      model_version: 'string',
    },
  },
};

// Consumer test (API service tests recommendation contract)
test('recommendation service returns expected shape', async () => {
  const response = await request('http://recommendation-svc:3001')
    .post('/recommend')
    .send({ userId: 'test-user', limit: 5 });

  expect(response.status).toBe(200);
  expect(response.body.items).toBeInstanceOf(Array);
  expect(response.body.items[0]).toHaveProperty('id');
  expect(response.body.items[0]).toHaveProperty('score');
  expect(response.body.model_version).toBeDefined();
});

// Chaos test — AI service down scenario
test('system works when AI service is down', async () => {
  await stopService('recommendation-svc');

  const response = await request('http://api-gateway:3000')
    .get('/api/homepage')
    .expect(200);  // Should still work!

  expect(response.body.recommendations.source).toBe('fallback');
  await startService('recommendation-svc');
});
```

Test Types for AI Microservices:


| Test Type | What It Tests | Tools |
|---|---|---|
| **Unit** | Individual service logic | Jest, pytest |
| **Contract** | Service API agreements | Pact |
| **Integration** | Service interactions | Docker Compose |
| **Chaos** | Failure scenarios | Chaos Monkey |
| **Load** | Performance at scale | k6, Artillery |

Observability for AI Microservices

In a distributed AI system, you need to know what's happening:


Three Pillars:


1. Logs (What happened)

```javascript
// Structured logging with correlation ID
logger.info({
  correlationId: req.headers['x-correlation-id'],
  service: 'recommendation-svc',
  action: 'predict',
  userId: req.body.userId,
  modelVersion: 'v2.3',
  latencyMs: 145,
  cacheHit: false,
});
```

2. Metrics (How much)

```javascript
// Prometheus metrics
const inferenceLatency = new Histogram({
  name: 'ai_inference_duration_seconds',
  help: 'AI inference latency',
  labelNames: ['model', 'service'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});
```

3. Traces (Request journey across services)

```javascript
// OpenTelemetry distributed tracing
const tracer = trace.getTracer('recommendation-svc');

async function getRecommendations(userId) {
  return tracer.startActiveSpan('get-recommendations', async (span) => {
    span.setAttribute('user.id', userId);

    const features = await tracer.startActiveSpan('fetch-features', async (s) => {
      const f = await featureStore.get(userId);
      s.end();
      return f;
    });

    const predictions = await tracer.startActiveSpan('model-predict', async (s) => {
      const p = await model.predict(features);
      s.setAttribute('model.version', 'v2.3');
      s.end();
      return p;
    });

    span.end();
    return predictions;
  });
}
```

With traces, you can immediately find which service a request is slow in! 🔍

AI Microservices Anti-Patterns

⚠️ Warning

⚠️ Avoid these common mistakes:

❌ Distributed Monolith — Services tightly coupled, can't deploy independently

✅ Fix: Give each service its own database and its own deployment pipeline

❌ Chatty Services — Too many inter-service calls per request

✅ Fix: Batch calls, cache responses, use events instead

❌ Shared AI Model — Multiple services load the same model

✅ Fix: Dedicated inference service that other services call

❌ No Fallback — AI service down = entire app down

✅ Fix: Circuit breaker + fallback for every AI dependency

❌ Synchronous Everything — All AI calls blocking

✅ Fix: Queue heavy tasks, use webhooks for results

❌ Giant AI Service — One service does NLP + Vision + Recommendations

✅ Fix: Separate by AI domain — one model per service

Remember: Microservices = independent deployability. If you can't deploy one service without touching others — it's NOT microservices! 🎯

✅ Key Takeaways

✅ The ML lifecycle differs from the software lifecycle — build in model training, evaluation, version management, and retraining workflows

✅ Container-first approach — Docker standardizes reproducible environments, locks dependencies, and keeps every platform consistent

✅ CI/CD pipelines are essential — code commit → automated tests → model evaluation → deployment; minimize manual steps

✅ Model versioning is critical — multiple models in production, rollback capability, A/B testing, and lineage tracking all matter

✅ Validate data upstream — bad data → bad models → bad predictions; implement data quality checks early in the pipeline

✅ Monitoring AI systems is different — accuracy, fairness, latency, drift, and retraining signals need comprehensive observability

✅ Infrastructure as code — keep Terraform and Kubernetes YAML in version control for reproducible infrastructure and disaster recovery

✅ Strict secrets management — API keys and credentials never go in code; use vault solutions and enforce rotation policies

🏁 Mini Challenge

Challenge: Implement Complete DevOps Pipeline for AI App


Set up a production-ready DevOps pipeline for an AI application (60 mins):


  1. Repository: Create a GitHub repo and set up a branching strategy
  2. CI Pipeline: Set up GitHub Actions / GitLab CI (lint, test, build)
  3. Container: Build a Docker image and push it to a registry
  4. Staging Deployment: Deploy to staging on Kubernetes or Docker Swarm
  5. Monitoring: Set up Prometheus metrics and a Grafana dashboard
  6. Alerting: Define alerts (latency, error rate, resource usage)
  7. CD: Script automated deployment to production and run smoke tests

Tools: GitHub Actions, Docker, Kubernetes, Prometheus, Grafana, ArgoCD


Deliverable: Working CI/CD pipeline, production deployment, monitoring setup 🚀

Interview Questions

Q1: What special considerations does an AI application's CI/CD pipeline need?

A: Model versioning (which model is deployed?), inference-time testing (performance regression checks), data quality validation, and A/B testing infrastructure. The ordinary testing that suffices for traditional apps isn't enough — AI apps have special requirements.


Q2: Why is Docker containerization important for AI models?

A: Reproducibility, dependency management, and environment consistency. The same AI code can give different results on different servers (library versions, hardware). Docker ensures consistency from development to production.


Q3: What are the challenges of scaling AI workloads on Kubernetes?

A: GPU resource management (expensive, limited), and keeping model serving stateless is difficult (model loading is heavy). Solutions: Kubernetes GPU scheduling, KServe, TorchServe, model caching.


Q4: Model monitoring vs application monitoring — what's the difference?

A: App monitoring covers latency, errors, and traffic. Model monitoring covers prediction accuracy (where ground truth is available), data drift, and model performance degradation. Both matter in AI systems!


Q5: Is zero-downtime deployment possible for AI models?

A: Difficult! A model reload is required, which causes a pause. Solutions: blue-green deployment, canary release, shadow mode. Rolling out AI model updates needs careful planning.

Frequently Asked Questions

❓ Do AI features need microservices, or is a monolith enough?
For small projects a monolith is fine. But AI inference is compute-heavy — isolating it as a separate service gives you independent scaling, deployment, and fault isolation.
❓ gRPC vs REST for an AI microservice — which is better?
gRPC is better for internal AI services — faster, typed, with streaming support. REST is better for external/public APIs — simple and widely supported. Use a mix.
❓ Do AI microservices need Docker?
Highly recommended! AI models need specific dependencies (CUDA, PyTorch versions). Containerizing with Docker solves the "works on my machine" problem. Orchestrate with Kubernetes.
❓ How do you update an AI model in a microservices setup?
Use blue-green or canary deployment. Deploy the new model version in a new container, shift traffic gradually, and rollback stays easy.