
Microservices + AI

Advanced · 15 min read · 📅 Updated: 2026-02-17

Introduction

You've added an AI feature to your monolith app — login service, payment service, AND AI recommendations all on one server. One day you update the AI model — the entire app crashes! 💥


That's the monolith + AI problem. AI features are resource-hungry, independently scalable, and frequently updated. With a microservices architecture, each service can live and die independently!


In this article we cover AI-powered microservices architecture — design, communication, deployment, and real-world patterns! 🧩✨

When to Use Microservices for AI

Not every project needs microservices — here's when it makes sense:


| Signal | Monolith OK ✅ | Microservices Needed 🧩 |
|---|---|---|
| **Team Size** | < 5 developers | > 5 developers |
| **AI Models** | 1-2 simple models | 3+ complex models |
| **Scale** | < 10K requests/day | > 100K requests/day |
| **Deploy Frequency** | Weekly | Daily/multiple per day |
| **GPU Needs** | No GPU | GPU required |
| **Model Updates** | Monthly | Weekly/daily |
| **Fault Tolerance** | Some downtime OK | Zero downtime required |

Migration Path:

```
Stage 1: Monolith (everything together)
    ↓
Stage 2: Modular Monolith (AI in separate module)
    ↓
Stage 3: AI as separate service (2 services)
    ↓
Stage 4: Full microservices (5+ services)
```

Rule: Don't start with microservices — grow into them! Premature microservices = premature complexity! 🎯

AI Service Decomposition

🏗️ Architecture Diagram
Decomposing an AI app into microservices:

```
┌────────────────────────────────────────────────────┐
│                   API GATEWAY                       │
│  [Kong/Nginx] — Auth, Rate Limit, Routing          │
└───┬────────┬────────┬────────┬────────┬───────────┘
    │        │        │        │        │
    ▼        ▼        ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ User │ │Search│ │Recom │ │Chat  │ │Noti- │
│ Svc  │ │ Svc  │ │ Svc  │ │ Svc  │ │fica- │
│      │ │      │ │      │ │(LLM) │ │tion  │
│ CRUD │ │Vector│ │ ML   │ │      │ │ Svc  │
│      │ │ DB   │ │Model │ │Stream│ │      │
└──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘
   │        │        │        │        │
   ▼        ▼        ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Postgr│ │Pine- │ │Redis │ │Anthro│ │Redis │
│  es  │ │cone  │ │Cache │ │pic   │ │Queue │
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘
```

**Service Boundaries (AI-Specific):**
1. **User Service** — Auth, profiles (CPU, low resources)
2. **Search Service** — Vector search, semantic search (GPU optional, vector DB)
3. **Recommendation Service** — ML model inference (GPU, high memory)
4. **Chat Service** — LLM integration, streaming (API calls, WebSocket)
5. **Notification Service** — Async, queue-based (CPU, low resources)

Each service owns its **own database** — no shared DB! 🗄️

Inter-Service Communication

Microservices communication patterns for AI:


1. Synchronous (REST/gRPC) — Real-time responses

```javascript
// User Service calls Recommendation Service
// REST
const recommendations = await fetch('http://recommendation-svc:3001/recommend', {
  method: 'POST',
  body: JSON.stringify({ userId: '123', limit: 10 }),
});

// gRPC (faster for internal services)
const client = new RecommendationClient('recommendation-svc:50051');
const response = await client.getRecommendations({ userId: '123', limit: 10 });
```

2. Asynchronous (Message Queue) — Heavy AI tasks

```javascript
// Producer: API Service
await rabbitMQ.publish('ai-tasks', {
  type: 'generate-summary',
  documentId: 'doc-456',
  callbackUrl: 'http://api-svc:3000/webhook/summary',
});

// Consumer: AI Service (GPU worker)
rabbitMQ.consume('ai-tasks', async (message) => {
  const summary = await aiModel.summarize(message.documentId);
  await fetch(message.callbackUrl, {
    method: 'POST',
    body: JSON.stringify({ documentId: message.documentId, summary }),
  });
});
```

3. Event-Driven (Pub/Sub) — Reactive updates

```javascript
// When user action happens — multiple services react
eventBus.publish('user.purchased', { userId: '123', productId: 'abc' });

// Recommendation Service listens
eventBus.subscribe('user.purchased', async (event) => {
  await updateUserPreferences(event.userId, event.productId);
  await retrainModel(event.userId);  // Personalization update
});

// Notification Service also listens
eventBus.subscribe('user.purchased', async (event) => {
  await sendRecommendationEmail(event.userId);
});
```

| Pattern | Latency | Coupling | Best For |
|---|---|---|---|
| **REST** | Low | Tight | Simple CRUD |
| **gRPC** | Very Low | Tight | Internal AI calls |
| **Message Queue** | High | Loose | Heavy AI tasks |
| **Event Bus** | Medium | Very Loose | Reactive updates |
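The event-bus pattern can be sketched with a minimal in-memory bus — an illustration only; a production system would use Kafka, Redis Pub/Sub, or RabbitMQ. The topic name follows the `user.purchased` examples above.

```javascript
// Minimal in-memory event bus — a sketch of the pub/sub pattern only;
// a real deployment would use Kafka, Redis Pub/Sub, or RabbitMQ.
class EventBus {
  constructor() {
    this.handlers = new Map(); // topic -> array of handler functions
  }
  subscribe(topic, handler) {
    if (!this.handlers.has(topic)) this.handlers.set(topic, []);
    this.handlers.get(topic).push(handler);
  }
  async publish(topic, event) {
    // Every subscriber reacts independently — the publisher stays decoupled
    const handlers = this.handlers.get(topic) || [];
    await Promise.all(handlers.map((h) => h(event)));
  }
}

// Usage: two services react to the same purchase event
const bus = new EventBus();
const reactions = [];
bus.subscribe('user.purchased', async (e) => reactions.push(`recommend:${e.userId}`));
bus.subscribe('user.purchased', async (e) => reactions.push(`notify:${e.userId}`));
bus.publish('user.purchased', { userId: '123' }).then(() => {
  console.log(reactions); // both subscribers have run
});
```

Note how the publisher never knows who is listening — that's what makes the coupling "very loose" in the table above.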

API Gateway for AI Services

Smart API gateway design for AI microservices:


```javascript
// Kong/Custom API Gateway config
const routes = {
  // Route based on request type
  '/api/chat': {
    service: 'chat-service',
    rateLimit: { free: 20, pro: 200 },
    timeout: 60000,       // LLM responses take time
    streaming: true,       // SSE support
  },
  '/api/recommend': {
    service: 'recommendation-service',
    rateLimit: { free: 100, pro: 1000 },
    timeout: 5000,
    cache: { ttl: 1800 },  // Cache 30 min
  },
  '/api/search': {
    service: 'search-service',
    rateLimit: { free: 50, pro: 500 },
    timeout: 3000,
    cache: { ttl: 300 },   // Cache 5 min
  },
  '/api/users': {
    service: 'user-service',
    rateLimit: { free: 200, pro: 2000 },
    timeout: 2000,
  },
};

// Smart routing with fallback
async function routeRequest(req) {
  const route = routes[req.path];

  try {
    const response = await callService(route.service, req);
    return response;
  } catch (error) {
    // Circuit breaker — if the AI service is down, fall back
    if (error.code === 'SERVICE_UNAVAILABLE') {
      return getFallbackResponse(req.path);
    }
    throw error;
  }
}
```

Gateway Responsibilities:

  • 🔐 Authentication & Authorization
  • 🚦 Rate Limiting (per tier, per endpoint)
  • 📊 Request Logging & Metrics
  • 🔄 Circuit Breaker & Fallback
  • 📦 Response Caching
  • 🌊 Streaming Support (SSE/WebSocket) for LLMs
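The per-tier rate limiting above can be sketched with a fixed-window counter — illustration only; a real gateway (Kong, etc.) keeps these counters in Redis so all gateway instances share them. The tier limits mirror the route config above.

```javascript
// Fixed-window rate limiter per user tier — a sketch of what sits behind
// the gateway's rateLimit config; production gateways store counters in Redis.
class TierRateLimiter {
  constructor(limits, windowMs = 60_000) {
    this.limits = limits;       // e.g. { free: 20, pro: 200 } per window
    this.windowMs = windowMs;
    this.counters = new Map();  // userId -> { count, windowStart }
  }
  allow(userId, tier, now = Date.now()) {
    let c = this.counters.get(userId);
    if (!c || now - c.windowStart >= this.windowMs) {
      c = { count: 0, windowStart: now };  // new window — reset the count
      this.counters.set(userId, c);
    }
    if (c.count >= (this.limits[tier] ?? 0)) return false; // over quota
    c.count += 1;
    return true;
  }
}

// Usage: a free-tier user with a limit of 2 requests per window
const limiter = new TierRateLimiter({ free: 2, pro: 200 }, 60_000);
console.log(limiter.allow('u1', 'free', 0));      // true
console.log(limiter.allow('u1', 'free', 1));      // true
console.log(limiter.allow('u1', 'free', 2));      // false — quota hit
console.log(limiter.allow('u1', 'free', 60_001)); // true — new window
```

A fixed window is the simplest variant; sliding-window or token-bucket limiters smooth out the burst at window boundaries.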

Circuit Breaker Pattern for AI

AI services will fail — a circuit breaker protects you:


```javascript
class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000; // 30s
    this.state = 'CLOSED';   // CLOSED → OPEN → HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit OPEN — using fallback');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      console.log('🔴 Circuit OPEN! AI service failing.');
    }
  }
}

// Usage
const aiBreaker = new CircuitBreaker({ failureThreshold: 3 });

app.post('/api/recommend', async (req, res) => {
  try {
    const result = await aiBreaker.call(() =>
      recommendationService.getRecommendations(req.body)
    );
    res.json(result);
  } catch (error) {
    // Fallback: return popular items instead of AI recommendations
    const popular = await getPopularItems();
    res.json({ items: popular, source: 'fallback' });
  }
});
```

States:

  • 🟢 CLOSED — Normal operation
  • 🔴 OPEN — AI service down, use fallback
  • 🟡 HALF_OPEN — Testing if service recovered

Even if AI services crash — the user experience doesn't break! 🛡️

Docker Setup for AI Microservices

Docker Compose for AI microservices:


```yaml
# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    build: ./gateway
    ports: ['3000:3000']
    depends_on: [user-svc, recommendation-svc, chat-svc]

  user-svc:
    build: ./services/user
    environment:
      DATABASE_URL: postgres://db:5432/users

  recommendation-svc:
    build: ./services/recommendation
    environment:
      MODEL_PATH: /models/recommender.onnx
      REDIS_URL: redis://cache:6379
    volumes:
      - ./models:/models        # Model files mount
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]  # GPU access!

  chat-svc:
    build: ./services/chat
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      REDIS_URL: redis://cache:6379

  search-svc:
    build: ./services/search
    environment:
      PINECONE_API_KEY: ${PINECONE_API_KEY}

  # Infrastructure
  db:
    image: postgres:16
    volumes: ['pgdata:/var/lib/postgresql/data']

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

  queue:
    image: rabbitmq:3-management
    ports: ['15672:15672']  # Management UI

volumes:
  pgdata:
```

AI Service Dockerfile:

```dockerfile
# services/recommendation/Dockerfile
FROM python:3.11-slim

# Install ML dependencies
RUN pip install torch onnxruntime fastapi uvicorn redis

COPY . /app
WORKDIR /app

# Pre-download model
RUN python download_model.py

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

`docker compose up` — and the whole AI platform starts! 🐳
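One refinement worth making to a setup like the compose file above: healthchecks, so `depends_on` waits for readiness rather than just container start. A sketch — the probe command and the `/health` endpoint are assumptions; adjust to whatever your services actually expose:

```yaml
# Sketch: healthcheck + readiness-aware depends_on (Compose v2 syntax)
services:
  recommendation-svc:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # assumes a /health endpoint
      interval: 10s
      timeout: 3s
      retries: 5

  api-gateway:
    depends_on:
      recommendation-svc:
        condition: service_healthy   # wait until the probe passes
```

This matters for AI services in particular — loading a model can take much longer than starting the container.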

Service Mesh for AI Traffic

💡 Tip

With an Istio/Linkerd service mesh, managing AI traffic becomes easy:

```yaml
# Istio VirtualService — AI traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: recommendation-svc
spec:
  hosts: [recommendation-svc]
  http:
  - route:
    - destination:
        host: recommendation-svc
        subset: v1-current-model
      weight: 90              # 90% old model
    - destination:
        host: recommendation-svc
        subset: v2-new-model
      weight: 10              # 10% new model (canary!)
```

Service Mesh Benefits for AI:

- 🔄 Traffic splitting — A/B test models easily

- 🔒 mTLS — Secure inter-service communication

- 📊 Observability — Automatic metrics, tracing

- 🔁 Retry/Timeout — Automatic retry for AI failures

- 💀 Fault injection — Test AI service failures

Complex setup — but worth it for large-scale AI systems! 🎯
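The retry/timeout benefit can also be configured declaratively on the same VirtualService. A hedged fragment — field names follow Istio's `HTTPRetry` API, but verify against your Istio version:

```yaml
# Sketch: added inside the http route of the VirtualService above
    retries:
      attempts: 3             # retry failed AI calls up to 3 times
      perTryTimeout: 2s       # each attempt gets its own deadline
      retryOn: 5xx,reset      # retry on server errors / connection resets
```

Keeping retries in the mesh means individual services don't need their own retry code for transient failures.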

Data Consistency Across AI Services

Maintaining data consistency across microservices is challenging:


Saga Pattern for AI Workflows:

```javascript
// Example: AI Content Generation Saga
class ContentGenerationSaga {
  async execute(request) {
    const steps = [];

    try {
      // Step 1: Create content record
      const content = await userService.createContent(request);
      steps.push({ service: 'user', action: 'delete', id: content.id });

      // Step 2: Generate AI content
      const aiContent = await chatService.generate(request.prompt);
      steps.push({ service: 'chat', action: 'cleanup', id: aiContent.id });

      // Step 3: Run safety check
      const safetyResult = await moderationService.check(aiContent.text);
      if (!safetyResult.safe) {
        throw new Error('Content failed safety check');
      }

      // Step 4: Store and index
      await searchService.index(content.id, aiContent.text);

      return { success: true, content: aiContent };

    } catch (error) {
      // Compensating transactions — rollback!
      for (const step of steps.reverse()) {
        await this.compensate(step);
      }
      return { success: false, error: error.message };
    }
  }

  async compensate(step) {
    console.log(`🔄 Rolling back: ${step.service}.${step.action}`);
    // Each service has its own rollback logic
  }
}
```

Event Sourcing for AI audit trail:

```javascript
// Track every AI decision
eventStore.append('ai-decisions', {
  type: 'RECOMMENDATION_GENERATED',
  userId: '123',
  model: 'recommender-v2',
  input: features,
  output: recommendations,
  confidence: 0.92,
  timestamp: Date.now(),
});
```

AI decisions should be auditable — event sourcing helps! 📋
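The `eventStore` used above can be sketched as a minimal append-only store — illustration only; production systems use EventStoreDB, Kafka, or a database table. The event shape follows the article's example.

```javascript
// Append-only event store sketch for the AI audit trail — a real system
// would use EventStoreDB, Kafka, or an append-only DB table.
class EventStore {
  constructor() {
    this.streams = new Map(); // stream name -> ordered event list
  }
  append(stream, event) {
    if (!this.streams.has(stream)) this.streams.set(stream, []);
    const events = this.streams.get(stream);
    events.push({ ...event, seq: events.length }); // ordered, never mutated
  }
  // Replay reconstructs history — this is what makes AI decisions auditable
  replay(stream, filter = () => true) {
    return (this.streams.get(stream) || []).filter(filter);
  }
}

// Usage: audit every recommendation made for one user
const store = new EventStore();
store.append('ai-decisions', { type: 'RECOMMENDATION_GENERATED', userId: '123', model: 'recommender-v2' });
store.append('ai-decisions', { type: 'RECOMMENDATION_GENERATED', userId: '456', model: 'recommender-v2' });
const forUser = store.replay('ai-decisions', (e) => e.userId === '123');
console.log(forUser.length); // 1
```

Because events are only ever appended, you can answer "which model produced this recommendation, and with what confidence?" months later.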

Testing AI Microservices

Test each service independently:


```javascript
// Contract test — services agree on API shape
// recommendation-svc contract
const contract = {
  request: {
    method: 'POST',
    path: '/recommend',
    body: { userId: 'string', limit: 'number' },
  },
  response: {
    status: 200,
    body: {
      items: [{ id: 'string', score: 'number', title: 'string' }],
      model_version: 'string',
    },
  },
};

// Consumer test (API service tests recommendation contract)
test('recommendation service returns expected shape', async () => {
  const response = await request('http://recommendation-svc:3001')
    .post('/recommend')
    .send({ userId: 'test-user', limit: 5 });

  expect(response.status).toBe(200);
  expect(response.body.items).toBeInstanceOf(Array);
  expect(response.body.items[0]).toHaveProperty('id');
  expect(response.body.items[0]).toHaveProperty('score');
  expect(response.body.model_version).toBeDefined();
});

// Chaos test — AI service down scenario
test('system works when AI service is down', async () => {
  await stopService('recommendation-svc');

  const response = await request('http://api-gateway:3000')
    .get('/api/homepage')
    .expect(200);  // Should still work!

  expect(response.body.recommendations.source).toBe('fallback');
  await startService('recommendation-svc');
});
```

Test Types for AI Microservices:


| Test Type | What It Tests | Tools |
|---|---|---|
| **Unit** | Individual service logic | Jest, pytest |
| **Contract** | Service API agreements | Pact |
| **Integration** | Service interactions | Docker Compose |
| **Chaos** | Failure scenarios | Chaos Monkey |
| **Load** | Performance at scale | k6, Artillery |

Observability for AI Microservices

In a distributed AI system, you need to know what's happening:


Three Pillars:


1. Logs (What happened)

```javascript
// Structured logging with correlation ID
logger.info({
  correlationId: req.headers['x-correlation-id'],
  service: 'recommendation-svc',
  action: 'predict',
  userId: req.body.userId,
  modelVersion: 'v2.3',
  latencyMs: 145,
  cacheHit: false,
});
```

2. Metrics (How much)

```javascript
// Prometheus metrics
const inferenceLatency = new Histogram({
  name: 'ai_inference_duration_seconds',
  help: 'AI inference latency',
  labelNames: ['model', 'service'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});
```

3. Traces (Request journey across services)

```javascript
// OpenTelemetry distributed tracing
const tracer = trace.getTracer('recommendation-svc');

async function getRecommendations(userId) {
  return tracer.startActiveSpan('get-recommendations', async (span) => {
    span.setAttribute('user.id', userId);

    const features = await tracer.startActiveSpan('fetch-features', async (s) => {
      const f = await featureStore.get(userId);
      s.end();
      return f;
    });

    const predictions = await tracer.startActiveSpan('model-predict', async (s) => {
      const p = await model.predict(features);
      s.setAttribute('model.version', 'v2.3');
      s.end();
      return p;
    });

    span.end();
    return predictions;
  });
}
```

With traces, you can immediately find which service a request is slow in! 🔍

AI Microservices Anti-Patterns

⚠️ Warning

⚠️ Avoid these common mistakes:

Distributed Monolith — Services tightly coupled, can't deploy independently

✅ Fix: Each service own database, own deployment pipeline

Chatty Services — Too many inter-service calls per request

✅ Fix: Batch calls, cache responses, use events instead

Shared AI Model — Multiple services load same model

✅ Fix: Dedicated inference service, other services call it

No Fallback — AI service down = entire app down

✅ Fix: Circuit breaker + fallback for every AI dependency

Synchronous Everything — All AI calls blocking

✅ Fix: Queue heavy tasks, webhook for results

Giant AI Service — One service does NLP + Vision + Recommendations

✅ Fix: Separate by AI domain — one model per service

Remember: Microservices = independent deployability. If you can't deploy one service without touching others — it's NOT microservices! 🎯
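The "batch calls" fix for chatty services can be sketched as a small loader: individual lookups are collected for a few milliseconds, then sent as one bulk request. `fetchMany` is a hypothetical bulk-endpoint client you would supply.

```javascript
// Sketch of request batching — N per-item calls become one bulk call.
// `fetchMany` is a hypothetical client: async (ids) => Map of id -> value.
class BatchLoader {
  constructor(fetchMany, delayMs = 10) {
    this.fetchMany = fetchMany;
    this.delayMs = delayMs;
    this.pending = [];          // queued { id, resolve } entries
    this.timer = null;
  }
  load(id) {
    return new Promise((resolve) => {
      this.pending.push({ id, resolve });
      // First load in a window schedules one flush for the whole batch
      if (!this.timer) this.timer = setTimeout(() => this.flush(), this.delayMs);
    });
  }
  async flush() {
    const batch = this.pending;
    this.pending = [];
    this.timer = null;
    const results = await this.fetchMany(batch.map((p) => p.id)); // ONE network call
    for (const { id, resolve } of batch) resolve(results.get(id));
  }
}

// Usage: three lookups become a single bulk request
let calls = 0;
const loader = new BatchLoader(async (ids) => {
  calls += 1;
  return new Map(ids.map((id) => [id, `profile-${id}`]));
}, 5);
Promise.all([loader.load('u1'), loader.load('u2'), loader.load('u3')]).then((profiles) => {
  console.log(calls, profiles); // one call, three results
});
```

This is the same idea behind libraries like DataLoader — the caller's code stays simple while the network chatter collapses.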

Key Takeaways

Start monolith, grow into microservices — premature microservices bring premature complexity; build a monolith first, then decompose when required


Keep service boundaries clear — each service has a single responsibility, its own database, and independent deployment; avoid tight coupling


Communication patterns matter — REST is simple, gRPC is fast, message queues give loose coupling; choose the right pattern for your use case


Fault tolerance is essential — circuit breakers and fallback mechanisms keep service failures isolated and prevent cascade failures


The API gateway is the hub — centralize auth, rate limiting, routing, and caching there to reduce complexity in backend services


Data consistency is challenging — use the saga pattern and event sourcing to manage distributed transactions, and embrace eventual consistency


Testing is required — contract tests, chaos testing, and integration testing are essential for deployment confidence


Observability is critical — logs, metrics, and traces are necessary to understand what's happening in a distributed system; set up centralized monitoring

🏁 Mini Challenge

Challenge: Build Microservices Architecture with AI


Design and implement an AI-powered microservices system (60 mins):


  1. Services: Design 3-4 independent services (API Gateway, NLP service, Image service, Cache service)
  2. Communication: Define gRPC/REST endpoints and create an OpenAPI spec
  3. Database: Design and implement a separate database for each service
  4. Deployment: Create Docker containers and a docker-compose setup
  5. Monitoring: Collect service logs, traces, and metrics
  6. Resilience: Implement a circuit breaker, retry logic, and fallbacks
  7. Load Test: Simulate requests across multiple services and test failure scenarios

Tools: Docker, Docker Compose, gRPC/FastAPI, PostgreSQL, Jaeger, Prometheus


Deliverable: Working microservices system, deployment configuration, monitoring setup 🏗️

Interview Questions

Q1: Should an AI service be an independent service in a microservices architecture?

A: It should be! NLP, Vision, Recommendation — separate services, each with its own deployment, its own scaling, and potentially its own team. A shared database and tight coupling = distributed monolith, not true microservices.


Q2: Guaranteeing consistency in microservices is difficult — what are the solutions?

A: Event-driven architecture, the saga pattern for distributed transactions, accepting eventual consistency, and caching for read consistency. The database-per-service constraint makes distributed transactions complex.


Q3: Inter-service communication, REST vs gRPC — which for AI services?

A: gRPC is better for AI services — low latency (binary protocol), streaming support, strongly typed, better for real-time. REST is still fine; gRPC works better in systems that need to scale.


Q4: Why are service discovery and load balancing important?

A: Service endpoints change dynamically, so discovery is needed (Consul, Kubernetes). Load balancing distributes requests across multiple instances. Kubernetes handles both automatically.


Q5: Why is distributed tracing important in microservices?

A: Critical for debugging — a request traverses multiple services, so latency problems are hard to identify. Jaeger and Zipkin provide distributed tracing with full request-flow visibility, so bottlenecks can be identified quickly.

Frequently Asked Questions

Do AI features need microservices, or is a monolith enough?
For small projects a monolith is fine. But AI inference uses heavy compute — isolating it as a separate service gives you independent scaling, deployment, and fault isolation.
gRPC vs REST for an AI microservice — which is better?
gRPC is better for internal AI services — faster, typed, with streaming support. REST is better for external/public APIs — simple and widely supported. Use a mix.
Do AI microservices need Docker?
Highly recommended! AI models need specific dependencies (CUDA, PyTorch versions). Containerizing with Docker solves the "works on my machine" problem. Orchestrate with Kubernetes.
How do you update an AI model in microservices?
Use blue-green or canary deployment. Deploy the new model version in a new container, shift traffic gradually, and rollback stays easy.
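The gradual traffic shift behind canary deployment comes down to weighted routing. A service mesh like Istio does this at the infrastructure level (as in the VirtualService earlier); a plain-JS sketch of the same idea:

```javascript
// Sketch of weighted version selection — the logic behind a 90/10 canary split.
function pickVersion(weights, rand = Math.random()) {
  // weights: e.g. [{ version: 'v1', weight: 90 }, { version: 'v2', weight: 10 }]
  const total = weights.reduce((sum, w) => sum + w.weight, 0);
  let r = rand * total; // a point on the [0, total) line
  for (const w of weights) {
    if (r < w.weight) return w.version; // falls inside this version's segment
    r -= w.weight;
  }
  return weights[weights.length - 1].version; // guard for rand ≈ 1
}

// Usage: 90% of requests hit the current model, 10% the canary
const split = [{ version: 'v1-current', weight: 90 }, { version: 'v2-canary', weight: 10 }];
console.log(pickVersion(split, 0.5));  // 'v1-current'
console.log(pickVersion(split, 0.95)); // 'v2-canary'
```

Promoting the canary is just editing the weights — no redeploy of either model version needed.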