
Kafka basics

Advanced · 16 min read · 📅 Updated: 2026-02-17

Introduction

Think of a post office 📮 — it collects letters from one area, sorts them, and delivers them to different areas. Apache Kafka works on the same concept — but for data! A digital post office 🏤


Created at LinkedIn in 2011 — today they process 7 trillion+ messages per day! Now Uber, Netflix, Airbnb, and 80%+ of Fortune 500 companies use Kafka.


Why Kafka is everywhere:

  • ⚡ Handles millions of messages/second
  • 💾 Messages persist on disk (for days/weeks)
  • 📈 Horizontally scalable — just add more brokers
  • 🔄 Multiple consumers can read the same data
  • 🛡️ Fault tolerant — data stays safe even if a broker crashes

Kafka is not just a tool — it's the central nervous system of modern data platforms! Let's understand how 🧠

Core Concepts

Kafka's key building blocks:


1. Topic 📋

  • Category or feed name — like a folder for messages
  • Example: "orders", "payments", "user-events"
  • Producers write to topics, consumers read from topics

2. Partition 🗂️

  • A topic is subdivided into partitions
  • Enables parallelism — more partitions = more throughput
  • Messages within partition ordered (guaranteed!)
  • Across partitions? No ordering guarantee

3. Broker 🖥️

  • Kafka server instance
  • Cluster = multiple brokers (typically 3+)
  • Each broker stores some partitions

4. Producer ✍️

  • Application that writes messages to topics
  • Decides which partition each message is sent to

5. Consumer 👀

  • Application that reads messages from topics
  • Tracks its position (offset) in each partition

6. Consumer Group 👥

  • Group of consumers that work together
  • Each partition assigned to ONE consumer in the group
  • Enables parallel processing!

Concept | Analogy | Purpose
Topic | TV channel | Categorize messages
Partition | Lanes on a highway | Parallelism
Broker | Post office branch | Store & serve
Producer | News reporter | Send messages
Consumer | TV viewer | Read messages
Consumer Group | Family watching together | Load balance

Kafka Architecture

🏗️ Architecture Diagram
┌─────────────────────────────────────────────────┐
│              KAFKA CLUSTER                        │
├─────────────────────────────────────────────────┤
│                                                   │
│  PRODUCERS        BROKERS          CONSUMERS      │
│  ┌────────┐   ┌────────────────┐  ┌────────────┐│
│  │ App 1  │──▶│  Broker 1      │  │ Consumer   ││
│  │        │   │  ┌──────────┐  │──▶│ Group A    ││
│  │ App 2  │──▶│  │Topic:orders│ │  │ ┌──┐ ┌──┐ ││
│  │        │   │  │ P0 │ P1  │  │  │ │C1│ │C2│ ││
│  │ App 3  │──▶│  └──────────┘  │  │ └──┘ └──┘ ││
│  └────────┘   ├────────────────┤  ├────────────┤│
│               │  Broker 2      │  │ Consumer   ││
│               │  ┌──────────┐  │──▶│ Group B    ││
│               │  │Topic:orders│ │  │ ┌──┐      ││
│               │  │ P2 │ P3  │  │  │ │C3│      ││
│               │  └──────────┘  │  │ └──┘      ││
│               ├────────────────┤  └────────────┘│
│               │  Broker 3      │                 │
│               │  (replicas)    │                 │
│               └────────────────┘                 │
│                                                   │
│  ┌──────────────────────────────────┐            │
│  │  ZooKeeper / KRaft               │            │
│  │  (Cluster metadata management)   │            │
│  └──────────────────────────────────┘            │
└─────────────────────────────────────────────────┘

Partitions — The Secret to Scale

Partitions are Kafka's superpower! 💪


How partitioning works:

code
Topic: "orders" (4 partitions)

Partition 0: [Order1] [Order5] [Order9]  → Consumer 1
Partition 1: [Order2] [Order6] [Order10] → Consumer 2
Partition 2: [Order3] [Order7] [Order11] → Consumer 3
Partition 3: [Order4] [Order8] [Order12] → Consumer 4

Partition Key decides which partition:

python
# The same user_id always goes to the same partition
producer.send('orders', key=user_id, value=order_data)

Why partition key matters:

  • Same key = same partition = ordering guaranteed for that key
  • User U123's orders always arrive in order ✅
  • Different users are processed in parallel ⚡
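The routing idea can be sketched in plain Python. This is a conceptual helper only — real clients use their own hash functions (murmur2 in the Java client, a crc32-based default in librdkafka) — but the principle is the same: a stable hash of the key, modulo the partition count.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Conceptual sketch: a deterministic hash of the key
    # mapped onto one of the partitions.
    return zlib.crc32(key) % num_partitions

# The same user key always lands on the same partition:
p1 = pick_partition(b"user_123", 4)
p2 = pick_partition(b"user_123", 4)
print(p1 == p2)  # True
```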

How many partitions?

  • Rule of thumb: target throughput / single consumer throughput
  • Need 100 MB/s and each consumer handles 10 MB/s → 10 partitions minimum
  • More partitions = more parallelism, but more overhead too
  • Start with 6-12, increase as needed
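The rule of thumb above is simple arithmetic — a quick sketch (the growth buffer is an illustrative extra, not part of the rule):

```python
import math

def min_partitions(target_mb_s: float, per_consumer_mb_s: float, growth: float = 1.0) -> int:
    """Minimum partitions = target throughput / single-consumer throughput,
    optionally padded with a growth buffer."""
    return math.ceil((target_mb_s / per_consumer_mb_s) * growth)

print(min_partitions(100, 10))       # 10 partitions for the 100 MB/s example above
print(min_partitions(100, 10, 1.5))  # 15 with a 1.5x growth buffer
```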

Warning: You can increase the partition count later, but you cannot decrease it! Plan carefully 🎯

Producer Configuration — Get It Right!

💡 Tip

Producer configuration is critical in production:

💡 acks setting — Most important config!

- acks=0 — Don't wait for broker acknowledgment (fastest, risky)

- acks=1 — Leader acknowledges (good balance)

- acks=all — All replicas acknowledge (safest, slower)

💡 Batching — Don't send one message at a time!

code
batch.size=16384        # Batch size in bytes
linger.ms=5             # Wait 5ms to batch more messages

💡 Compression — Reduce network and disk usage:

code
compression.type=lz4    # Best performance
# Options: none, gzip, snappy, lz4, zstd

💡 Idempotent Producer — Prevent duplicates:

code
enable.idempotence=true

💡 Retries — for transient network blips:

code
retries=3
retry.backoff.ms=100

Golden Rule: Financial data? acks=all + enable.idempotence=true. Metrics/logs? acks=1 + compression is enough! 🏦
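Putting the settings above together, a safety-first configuration might look like this. It's a sketch with illustrative starting values — you would pass the dict to confluent_kafka.Producer and tune from there:

```python
# Hedged example: a "financial-grade" producer config combining the
# settings discussed above. Values are illustrative starting points.
safe_producer_config = {
    'bootstrap.servers': 'localhost:9092',
    'acks': 'all',                # every in-sync replica must acknowledge
    'enable.idempotence': True,   # broker de-duplicates retried sends
    'compression.type': 'lz4',    # cheap CPU, big network savings
    'batch.size': 16384,          # bytes per batch
    'linger.ms': 5,               # wait up to 5 ms to fill a batch
    'retries': 3,
    'retry.backoff.ms': 100,
}

# from confluent_kafka import Producer
# producer = Producer(safe_producer_config)
```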

Consumer Groups — Parallel Processing

Consumer groups enable load balancing in Kafka:


Scenario: "orders" topic with 4 partitions


1 Consumer in Group:

code
Consumer 1 reads: P0, P1, P2, P3 (all partitions — overloaded!)

2 Consumers in Group:

code
Consumer 1 reads: P0, P1
Consumer 2 reads: P2, P3    (balanced! ✅)

4 Consumers in Group:

code
Consumer 1: P0
Consumer 2: P1
Consumer 3: P2
Consumer 4: P3    (perfect parallelism! 🎯)

5 Consumers in Group:

code
Consumer 1: P0
Consumer 2: P1
Consumer 3: P2
Consumer 4: P3
Consumer 5: IDLE ❌ (no partition to read!)

Key Rules:

  • Max useful consumers = partition count
  • Multiple consumer groups = multiple independent readers
  • Group A reads for analytics, Group B reads for ML — both get ALL messages
  • Consumer crashed? Rebalance — the remaining consumers take over its partitions
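The assignment rule above can be sketched as a round-robin-style helper. This is illustrative only — Kafka's real assignors (range, sticky, cooperative) are more nuanced — but it reproduces the scenarios shown:

```python
def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Each partition goes to exactly one consumer in the group (sketch)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(assign_partitions(4, ['C1', 'C2']))
# {'C1': [0, 2], 'C2': [1, 3]} -- balanced
print(assign_partitions(4, ['C1', 'C2', 'C3', 'C4', 'C5'])['C5'])
# [] -- the 5th consumer sits idle, as in the scenario above
```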

Offset Management:

  • Each consumer tracks offset (position in partition)
  • Committed offset = "I've processed up to here"
  • After a crash, it restarts from the last committed offset — no data loss! 🛡️

Kafka with Python — Hands On

You can use Kafka from Python with the confluent-kafka library:


Producer:

python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'localhost:9092'})

def delivery_report(err, msg):
    if err:
        print(f'❌ Delivery failed: {err}')
    else:
        print(f'✅ Delivered to {msg.topic()} [{msg.partition()}]')

# Send messages
for i in range(100):
    event = {'order_id': i, 'amount': 100 + i, 'user': f'user_{i % 10}'}
    producer.produce(
        topic='orders',
        key=str(event['user']),
        value=json.dumps(event),
        callback=delivery_report
    )
producer.flush()  # Wait for all deliveries

Consumer:

python
from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['orders'])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print(f'Error: {msg.error()}')
        continue
    
    event = json.loads(msg.value())
    print(f"Order {event['order_id']}: ₹{event['amount']}")
    # Process order...

consumer.close()

Start simple! Run Kafka in Docker and practice 🐳

Replication — Data Safety

If a broker crashes, data must not be lost — replication saves us:


Replication Factor = 3 (standard production config)

code
Partition 0:
  Broker 1: [Leader]   ← Producers/Consumers talk to this
  Broker 2: [Follower] ← Keeps copy
  Broker 3: [Follower] ← Keeps copy

How it works:

  1. Producer writes to the Leader
  2. Leader replicates to the Followers
  3. Followers that keep up stay in the ISR (In-Sync Replica) set
  4. If the Leader crashes, a new leader is elected from the ISR ⚡

ISR (In-Sync Replicas):

  • A follower that has caught up with the leader's data = "in sync"
  • min.insync.replicas=2 — at least 2 replicas must be in sync
  • acks=all + min.insync.replicas=2 = data loss is almost impossible!

Scenario: Broker 1 crashes

code
Before: Leader=Broker1, ISR=[Broker1, Broker2, Broker3]
After:  Leader=Broker2, ISR=[Broker2, Broker3]
        (automatic failover! zero data loss ✅)
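The failover above can be modeled with a tiny sketch. This is heavily simplified — real elections go through the controller with epoch checks — but it captures the "promote the next in-sync replica" idea:

```python
def elect_new_leader(isr: list, failed_broker: str):
    """If the leader dies, promote the next in-sync replica (simplified)."""
    survivors = [b for b in isr if b != failed_broker]
    if not survivors:
        raise RuntimeError("no in-sync replica left")
    return survivors[0], survivors

leader, new_isr = elect_new_leader(['Broker1', 'Broker2', 'Broker3'], 'Broker1')
print(leader, new_isr)  # Broker2 ['Broker2', 'Broker3']
```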

Production config recommendation:

Setting | Value | Why
replication.factor | 3 | Standard safety
min.insync.replicas | 2 | Tolerate 1 broker down
acks | all | No data loss

ZooKeeper → KRaft Migration

⚠️ Warning

⚠️ ZooKeeper is being deprecated! On Kafka 3.5+, use KRaft mode.

ZooKeeper problems:

- A separate system to maintain — operational overhead

- Scaling limitations with large clusters

- Metadata operations slow

KRaft (Kafka Raft) benefits:

- ✅ No separate ZooKeeper cluster needed

- ✅ Faster controller failover (seconds vs minutes)

- ✅ Simplified operations

- ✅ Better scalability (millions of partitions)

Migration path:

1. Install Kafka 3.5+

2. Start new clusters in KRaft mode

3. Gradually migrate existing clusters

For new projects: always use KRaft mode! Drop the ZooKeeper dependency 🎯

bash
# Start Kafka in KRaft mode
kafka-storage.sh format -t $(kafka-storage.sh random-uuid) \
  -c config/kraft/server.properties
kafka-server-start.sh config/kraft/server.properties

Real-World Kafka Use Cases

Example

How companies use Kafka 🌍:

LinkedIn (Kafka creators):

- 7 trillion messages/day

- Activity tracking, metrics, logs

- 100+ Kafka clusters

Netflix:

- 1 trillion+ events/day

- Viewing activity → recommendation engine

- Real-time A/B testing results

Uber:

- Trip events, driver location, pricing

- Kafka → Flink → real-time features

- 1000+ microservices communicate through Kafka

Spotify:

- User listening events → Discover Weekly

- 500 billion events/day

- Real-time music recommendations

Common patterns across companies:

1. 📊 Event sourcing — All state changes as events

2. 🔗 Microservice communication — Async message passing

3. 📈 Real-time analytics — Live dashboards

4. 🤖 ML feature pipelines — Fresh features for models

5. 📝 Audit logs — Compliance and debugging

Kafka Connect — Easy Integrations

Move data without writing custom code — Kafka Connect!


What is it?

  • Pre-built connectors for databases, file systems, cloud services
  • No coding needed — just configuration!

Source Connectors (Data → Kafka):

json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-server",
    "database.port": "3306",
    "database.user": "kafka-user",
    "database.password": "secret",
    "database.server.name": "myserver",
    "table.include.list": "ecommerce.orders"
  }
}

Sink Connectors (Kafka → Data):

json
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc"
  }
}

Popular Connectors:

Connector | Direction | Use Case
Debezium MySQL | Source | CDC from MySQL
JDBC | Source/Sink | Any SQL database
S3 | Sink | Archive to S3
Elasticsearch | Sink | Search index
BigQuery | Sink | Analytics warehouse

Kafka Connect = No-code data integration! 🔌

Kafka Monitoring Essentials

A production Kafka cluster must be monitored — key metrics:


Broker Metrics:

  • Under-replicated partitions — should be 0; anything > 0 = problem! 🔴
  • Active controller count — must be exactly 1
  • Request latency — produce/fetch request time
  • Disk usage — based on retention policy

Producer Metrics:

  • Record send rate — Messages per second
  • Record error rate — Failed sends
  • Batch size avg — Batching efficiency

Consumer Metrics:

  • Consumer lag — 🔴 MOST IMPORTANT! Messages behind = lag
  • Fetch rate — Consumption speed
  • Commit rate — Offset commit frequency

Consumer Lag Formula:

code
Lag = Latest Offset - Consumer Committed Offset

Increasing lag = the consumer is slow or stuck! Needs immediate attention ⚠️
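In code the formula is just a subtraction; in practice you would read the latest offset via get_watermark_offsets() and the committed offset via committed() on a confluent-kafka Consumer. A minimal sketch of the arithmetic:

```python
def consumer_lag(latest_offset: int, committed_offset: int) -> int:
    """Lag = messages the consumer still has to process."""
    return max(latest_offset - committed_offset, 0)

print(consumer_lag(1_000_000, 999_500))  # 500 messages behind
```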


Tools:

  • Kafka UI (open source) — cluster visualization
  • Burrow — consumer lag monitoring
  • Grafana + Prometheus — dashboards
  • Confluent Control Center — commercial

Prompt: Local Kafka Setup

📋 Copy-Paste Prompt
You are a Kafka instructor. Help me set up Apache Kafka locally using Docker Compose for learning purposes.

Requirements:
1. Single broker Kafka cluster (KRaft mode, no ZooKeeper)
2. Kafka UI for web-based management
3. Create sample topics: orders, payments, user-events
4. Python producer and consumer scripts
5. Show how to monitor consumer lag

Provide:
- docker-compose.yml
- Topic creation commands
- Python scripts with confluent-kafka library
- Common troubleshooting tips

Explain everything in Tanglish. Assume beginner-level Docker knowledge.

Kafka Best Practices

Essential practices for production Kafka:


Replication factor 3 — Minimum for production

Use Avro/Protobuf — JSON is slow at scale; use a schema registry

Partition key wisely — avoid hot partitions (don't use country as the key!)

Monitor consumer lag — set up alerting; lag > threshold = page

Retention policy — match it to the business need (7 days is common)

Compaction — enable log compaction for state topics

Security — enable SASL/SSL and configure ACLs

Capacity plan — Disk space = throughput × retention × replication factor

Test failover — restart a broker monthly and test recovery
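The capacity-plan rule is worth working through once with real numbers — a quick sketch:

```python
def required_disk_gb(throughput_mb_s: float, retention_days: float, replication_factor: int) -> float:
    """Disk = throughput x retention x replication factor (the rule of thumb above)."""
    seconds = retention_days * 24 * 60 * 60
    return throughput_mb_s * seconds * replication_factor / 1024  # MB -> GB

# e.g. 10 MB/s, 7-day retention, RF=3:
print(round(required_disk_gb(10, 7, 3)))  # ~17719 GB (~17.3 TB), before overhead
```

Leave headroom on top of this — index files, log segments awaiting deletion, and rebalance traffic all add overhead.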


Master Kafka and you'll be unstoppable in real-time data engineering! 🚀


Next: Airflow orchestration — schedule and manage pipelines! 📅

Key Takeaways

Kafka Fundamentals — A distributed event streaming platform. Topics (categories), Partitions (parallelism), Brokers (servers), Producers (write), Consumers (read)


Topics & Partitions — Topics categorize messages; partitions are processed in parallel. Messages within a partition are ordered; across partitions there is no ordering guarantee. Partition count affects throughput


Consumer Groups — Multiple consumers work together. Each partition goes to at most one consumer per group. Multiple groups read independently. Rebalancing happens automatically


Durability & Replication — Replication factor 3 is the production standard. Leader + in-sync followers. Leader crash → a follower is elected. Protection against data loss


Producer Configuration — acks=all (safest, slow), acks=1 (good balance), acks=0 (fastest, risky). Set up batching, compression, and idempotence. Financial data needs acks=all


Offset Management — Tracks each consumer's position. Committed offset = "processed up to here". Crash → resume from the last offset. No manual implementation needed; Kafka handles it


Kafka Streams — A lightweight processing library. Topology building, stateless + stateful operations. Perfect for small-scale real-time processing


Monitoring Essentials — Consumer lag (the critical metric), under-replicated partitions, controller status. Burrow is a dedicated consumer-lag monitoring tool. Lag increasing → escalate

🏁 Mini Challenge

Challenge: Kafka Topic + Producer + Consumer


A complete Kafka workflow, hands-on:


Step 1: Create Topic - 5 min:

bash
kafka-topics.sh --create --topic orders --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092

Step 2: Producer - 10 min:

python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'localhost:9092'})

orders = [
    {'order_id': 1, 'user_id': 100, 'amount': 500},
    {'order_id': 2, 'user_id': 101, 'amount': 750},
    {'order_id': 3, 'user_id': 102, 'amount': 1000},
]

for order in orders:
    producer.produce(
        'orders',
        key=str(order['user_id']),  # Partition key
        value=json.dumps(order)
    )
    print(f"Produced: {order}")

producer.flush()

Step 3: Consumer Group 1 - 5 min:

python
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor-1',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['orders'])

print("Consumer Group 1 (Analytics):")
while True:
    msg = consumer.poll(1.0)
    if msg:
        print(f"Processed: {msg.value().decode()}")

Step 4: Consumer Group 2 - 5 min:

  • Same code, group.id='order-processor-2' (different group)
  • Both groups receive SAME messages independently!

Observe:

  • Consumer lag (monitor how far behind each group is)
  • Same messages different consumers
  • Partition assignment (3 partitions, load balanced)

Learning: Kafka's flexibility — multiple consumers, persistence, replay capability! 📨

💼 Interview Questions

Q1: What's Kafka's magic — how is it different from a traditional queue?

A: Traditional queue: a message is deleted once consumed. Kafka: messages persist on disk (days/weeks). Multiple consumers can read the same message. Offsets are tracked, so replay is possible! Scalability, durability, flexibility — Kafka's advantages! But complexity increases.


Q2: Partitions — how do you choose the count?

A: Throughput calculation first: you need 1M messages/sec and a single partition handles 100K/sec → minimum 10 partitions. Then add a 2-3x buffer for growth. Is the partition distribution balanced? Check leadership. Too many = overhead; too few = throughput bottleneck. Monitor and adjust!


Q3: Consumer lag — how do you set up production alerts?

A: The CRITICAL metric! Lag increasing → consumers are slow. Alert when lag > threshold (e.g., one hour's worth of messages). Use Burrow (a monitoring tool). Investigate: slow consumer, network issue, partition imbalance? Fix → restart the consumer, scale up, or optimize the code.


Q4: Producer reliability — how do you configure acks?

A: acks=0 (fastest, risky), acks=1 (leader confirms), acks=all (all replicas confirm = safest, slow). Financial data? acks=all is mandatory! Metrics/logs? acks=1 is fine. Trade-off: speed vs safety. Tune batch size, compression, and retries — all producer configs — based on the use case!


Q5: ZooKeeper → KRaft migration — what's the business impact?

A: KRaft is better (simpler operations). But migration is complex — there's downtime risk. Strategy: run new clusters in KRaft mode, gradually migrate existing ones. Kafka 3.5+ is already KRaft-ready. You gain performance improvements and operational simplicity. Timeline: by 2026-27 most of the industry will have adopted KRaft! 📊

Frequently Asked Questions

What is Kafka, in simple terms?
Kafka is a distributed event streaming platform. Simply put — a high-speed message delivery system. Producers publish messages, consumers read them, and messages are stored on disk.
Kafka vs RabbitMQ — what's the difference?
Kafka is a log-based system (messages persist; multiple consumers can read the same message). RabbitMQ is a traditional queue (consumed = deleted). Kafka is better for high-throughput streaming, RabbitMQ for task queues.
Is Kafka free?
Apache Kafka is open-source and free! But managed services like Confluent Cloud and AWS MSK cost money. Even self-hosting has infra costs.
What prerequisites do I need to learn Kafka?
Linux basics, networking fundamentals, and any programming language (Java/Python preferred). Knowing distributed systems concepts is a bonus. Knowing Docker makes local practice easy.
🧠 Knowledge Check

A consumer group has 3 consumers and the topic has 2 partitions. What happens?

0 of 1 answered