Kafka basics
Introduction
Post office yosichchu paaru 📮 — oru area la letters collect panni, sort panni, different areas ku deliver pannuvaanga. Apache Kafka same concept — but for data! Digital post office 🏤
LinkedIn la create panni 2011 la open source pannanga — inniku ivanga per day 7 trillion messages process pannuvaanga! Now Uber, Netflix, Airbnb, 80%+ Fortune 500 companies Kafka use pannuvaanga.
Why Kafka is everywhere:
- ⚡ Handle millions of messages/second
- 💾 Messages disk la persist (days/weeks)
- 📈 Horizontally scalable — more brokers add pannunga
- 🔄 Multiple consumers same data read pannalaam
- 🛡️ Fault tolerant — broker crash aanalum data safe
Kafka is not just a tool — it's the central nervous system of modern data platforms! Let's understand how 🧠
Core Concepts
Kafka oda key building blocks:
1. Topic 📋
- Category or feed name — like a folder for messages
- Example: "orders", "payments", "user-events"
- Producers write to topics, consumers read from topics
2. Partition 🗂️
- Topic subdivide aagum into partitions
- Parallelism enable pannum — more partitions = more throughput
- Messages within partition ordered (guaranteed!)
- Across partitions? No ordering guarantee
3. Broker 🖥️
- Kafka server instance
- Cluster = multiple brokers (typically 3+)
- Each broker stores some partitions
4. Producer ✍️
- Application that writes messages to topics
- Decides which partition ku send pannanum
5. Consumer 👀
- Application that reads messages from topics
- Tracks its position (offset) in each partition
6. Consumer Group 👥
- Group of consumers that work together
- Each partition assigned to ONE consumer in the group
- Enables parallel processing!
| Concept | Analogy | Purpose |
|---|---|---|
| Topic | TV Channel | Categorize messages |
| Partition | Lanes in highway | Parallelism |
| Broker | Post office branch | Store & serve |
| Producer | News reporter | Send messages |
| Consumer | TV viewer | Read messages |
| Consumer Group | Family watching together | Load balance |
Kafka Architecture
```
                    KAFKA CLUSTER

 PRODUCERS        BROKERS               CONSUMERS

 ┌───────┐   ┌─────────────────┐    ┌─────────────┐
 │ App 1 │──▶│    Broker 1     │    │   Group A   │
 │ App 2 │──▶│  Topic: orders  │───▶│ ┌──┐  ┌──┐  │
 │ App 3 │──▶│    P0 │ P1      │    │ │C1│  │C2│  │
 └───────┘   ├─────────────────┤    │ └──┘  └──┘  │
             │    Broker 2     │    └─────────────┘
             │  Topic: orders  │    ┌─────────────┐
             │    P2 │ P3      │───▶│   Group B   │
             ├─────────────────┤    │ ┌──┐        │
             │    Broker 3     │    │ │C3│        │
             │   (replicas)    │    │ └──┘        │
             └─────────────────┘    └─────────────┘

          ┌────────────────────────────────┐
          │       ZooKeeper / KRaft        │
          │ (cluster metadata management)  │
          └────────────────────────────────┘
```
Partitions — The Secret to Scale
Partitions Kafka oda superpower! 💪
How partitioning works:
Partition Key decides which partition:
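Idea ah oru minimal Python sketch la paakkalaam — Kafka oda default partitioner actually murmur2 hash use pannum; inga `crc32` oru deterministic stand-in dhaan:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the message key.

    Kafka's default partitioner hashes the key (murmur2) and takes it
    modulo the partition count; crc32 here is a simplified stand-in.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Same key always lands on the same partition → per-key ordering.
p1 = partition_for("U123", 4)
p2 = partition_for("U123", 4)
print(p1 == p2)  # True
```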
Why partition key matters:
- Same key = same partition = ordering guaranteed for that key
- User U123 oda orders always order la varum ✅
- Different users parallel ah process aagum ⚡
How many partitions?
- Rule of thumb: target throughput / single consumer throughput
- 100 MB/s venum, consumer 10 MB/s handle pannum → 10 partitions minimum
- More partitions = more parallelism, but more overhead too
- Start with 6-12, increase as needed
Warning: Partition count increase pannalaam, but decrease panna mudiyaadhu! Plan carefully 🎯
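Rule of thumb ah oru chinna calculation la paakkalaam — `growth_buffer` nu oru illustrative parameter add pannirukken (interview section la solra 2–3x buffer idea dhaan):

```python
import math

def min_partitions(target_mb_s: float, per_consumer_mb_s: float,
                   growth_buffer: float = 2.0) -> int:
    """Rule of thumb: target throughput / single-consumer throughput,
    then add headroom for growth (partitions can't be decreased later!)."""
    base = math.ceil(target_mb_s / per_consumer_mb_s)
    return math.ceil(base * growth_buffer)

print(min_partitions(100, 10, growth_buffer=1.0))  # 10 — bare minimum
print(min_partitions(100, 10))                     # 20 — with 2x growth buffer
```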
Producer Configuration — Get It Right!
Producer configuration production la critical:
💡 acks setting — Most important config!
- acks=0 — Don't wait for broker acknowledgment (fastest, risky)
- acks=1 — Leader acknowledges (good balance)
- acks=all — All replicas acknowledge (safest, slower)
💡 Batching — Don't send one message at a time! `linger.ms` konjam wait panni messages ah batch ah anuppum
💡 Compression — `compression.type` (lz4, snappy, zstd, gzip) network and disk usage reduce pannum
💡 Idempotent Producer — `enable.idempotence=true` retry duplicates prevent pannum
💡 Retries — `retries` config network blips ku automatic ah re-send pannum
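Ellaa settings um serthu oru hedged config sketch — property names confluent-kafka (librdkafka) keys dhaan, but values illustrative starting points, workload ku thagundha maari tune pannunga:

```python
# A "safe" producer config sketch for confluent-kafka (librdkafka
# property names); values are illustrative, not one-size-fits-all.
safe_producer_config = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",                 # all in-sync replicas must confirm
    "enable.idempotence": True,    # no duplicates on retry
    "retries": 5,                  # survive transient network blips
    "linger.ms": 20,               # wait up to 20 ms to fill a batch
    "batch.size": 64 * 1024,       # batch up to 64 KB per partition
    "compression.type": "lz4",     # cut network and disk usage
}

# For metrics/logs where a little loss is acceptable, relax for speed:
fast_producer_config = {
    **safe_producer_config,
    "acks": "1",
    "enable.idempotence": False,
}
```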
Golden Rule: Financial data? acks=all + enable.idempotence=true. Metrics/logs? acks=1 + compression podhum! 🏦
Consumer Groups — Parallel Processing
Consumer groups Kafka la load balancing enable pannudhu:
Scenario: "orders" topic with 4 partitions
1 Consumer in Group: C1 ellaa 4 partitions um (P0–P3) read pannum
2 Consumers in Group: thalaa 2 partitions (e.g., C1 → P0,P1; C2 → P2,P3)
4 Consumers in Group: thalaa 1 partition — max parallelism! ⚡
5 Consumers in Group: 4 consumers active, 5th consumer idle — partition illa!
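Indha scenarios ah oru toy simulation la paakkalaam — idhu Kafka oda actual range/round-robin assignors illa, just a simplified stand-in for the idea:

```python
def assign_round_robin(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Simulate how a group spreads partitions: each partition goes to
    exactly one consumer; consumers beyond the partition count sit idle.
    (Simplified stand-in for Kafka's built-in partition assignors.)"""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(assign_round_robin(4, ["C1"]))                # {'C1': [0, 1, 2, 3]}
print(assign_round_robin(4, ["C1", "C2"]))          # {'C1': [0, 2], 'C2': [1, 3]}
print(assign_round_robin(4, ["C1", "C2", "C3", "C4", "C5"]))  # C5 gets [] — idle!
```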
Key Rules:
- Max useful consumers = partition count
- Multiple consumer groups = multiple independent readers
- Group A reads for analytics, Group B reads for ML — both get ALL messages
- Consumer crash aana? Rebalance — remaining consumers take over its partitions
Offset Management:
- Each consumer tracks offset (position in partition)
- Committed offset = "I've processed up to here"
- Crash aana, restart from last committed offset — no data loss! 🛡️
Kafka with Python — Hands On
Python la Kafka use pannalaam — confluent-kafka library:
Producer:
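Oru minimal producer sketch — "orders" topic um `KAFKA_BOOTSTRAP` environment-variable guard um illustrative choices; library install aagaama koodaa file load aaganum nu import ah deferred ah vechirukken:

```python
import json
import os

def delivery_report(err, msg):
    """Called once per message with the broker's verdict."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

def run_producer(bootstrap: str) -> None:
    # Imported lazily so this sketch loads without the library installed.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap, "acks": "all"})
    order = {"order_id": "O-1001", "user_id": "U123", "amount": 499.0}
    producer.produce(
        topic="orders",
        key=order["user_id"],       # same user → same partition → ordered
        value=json.dumps(order).encode("utf-8"),
        on_delivery=delivery_report,
    )
    producer.flush()  # block until all queued messages are delivered

if os.environ.get("KAFKA_BOOTSTRAP"):  # run only against a real broker
    run_producer(os.environ["KAFKA_BOOTSTRAP"])
```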
Consumer:
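Matching consumer sketch — group id, topic name ellaame illustrative; manual commit use pannirukken, processing success aana apram dhaan "processed up to here" nu Kafka kitta sollum:

```python
import json
import os

def run_consumer(bootstrap: str) -> None:
    # Imported lazily so this sketch loads without the library installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap,
        "group.id": "order-processor",    # consumers sharing this id split partitions
        "auto.offset.reset": "earliest",  # new groups start from the beginning
        "enable.auto.commit": False,      # commit only after processing succeeds
    })
    consumer.subscribe(["orders"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            order = json.loads(msg.value())
            print(f"Processing {order} from partition {msg.partition()}")
            consumer.commit(msg)          # "I've processed up to here"
    finally:
        consumer.close()

if os.environ.get("KAFKA_BOOTSTRAP"):  # run only against a real broker
    run_consumer(os.environ["KAFKA_BOOTSTRAP"])
```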
Simple ah start pannalaam! Docker la Kafka run panni practice pannunga 🐳
Replication — Data Safety
Broker crash aana data lose aagakoodadhu — replication saves us:
Replication Factor = 3 (standard production config)
How it works:
- Producer → Leader ku write
- Leader → Followers ku replicate
- Followers "in-sync" aana → ISR (In-Sync Replica) set la irukku
- Leader crash aana → ISR la irundhu new leader elect aagum ⚡
ISR (In-Sync Replicas):
- Follower leader la irundhu data catch up panniduchaa = "in sync"
- `min.insync.replicas=2` — at least 2 replicas in sync irukanum
- `acks=all` + `min.insync.replicas=2` = data loss almost impossible!
Scenario: Broker 1 crashes
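Indha failover ah oru toy simulation la paakkalaam — idhu real Kafka internals illa, just the decision rule (crashed broker ISR la irundhu drop, leader aana next in-sync replica promote):

```python
def elect_leader(leader: str, isr: list[str], crashed: str) -> tuple[str, list[str]]:
    """Toy model of ISR failover: drop the crashed broker from the ISR and,
    if it was the leader, promote the first remaining in-sync replica."""
    isr = [b for b in isr if b != crashed]
    if not isr:
        raise RuntimeError("No in-sync replica left — partition unavailable!")
    new_leader = leader if leader != crashed else isr[0]
    return new_leader, isr

# Partition "orders-0": replicas on brokers 1, 2, 3; broker 1 is leader.
leader, isr = elect_leader("broker-1", ["broker-1", "broker-2", "broker-3"],
                           crashed="broker-1")
print(leader, isr)  # broker-2 ['broker-2', 'broker-3']
```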
Production config recommendation:
| Setting | Value | Why |
|---|---|---|
| replication.factor | 3 | Standard safety |
| min.insync.replicas | 2 | Tolerate 1 broker down |
| acks | all | No data loss |
ZooKeeper → KRaft Migration
⚠️ ZooKeeper is being deprecated! Kafka 3.5+ la KRaft mode use pannunga.
ZooKeeper problems:
- Separate system maintain pannanum — operational overhead
- Scaling limitations with large clusters
- Metadata operations slow
KRaft (Kafka Raft) benefits:
- ✅ No separate ZooKeeper cluster needed
- ✅ Faster controller failover (seconds vs minutes)
- ✅ Simplified operations
- ✅ Better scalability (millions of partitions)
Migration path:
1. Kafka 3.5+ install pannunga
2. KRaft mode la new clusters start pannunga
3. Existing clusters gradually migrate pannunga
New projects ku: Always KRaft mode use pannunga! ZooKeeper dependency remove pannunga 🎯
Real-World Kafka Use Cases
Companies Kafka eppadi use pannuvaanga 🌍:
LinkedIn (Kafka creators):
- 7 trillion messages/day
- Activity tracking, metrics, logs
- 100+ Kafka clusters
Netflix:
- 1 trillion+ events/day
- Viewing activity → recommendation engine
- Real-time A/B testing results
Uber:
- Trip events, driver location, pricing
- Kafka → Flink → real-time features
- 1000+ microservices Kafka through communicate pannum
Spotify:
- User listening events → discover weekly
- 500 billion events/day
- Real-time music recommendations
Common patterns across companies:
1. 📊 Event sourcing — All state changes as events
2. 🔗 Microservice communication — Async message passing
3. 📈 Real-time analytics — Live dashboards
4. 🤖 ML feature pipelines — Fresh features for models
5. 📝 Audit logs — Compliance and debugging
Kafka Connect — Easy Integrations
Custom code ezhudhaama data move panna — Kafka Connect!
What is it?
- Pre-built connectors for databases, file systems, cloud services
- No coding needed — just configuration!
Source Connectors (Data → Kafka): MySQL/Postgres CDC (Debezium), JDBC databases, file systems
Sink Connectors (Kafka → Data): S3, Elasticsearch, BigQuery, JDBC databases
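For example, oru hedged Debezium MySQL source connector config — field names Debezium oda MySQL connector follow pannudhu, values placeholders dhaan (real setup ku schema history settings maari innum konjam properties venum):

```json
{
  "name": "mysql-orders-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "topic.prefix": "shop",
    "table.include.list": "shop.orders"
  }
}
```

Indha JSON ah Kafka Connect oda REST API ku POST pannaa, connector MySQL changes ah Kafka topics ku stream panna start pannum.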
Popular Connectors:
| Connector | Direction | Use Case |
|---|---|---|
| Debezium MySQL | Source | CDC from MySQL |
| JDBC | Source/Sink | Any SQL database |
| S3 | Sink | Archive to S3 |
| Elasticsearch | Sink | Search index |
| BigQuery | Sink | Analytics warehouse |
Kafka Connect = No-code data integration! 🔌
Kafka Monitoring Essentials
Production Kafka cluster monitor pannanum — key metrics:
Broker Metrics:
- Under-replicated partitions — 0 irukanum, > 0 = problem! 🔴
- Active controller count — Exactly 1 irukanum
- Request latency — Produce/fetch request time
- Disk usage — retention policy based
Producer Metrics:
- Record send rate — Messages per second
- Record error rate — Failed sends
- Batch size avg — Batching efficiency
Consumer Metrics:
- Consumer lag — 🔴 MOST IMPORTANT! Messages behind = lag
- Fetch rate — Consumption speed
- Commit rate — Offset commit frequency
Consumer Lag Formula: lag = latest offset (log-end offset) − last committed consumer offset
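Lag formula ah per-partition ah compute panradhu oru chinna sketch — offsets inga illustrative numbers:

```python
def consumer_lag(log_end_offsets: dict[int, int],
                 committed: dict[int, int]) -> dict[int, int]:
    """Lag per partition = latest offset on the broker − last committed offset."""
    return {p: log_end_offsets[p] - committed.get(p, 0) for p in log_end_offsets}

latest = {0: 1500, 1: 1520, 2: 1490}  # newest offsets per partition (broker side)
done   = {0: 1500, 1: 1100, 2: 1490}  # what the consumer group has committed
print(consumer_lag(latest, done))     # {0: 0, 1: 420, 2: 0} — partition 1 is behind!
```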
Lag increasing = consumer slow or stuck! Immediate attention venum ⚠️
Tools:
- Kafka UI (open source) — cluster visualization
- Burrow — consumer lag monitoring
- Grafana + Prometheus — dashboards
- Confluent Control Center — commercial
Prompt: Local Kafka Setup
Kafka Best Practices
Production Kafka ku essential practices:
✅ Replication factor 3 — Minimum for production
✅ Use Avro/Protobuf — JSON is slow at scale, schema registry use pannunga
✅ Partition key wisely — Hot partitions avoid pannunga (don't use country as key!)
✅ Monitor consumer lag — Alerting set pannunga, lag > threshold = page
✅ Retention policy — Business need ku match pannunga (7 days common)
✅ Compaction — State topics ku log compaction enable pannunga
✅ Security — SASL/SSL enable pannunga, ACLs configure pannunga
✅ Capacity plan — Disk space = throughput × retention × replication factor
✅ Test failover — Monthly broker restart panni recovery test pannunga
Kafka master pannaa real-time data engineering la unstoppable aaiduveenga! 🚀
Next: Airflow orchestration — pipelines schedule and manage pannalaam! 📅
✅ Key Takeaways
✅ Kafka Fundamentals — Distributed event streaming platform. Topics (categories), Partitions (parallelism), Brokers (servers), Producers (write), Consumers (read)
✅ Topics & Partitions — Topics messages ah categorize pannum. Partitions parallel ah process aagum. Same partition messages ordered; across partitions ordering guarantee illa. Partition count throughput ah decide pannum
✅ Consumer Groups — Multiple consumers work together. Each partition max one consumer per group. Multiple groups independent read. Rebalancing automatic aagum
✅ Durability & Replication — Replication factor 3 standard production. Leader + followers in-sync. Leader crash → follower elected. Data loss protection
✅ Producer Configuration — acks=all (safest, slow), acks=1 (good balance), acks=0 (fastest, risky). Batch, compression, idempotence setup. Financial data acks=all
✅ Offset Management — Consumer position track. Commit offset "processed up to here". Crash → resume from last offset. No manual implementation, Kafka handles
✅ Kafka Streams — Lightweight processing library. Topology building, stateless + stateful operations. Small-scale real-time processing perfect
✅ Monitoring Essential — Consumer lag (critical metric), under-replicated partitions, controller status. Burrow tool consumer lag dedicated monitoring. Lag increasing → escalation
🏁 🎮 Mini Challenge
Challenge: Kafka Topic + Producer + Consumer
Complete Kafka workflow hands-on:
Step 1: Create Topic - 5 min:
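Step 1 ah Python la confluent-kafka oda AdminClient vechu pannalaam — `KAFKA_BOOTSTRAP` env var la local broker address irukkum nu assume panra sketch:

```python
import os

def create_orders_topic(bootstrap: str) -> None:
    # Imported lazily so this sketch loads without the library installed.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": bootstrap})
    # replication_factor=1 is fine for a local single-broker setup;
    # production would use 3 (see the replication section).
    topic = NewTopic("orders", num_partitions=3, replication_factor=1)
    futures = admin.create_topics([topic])
    for name, future in futures.items():
        future.result()  # raises if creation failed
        print(f"Created topic {name}")

if os.environ.get("KAFKA_BOOTSTRAP"):  # run only against a real broker
    create_orders_topic(os.environ["KAFKA_BOOTSTRAP"])
```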
Step 2: Producer - 10 min: sample order events "orders" topic ku produce pannunga (key = user_id)
Step 3: Consumer Group 1 - 5 min: oru group.id (e.g., 'order-processor-1') oda consume pannunga
Step 4: Consumer Group 2 - 5 min:
- Same code, group.id='order-processor-2' (different group)
- Both groups receive SAME messages independently!
Observe:
- Consumer lag (Kafka lag monitor)
- Same messages different consumers
- Partition assignment (3 partitions, load balanced)
Learning: Kafka flexibility – multiple consumers, persistence, replay capability! 📨
💼 Interview Questions
Q1: Kafka oda magic – traditional queue la irundhu difference?
A: Traditional queue: Message consume aana delete. Kafka: Message disk la persist (days/weeks). Multiple consumers same message read. Offset track pannu. Replay possible! Scalability, durability, flexibility – Kafka advantages! But complexity increase aagum.
Q2: Partitions – eppadi choose pannanum?
A: Throughput calculation first: Need 1M messages/sec, single partition 100K/sec – minimum 10 partitions. Then: Add 2-3x buffer for growth. Partition distribution balanced? Check leadership. Too many (overhead), too few (throughput bottleneck). Monitor and adjust!
Q3: Consumer lag – production alert setup?
A: CRITICAL metric! Lag increasing → consumers slow. Alert: lag > threshold (e.g., 1 hour). Burrow (monitoring tool) use pannu. Investigate: consumer slow, network issue, partition imbalanced? Fix → restart consumer, scale up, optimize code.
Q4: Producer reliability – acks configuration?
A: acks=0 (fastest, risky), acks=1 (leader confirms), acks=all (all replicas confirm = safest, slow). Financial data? acks=all mandatory! Metrics/logs? acks=1 ok. Trade-off: Speed vs safety. Batch size, compression, retries – all producer config tune pannu based on use case!
Q5: ZooKeeper → KRaft migration – business impact?
A: KRaft better (simpler operations). But migration complex – downtime risk. Strategy: New clusters KRaft mode, existing gradual migrate. Kafka 3.5+ already KRaft ready. Performance improvement, operational simplicity gain. Timeline: 2026-27 la industry mostly KRaft adopt pannuvaanga! 📊
Frequently Asked Questions
Consumer group la 3 consumers irukku, topic la 2 partitions irukku. Enna nadakkum?