โ† Back|DATA-ENGINEERINGโ€บSection 1/18
0 of 18 completed

Kafka basics

Advancedโฑ 16 min read๐Ÿ“… Updated: 2026-02-17

Introduction

Post office yosichchu paaru 📮 — oru area la letters collect panni, sort panni, different areas ku deliver pannuvaanga. Apache Kafka same concept — but for data! Digital post office dhaan!


LinkedIn la create pannanga (2011 la open-source aachu) — ivanga per day 7 trillion messages process pannuvaanga! Now Uber, Netflix, Airbnb, 80%+ Fortune 500 companies Kafka use pannuvaanga.


Why Kafka is everywhere:

  • ⚡ Handles millions of messages/second
  • 💾 Messages disk la persist aagum (days/weeks)
  • 📈 Horizontally scalable — more brokers add pannunga
  • 🔄 Multiple consumers same data read pannalaam
  • 🛡️ Fault tolerant — broker crash aanalum data safe

Kafka is not just a tool — it's the central nervous system of modern data platforms! Let's understand how 🧠

Core Concepts

Kafka oda key building blocks:


1. Topic 📋

  • Category or feed name — like a folder for messages
  • Example: "orders", "payments", "user-events"
  • Producers write to topics, consumers read from topics

2. Partition 🗂️

  • Topic subdivide aagum into partitions
  • Parallelism enable pannum — more partitions = more throughput
  • Messages within a partition ordered (guaranteed!)
  • Across partitions? No ordering guarantee

3. Broker 🖥️

  • Kafka server instance
  • Cluster = multiple brokers (typically 3+)
  • Each broker stores some partitions

4. Producer ✍️

  • Application that writes messages to topics
  • Decides which partition ku send pannanum

5. Consumer 👀

  • Application that reads messages from topics
  • Tracks its position (offset) in each partition

6. Consumer Group 👥

  • Group of consumers that work together
  • Each partition assigned to ONE consumer in the group
  • Enables parallel processing!

Concept        | Analogy                  | Purpose
Topic          | TV channel               | Categorize messages
Partition      | Lanes in a highway       | Parallelism
Broker         | Post office branch       | Store & serve
Producer       | News reporter            | Send messages
Consumer       | TV viewer                | Read messages
Consumer Group | Family watching together | Load balance

Kafka Architecture

๐Ÿ—๏ธ Architecture Diagram
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              KAFKA CLUSTER                        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                   โ”‚
โ”‚  PRODUCERS        BROKERS          CONSUMERS      โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”โ”‚
โ”‚  โ”‚ App 1  โ”‚โ”€โ”€โ–ถโ”‚  Broker 1      โ”‚  โ”‚ Consumer   โ”‚โ”‚
โ”‚  โ”‚        โ”‚   โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚โ”€โ”€โ–ถโ”‚ Group A    โ”‚โ”‚
โ”‚  โ”‚ App 2  โ”‚โ”€โ”€โ–ถโ”‚  โ”‚Topic:ordersโ”‚ โ”‚  โ”‚ โ”Œโ”€โ”€โ” โ”Œโ”€โ”€โ” โ”‚โ”‚
โ”‚  โ”‚        โ”‚   โ”‚  โ”‚ P0 โ”‚ P1  โ”‚  โ”‚  โ”‚ โ”‚C1โ”‚ โ”‚C2โ”‚ โ”‚โ”‚
โ”‚  โ”‚ App 3  โ”‚โ”€โ”€โ–ถโ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚  โ”‚ โ””โ”€โ”€โ”˜ โ””โ”€โ”€โ”˜ โ”‚โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”คโ”‚
โ”‚               โ”‚  Broker 2      โ”‚  โ”‚ Consumer   โ”‚โ”‚
โ”‚               โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚โ”€โ”€โ–ถโ”‚ Group B    โ”‚โ”‚
โ”‚               โ”‚  โ”‚Topic:ordersโ”‚ โ”‚  โ”‚ โ”Œโ”€โ”€โ”      โ”‚โ”‚
โ”‚               โ”‚  โ”‚ P2 โ”‚ P3  โ”‚  โ”‚  โ”‚ โ”‚C3โ”‚      โ”‚โ”‚
โ”‚               โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚  โ”‚ โ””โ”€โ”€โ”˜      โ”‚โ”‚
โ”‚               โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜โ”‚
โ”‚               โ”‚  Broker 3      โ”‚                 โ”‚
โ”‚               โ”‚  (replicas)    โ”‚                 โ”‚
โ”‚               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                 โ”‚
โ”‚                                                   โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”            โ”‚
โ”‚  โ”‚  ZooKeeper / KRaft               โ”‚            โ”‚
โ”‚  โ”‚  (Cluster metadata management)   โ”‚            โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Partitions — The Secret to Scale

Partitions Kafka oda superpower! 💪


How partitioning works:

code
Topic: "orders" (4 partitions)

Partition 0: [Order1] [Order5] [Order9]  → Consumer 1
Partition 1: [Order2] [Order6] [Order10] → Consumer 2
Partition 2: [Order3] [Order7] [Order11] → Consumer 3
Partition 3: [Order4] [Order8] [Order12] → Consumer 4

Partition Key decides which partition:

python
# Same user_id always same partition ku pogum (confluent-kafka API)
producer.produce('orders', key=user_id, value=order_data)

Why partition key matters:

  • Same key = same partition = ordering guaranteed for that key
  • User U123 oda orders always order la varum ✅
  • Different users parallel ah process aagum ⚡

How many partitions?

  • Rule of thumb: target throughput / single consumer throughput
  • 100 MB/s venum, consumer 10 MB/s handle pannum → 10 partitions minimum
  • More partitions = more parallelism, but more overhead too
  • Start with 6-12, increase as needed

Warning: Partition count increase pannalaam, but decrease panna mudiyaadhu! Plan carefully 🎯

Producer Configuration — Get It Right!

💡 Tip

Producer configuration production la critical:

💡 acks setting — Most important config!

- acks=0 — Don't wait for broker acknowledgment (fastest, risky)

- acks=1 — Leader acknowledges (good balance)

- acks=all — All replicas acknowledge (safest, slower)

💡 Batching — Don't send one message at a time!

code
batch.size=16384        # Batch size in bytes
linger.ms=5             # Wait 5ms to batch more messages

💡 Compression — Reduce network and disk usage:

code
compression.type=lz4    # Best performance
# Options: none, gzip, snappy, lz4, zstd

💡 Idempotent Producer — Prevent duplicates:

code
enable.idempotence=true

💡 Retries — Network blips ku:

code
retries=3
retry.backoff.ms=100

Golden Rule: Financial data? acks=all + enable.idempotence=true. Metrics/logs? acks=1 + compression podhum! 🏦
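Mela sonna settings ellaam oru producer config la serthaa ippadi irukkum — idhu oru hedged sketch: key names librdkafka/confluent-kafka conventions follow pannum, values unga workload ku tune pannanum:

```python
# Sketch: the "safe" producer settings from above as one confluent-kafka
# config dict (librdkafka-style keys; values are illustrative).
safe_producer_config = {
    'bootstrap.servers': 'localhost:9092',
    'acks': 'all',               # all in-sync replicas must acknowledge
    'enable.idempotence': True,  # prevent duplicates on retry
    'compression.type': 'lz4',   # reduce network and disk usage
    'batch.size': 16384,         # batch size in bytes
    'linger.ms': 5,              # wait up to 5ms to fill batches
    'retries': 3,                # survive transient network blips
    'retry.backoff.ms': 100,
}

# Usage (needs a running broker):
# from confluent_kafka import Producer
# producer = Producer(safe_producer_config)
```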

Consumer Groups — Parallel Processing

Consumer groups Kafka la load balancing enable pannudhu:


Scenario: "orders" topic with 4 partitions


1 Consumer in Group:

code
Consumer 1 reads: P0, P1, P2, P3 (all partitions — overloaded!)

2 Consumers in Group:

code
Consumer 1 reads: P0, P1
Consumer 2 reads: P2, P3    (balanced! ✅)

4 Consumers in Group:

code
Consumer 1: P0
Consumer 2: P1
Consumer 3: P2
Consumer 4: P3    (perfect parallelism! 🎯)

5 Consumers in Group:

code
Consumer 1: P0
Consumer 2: P1
Consumer 3: P2
Consumer 4: P3
Consumer 5: IDLE ❌ (no partition to read!)

Key Rules:

  • Max useful consumers = partition count
  • Multiple consumer groups = multiple independent readers
  • Group A reads for analytics, Group B reads for ML — both get ALL messages
  • Consumer crash aana? Rebalance — remaining consumers take over its partitions

Offset Management:

  • Each consumer tracks offset (position in partition)
  • Committed offset = "I've processed up to here"
  • Crash aana, restart from last committed offset — no data loss! 🛡️

Kafka with Python — Hands On

Python la Kafka use pannalaam — confluent-kafka library:


Producer:

python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'localhost:9092'})

def delivery_report(err, msg):
    if err:
        print(f'❌ Delivery failed: {err}')
    else:
        print(f'✅ Delivered to {msg.topic()} [{msg.partition()}]')

# Send messages
for i in range(100):
    event = {'order_id': i, 'amount': 100 + i, 'user': f'user_{i % 10}'}
    producer.produce(
        topic='orders',
        key=str(event['user']),
        value=json.dumps(event),
        callback=delivery_report
    )
producer.flush()  # Wait for all deliveries

Consumer:

python
from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['orders'])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f'Error: {msg.error()}')
            continue

        event = json.loads(msg.value())
        print(f"Order {event['order_id']}: ₹{event['amount']}")
        # Process order...
finally:
    consumer.close()  # Always runs on exit — commits offsets, leaves group cleanly

Simple ah start pannalaam! Docker la Kafka run panni practice pannunga 🐳

Replication — Data Safety

Broker crash aana data lose aagakoodadhu — replication saves us:


Replication Factor = 3 (standard production config)

code
Partition 0:
  Broker 1: [Leader]   ← Producers/Consumers talk to this
  Broker 2: [Follower] ← Keeps copy
  Broker 3: [Follower] ← Keeps copy

How it works:

  1. Producer → Leader ku write
  2. Leader → Followers ku replicate
  3. Followers "in-sync" aana → ISR (In-Sync Replica) set la irukku
  4. Leader crash aana → ISR la irundhu new leader elect aagum ⚡

ISR (In-Sync Replicas):

  • Follower leader la irundhu data catch up panniduchaa = "in sync"
  • min.insync.replicas=2 — at least 2 replicas in sync irukanum
  • acks=all + min.insync.replicas=2 = data loss almost impossible!

Scenario: Broker 1 crashes

code
Before: Leader=Broker1, ISR=[Broker1, Broker2, Broker3]
After:  Leader=Broker2, ISR=[Broker2, Broker3]
        (automatic failover! zero data loss ✅)

Production config recommendation:

Setting             | Value | Why
replication.factor  | 3     | Standard safety
min.insync.replicas | 2     | Tolerate 1 broker down
acks                | all   | No data loss
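Intha table settings topic creation la apply panna oru example — topic name and partition count are illustrative (needs a 3-broker cluster running):

```shell
# Create a production-style topic: RF=3, min ISR=2 (values from the table above)
kafka-topics.sh --create \
  --topic orders \
  --partitions 6 \
  --replication-factor 3 \
  --config min.insync.replicas=2 \
  --bootstrap-server localhost:9092
```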

ZooKeeper → KRaft Migration

⚠️ Warning

⚠️ ZooKeeper mode deprecated aachu (Kafka 3.5+), Kafka 4.0 la fully remove aagiduchu — KRaft mode use pannunga.

ZooKeeper problems:

- Separate system maintain pannanum — operational overhead

- Scaling limitations with large clusters

- Metadata operations slow

KRaft (Kafka Raft) benefits:

- ✅ No separate ZooKeeper cluster needed

- ✅ Faster controller failover (seconds vs minutes)

- ✅ Simplified operations

- ✅ Better scalability (millions of partitions)

Migration path:

1. Kafka 3.5+ install pannunga

2. KRaft mode la new clusters start pannunga

3. Existing clusters gradually migrate pannunga

New projects ku: Always KRaft mode use pannunga! ZooKeeper dependency remove pannunga 🎯

bash
# KRaft mode la Kafka start
kafka-storage.sh format -t $(kafka-storage.sh random-uuid) \
  -c config/kraft/server.properties
kafka-server-start.sh config/kraft/server.properties

Real-World Kafka Use Cases

✅ Example

Companies Kafka eppadi use pannuvaanga 🌍:

LinkedIn (Kafka creators):

- 7 trillion messages/day

- Activity tracking, metrics, logs

- 100+ Kafka clusters

Netflix:

- 1 trillion+ events/day

- Viewing activity → recommendation engine

- Real-time A/B testing results

Uber:

- Trip events, driver location, pricing

- Kafka → Flink → real-time features

- 1000+ microservices communicate through Kafka

Spotify:

- User listening events → Discover Weekly

- 500 billion events/day

- Real-time music recommendations

Common patterns across companies:

1. 📊 Event sourcing — All state changes as events

2. 🔗 Microservice communication — Async message passing

3. 📈 Real-time analytics — Live dashboards

4. 🤖 ML feature pipelines — Fresh features for models

5. 📝 Audit logs — Compliance and debugging

Kafka Connect — Easy Integrations

Custom code ezhudhaama data move panna — Kafka Connect!


What is it?

  • Pre-built connectors for databases, file systems, cloud services
  • No coding needed — just configuration!

Source Connectors (Data → Kafka):

json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-server",
    "database.port": "3306",
    "database.user": "kafka-user",
    "database.password": "secret",
    "database.server.name": "myserver",
    "table.include.list": "ecommerce.orders"
  }
}

Sink Connectors (Kafka → Data):

json
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc"
  }
}
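Intha JSON configs a Kafka Connect REST API (default port 8083) ku POST panni dhaan deploy pannuvom — the file name here is our assumption:

```shell
# Deploy the sink connector config above (saved as elasticsearch-sink.json)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  --data @elasticsearch-sink.json

# Check the connector's status
curl http://localhost:8083/connectors/elasticsearch-sink/status
```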

Popular Connectors:

Connector      | Direction   | Use Case
Debezium MySQL | Source      | CDC from MySQL
JDBC           | Source/Sink | Any SQL database
S3             | Sink        | Archive to S3
Elasticsearch  | Sink        | Search index
BigQuery       | Sink        | Analytics warehouse

Kafka Connect = No-code data integration! 🔌

Kafka Monitoring Essentials

Production Kafka cluster monitor pannanum — key metrics:


Broker Metrics:

  • Under-replicated partitions — 0 irukanum, > 0 = problem! 🔴
  • Active controller count — Exactly 1 irukanum
  • Request latency — Produce/fetch request time
  • Disk usage — retention policy based

Producer Metrics:

  • Record send rate — Messages per second
  • Record error rate — Failed sends
  • Batch size avg — Batching efficiency

Consumer Metrics:

  • Consumer lag — 🔴 MOST IMPORTANT! Messages behind = lag
  • Fetch rate — Consumption speed
  • Commit rate — Offset commit frequency

Consumer Lag Formula:

code
Lag = Latest Offset - Consumer Committed Offset

Lag increasing = consumer slow or stuck! Immediate attention venum ⚠️
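Intha formula va code la paakalaam — production la indha offsets broker kitta irundhu varum (unga client oda watermark/committed-offset APIs vazhiya); inga values sample data mattum:

```python
def consumer_lag(latest_offsets: dict[int, int], committed_offsets: dict[int, int]) -> int:
    """Total lag across partitions: latest offset minus committed offset,
    summed per partition (the formula from the text)."""
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

# Example: consumer is 150 messages behind across 3 partitions
latest = {0: 1000, 1: 2000, 2: 1500}
committed = {0: 950, 1: 1950, 2: 1450}
print(consumer_lag(latest, committed))  # → 150
```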


Tools:

  • Kafka UI (open source) โ€” cluster visualization
  • Burrow โ€” consumer lag monitoring
  • Grafana + Prometheus โ€” dashboards
  • Confluent Control Center โ€” commercial

Prompt: Local Kafka Setup

📋 Copy-Paste Prompt
You are a Kafka instructor. Help me set up Apache Kafka locally using Docker Compose for learning purposes.

Requirements:
1. Single broker Kafka cluster (KRaft mode, no ZooKeeper)
2. Kafka UI for web-based management
3. Create sample topics: orders, payments, user-events
4. Python producer and consumer scripts
5. Show how to monitor consumer lag

Provide:
- docker-compose.yml
- Topic creation commands
- Python scripts with confluent-kafka library
- Common troubleshooting tips

Explain everything in Tanglish. Assume beginner-level Docker knowledge.

Kafka Best Practices

Production Kafka ku essential practices:


✅ Replication factor 3 — Minimum for production

✅ Use Avro/Protobuf — JSON is slow at scale, schema registry use pannunga

✅ Partition key wisely — Hot partitions avoid pannunga (don't use country as key!)

✅ Monitor consumer lag — Alerting set pannunga, lag > threshold = page

✅ Retention policy — Business need ku match pannunga (7 days common)

✅ Compaction — State topics ku log compaction enable pannunga

✅ Security — SASL/SSL enable pannunga, ACLs configure pannunga

✅ Capacity plan — Disk space = throughput × retention × replication factor

✅ Test failover — Monthly broker restart panni recovery test pannunga
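Capacity plan formula ku oru quick worked example — numbers illustrative, compression and OS overhead ignore pannirukkom:

```python
def disk_needed_tb(throughput_mb_s: float, retention_days: int,
                   replication_factor: int) -> float:
    """Disk space = throughput × retention × replication factor
    (the capacity-plan rule above; decimal MB → TB conversion)."""
    mb = throughput_mb_s * retention_days * 86_400 * replication_factor
    return round(mb / 1_000_000, 1)

# Example: 10 MB/s, 7-day retention, replication factor 3
print(disk_needed_tb(10, 7, 3))  # → 18.1 TB (before compression and overhead)
```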


Kafka master pannaa real-time data engineering la unstoppable aaiduveenga! 🚀


Next: Airflow orchestration — pipelines schedule and manage pannalaam! 📅

✅ Key Takeaways

✅ Kafka Fundamentals — Distributed event streaming platform. Topics (categories), Partitions (parallelism), Brokers (servers), Producers (write), Consumers (read)


✅ Topics & Partitions — Topics messages categorize pannum. Partitions parallel ah process. Same partition messages ordered; across partitions ordering guarantee illa. Partition count throughput a affect pannum


✅ Consumer Groups — Multiple consumers work together. Each partition max one consumer per group. Multiple groups independent ah read pannalaam. Rebalancing automatic aagum


✅ Durability & Replication — Replication factor 3 standard production. Leader + followers in-sync. Leader crash → follower elected. Data loss protection


✅ Producer Configuration — acks=all (safest, slow), acks=1 (good balance), acks=0 (fastest, risky). Batch, compression, idempotence setup. Financial data ku acks=all


✅ Offset Management — Consumer position track pannum. Committed offset = "processed up to here". Crash → resume from last offset. No manual implementation, Kafka handles


✅ Kafka Streams — Lightweight processing library. Topology building, stateless + stateful operations. Small-scale real-time processing ku perfect


✅ Monitoring Essentials — Consumer lag (critical metric), under-replicated partitions, controller status. Burrow tool consumer lag dedicated monitoring. Lag increasing → escalation

๐Ÿ ๐ŸŽฎ Mini Challenge

Challenge: Kafka Topic + Producer + Consumer


Complete Kafka workflow hands-on:


Step 1: Create Topic - 5 min:

bash
kafka-topics.sh --create --topic orders --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092

Step 2: Producer - 10 min:

python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'localhost:9092'})

orders = [
    {'order_id': 1, 'user_id': 100, 'amount': 500},
    {'order_id': 2, 'user_id': 101, 'amount': 750},
    {'order_id': 3, 'user_id': 102, 'amount': 1000},
]

for order in orders:
    producer.produce(
        'orders',
        key=str(order['user_id']),  # Partition key
        value=json.dumps(order)
    )
    print(f"Produced: {order}")

producer.flush()

Step 3: Consumer Group 1 - 5 min:

python
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor-1',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['orders'])

print("Consumer Group 1 (Analytics):")
while True:
    msg = consumer.poll(1.0)
    if msg:
        print(f"Processed: {msg.value().decode()}")

Step 4: Consumer Group 2 - 5 min:

  • Same code, group.id='order-processor-2' (different group)
  • Both groups receive SAME messages independently!

Observe:

  • Consumer lag (Kafka lag monitor)
  • Same messages different consumers
  • Partition assignment (3 partitions, load balanced)

Learning: Kafka flexibility – multiple consumers, persistence, replay capability! 📨

💼 Interview Questions

Q1: Kafka oda magic – traditional queue la irundhu difference?

A: Traditional queue: Message consume aana delete aagum. Kafka: Message disk la persist aagum (days/weeks). Multiple consumers same message read pannalaam. Offset track pannum. Replay possible! Scalability, durability, flexibility – Kafka advantages! But complexity increase aagum.


Q2: Partitions – eppadi choose pannanum?

A: Throughput calculation first: Need 1M messages/sec, single partition 100K/sec – minimum 10 partitions. Then: Add 2-3x buffer for growth. Partition distribution balanced ah? Check leadership. Too many (overhead), too few (throughput bottleneck). Monitor and adjust!


Q3: Consumer lag – production alert setup?

A: CRITICAL metric! Lag increasing → consumers slow. Alert: lag > threshold (e.g., 1 hour). Burrow (monitoring tool) use pannu. Investigate: consumer slow ah, network issue ah, partition imbalanced ah? Fix → restart consumer, scale up, optimize code.


Q4: Producer reliability – acks configuration?

A: acks=0 (fastest, risky), acks=1 (leader confirms), acks=all (all replicas confirm = safest, slow). Financial data? acks=all mandatory! Metrics/logs? acks=1 ok. Trade-off: Speed vs safety. Batch size, compression, retries – all producer config tune pannu based on use case!


Q5: ZooKeeper → KRaft migration – business impact?

A: KRaft better (simpler operations). But migration complex – downtime risk. Strategy: New clusters KRaft mode, existing gradual ah migrate. Kafka 3.5+ already KRaft ready. Performance improvement, operational simplicity gain. Timeline: 2026-27 la industry mostly KRaft adopt pannuvaanga! 📊

Frequently Asked Questions

โ“ Kafka na enna simple ah?
Kafka is a distributed event streaming platform. Simple ah sonna โ€” high-speed message delivery system. Producers messages publish pannuvaanga, consumers read pannuvaanga. Messages disk la store aagum.
โ“ Kafka vs RabbitMQ โ€” enna difference?
Kafka is a log-based system (messages persist, multiple consumers read same message). RabbitMQ is a traditional queue (message consumed = deleted). Kafka is better for high-throughput streaming, RabbitMQ for task queues.
โ“ Kafka free ah?
Apache Kafka is open-source and free! But managed services like Confluent Cloud, AWS MSK cost money. Self-host pannaalum infra cost irukku.
โ“ Kafka learn panna enna prerequisites venum?
Linux basics, networking fundamentals, any programming language (Java/Python preferred). Distributed systems concepts therinjaa bonus. Docker therinjaa local la practice panna easy.
๐Ÿง Knowledge Check

Consumer group la 3 consumers irukku, topic la 2 partitions irukku. Enna nadakkum?
