1. What is Kafka?
Apache Kafka is like a super-fast, super-reliable message board where:
- Data (or messages) are written once and can be read many times
- It's built to handle millions of messages per second
The Apache Software Foundation is the open-source foundation that hosts Kafka.
Apache Kafka is the open-source project itself, originally developed at LinkedIn and now maintained by a community of volunteers and contributing companies.
Confluent is a company started by the original creators of Kafka. It offers a commercial distribution of Kafka with easy-to-use tools, cloud services, support, security, monitoring, connectors, and more.
2. How is Kafka Different from Other Messaging Platforms?
2.1 Design Philosophy and Use Cases
Apache Kafka is built as a distributed event streaming platform, optimised for high-throughput, fault-tolerant, and scalable real-time data pipelines. It's ideal for systems that need to collect, process, and replay streams of data, such as user activity tracking, log aggregation, and real-time analytics.
In contrast, ActiveMQ and RabbitMQ follow a traditional message broker model designed for low-latency point-to-point or pub/sub communication; they are typically used in smaller-scale, transactional applications such as task queues, workflow engines, or enterprise integration.
2.2 Message Storage and Replay
Kafka stores messages persistently on disk and allows consumers to read them multiple times, even from the past, making it perfect for reprocessing data or catching up after failure. Messages are retained based on time or storage size, not consumption status.
On the other hand, RabbitMQ and ActiveMQ treat messages more like one-time delivery letters. Once a consumer acknowledges a message, it is usually removed from the queue, and replay is not supported unless explicitly configured with additional complexity.
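This replay behaviour is easy to see with Kafka's own tooling: a consumer group's position can simply be rewound and the same records read again. A minimal sketch, assuming a hypothetical topic clicks and consumer group analytics on a local broker (the group must have no active members while resetting):

# Rewind the group to the earliest retained offsets, then let it re-consume
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group analytics --topic clicks \
  --reset-offsets --to-earliest --execute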
2.3 Scalability and Performance
Kafka is designed to scale horizontally with ease. It uses partitioning and distributed coordination to support millions of messages per second across multiple consumers and producers.
In contrast, RabbitMQ and ActiveMQ are more limited in terms of throughput and scalability. While they can be clustered, doing so requires more effort and is often less reliable at very high volumes.
Kafka excels in large-scale systems where data flows continuously and needs to be consumed by multiple services in parallel.
2.4 Consumer Model and Flexibility
Kafka decouples producers and consumers completely. Multiple consumers can read the same messages independently at their own pace without interfering with each other. This makes it great for event-driven microservices, where each service needs access to the same stream of data.
RabbitMQ and ActiveMQ typically follow a push-based model where each message is delivered to only one consumer by default, unless specific fan-out or pub/sub configurations are used. This makes them better suited for task distribution but less ideal for broadcast-style or multi-subscriber data sharing.
3. Kafka Architecture and Core Components
3.1 Brokers
A Kafka broker is a server process that stores data and serves client requests. A Kafka cluster is a group of brokers working together. Each broker holds some of the partitions for each topic, distributing load and storage across the cluster. Brokers are intentionally kept simple - they only handle reads, writes, and replication, with no additional message processing logic. Data durability and throughput scale by adding more broker instances.
3.2 Topics and Partitions
In Kafka, a topic is a named stream of messages. Each Kafka topic is divided into one or more partitions. The number of partitions is defined at the time of topic creation, and while it can be increased later to support higher throughput or parallelism, it cannot be decreased once it has been set.
Partitions are the unit of parallelism and fault tolerance: each partition is an ordered, append-only sequence of records stored on a single broker (the leader) and replicated to others. This means a topic's data can be spread across brokers for load balancing. Different consumers (in a consumer group) can read from different partitions in parallel. Partitions ensure that even very large data streams are split into manageable chunks across the cluster.
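As a hands-on sketch (the topic name and broker address are placeholders), a topic is created with a chosen partition count, which can later be increased but never reduced:

# Create a topic with 6 partitions, replicated 3 ways
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic page-views --partitions 6 --replication-factor 3

# Partitions can be increased later (never decreased)
bin/kafka-topics.sh --bootstrap-server localhost:9092 --alter \
  --topic page-views --partitions 12

# See how partitions and their leaders are spread across brokers
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic page-views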
3.3 Producers and Consumers
Producers and consumers are client applications. Producers publish (write) records to topics. They determine which partition within the topic a record goes to (often by hashing a key or using a round-robin approach).
Consumers subscribe to topics to read and process records. Each record in a partition has a sequential offset (a numeric position) that identifies it. Consumers typically join a consumer group for load balancing; Kafka ensures each partition's data is delivered to exactly one consumer in the group. Thus, producers and consumers together form the pub/sub messaging model, with brokers (and partitions) in between.
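A quick way to see this model end to end is with the bundled console clients; the topic and group names below are only examples:

# Produce records (one per line typed on stdin)
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic page-views

# Run this in two separate terminals to start two consumers in the same group;
# Kafka splits the topic's partitions between them
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic page-views --group page-view-processors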
3.4 ZooKeeper vs. KRaft (Controller) Mode
Historically, Kafka used Apache ZooKeeper to store metadata (broker membership, topic/partition info, controller election). In newer versions (Kafka 3.x+), Kafka can run in KRaft mode, which eliminates ZooKeeper. In KRaft mode, the Kafka brokers themselves form an internal Raft quorum (with one node acting as the active controller) to manage cluster metadata.
KRaft uses an event-sourced log (Raft) to replicate metadata across a quorum of brokers. In contrast, ZooKeeper-based Kafka required a separate ZooKeeper ensemble for coordination. Thus, modern Kafka natively manages metadata (leader election, topic config) internally, avoiding the need for a separate system.
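For reference, a KRaft cluster is bootstrapped by formatting its storage with a cluster ID, and the metadata quorum can then be inspected; a minimal sketch assuming the sample KRaft config that ships with recent Kafka releases:

# Generate a cluster ID and format the log directories for KRaft mode
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh config/kraft/server.properties

# Inspect the Raft metadata quorum (leader, voters, replication lag)
bin/kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status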
4. Message Storage and Retrieval Mechanisms
4.1 Log Segments
Each partition's data is stored on disk as an append-only log segmented into multiple files. Within a partition directory (on the leader broker), Kafka maintains log segment files (e.g., 00000000000000000000.log, 00000000000000001007.log, etc.), plus corresponding index and time-index files. At any time, exactly one segment is active; new messages are appended to the active segment.
When the active segment grows beyond a configured size (log.segment.bytes) or age (log.roll.ms), Kafka closes the current segment and creates a new one. Segmentation allows consumers to skip older segments instead of having to read one large file. Each segment also has an offset index file that maps offsets to byte positions, and a time index that maps timestamps to offsets.
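Segment rolling can also be tuned per topic through segment.bytes and segment.ms, the topic-level counterparts of the broker settings above; a sketch with placeholder values:

# Roll a new segment every 256 MB or every hour, whichever comes first
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name page-views \
  --add-config segment.bytes=268435456,segment.ms=3600000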
4.2 Message (Record) Format
Kafka writes records (events) to the log. Conceptually, each record has a key, value, and optional timestamp and headers. Both key and value are arbitrary byte arrays (they can be JSON, Avro, protobuf, etc.).
The key may be null if not used. Producers package records into batches for efficiency: multiple records form a batch (with a common compression and attributes) before sending to the broker.
On disk, each record is encoded with a header (including length, attributes, timestamp delta, offset delta, key length/value length, actual key and value bytes, plus any headers).
The offset for each record is its position in the partition (incremented on each write). The on-disk format includes these fields so Kafka can efficiently recover or serve data; for example, control records (used in transactions) also advance the offset.
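The on-disk batches and record fields can be inspected with the log dump tool; a sketch assuming a partition directory under /var/lib/kafka/data (your log.dirs path will differ):

# Print batch headers plus each record's offset, timestamp, key, and value
bin/kafka-dump-log.sh --print-data-log \
  --files /var/lib/kafka/data/page-views-0/00000000000000000000.log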
4.3 Offset Tracking
Kafka assigns each record an ever-increasing offset within its partition. Consumers use the offset to resume reading. For durability and group coordination, consumers commit the offsets they have processed. Committed offsets are stored in the internal __consumer_offsets topic on the brokers. This means that if a consumer group restarts or rebalances, it can resume from the last committed offsets. By default, consumers auto-commit offsets periodically (every 5 seconds), though manual commit is also supported.
Note that "committed" also has a replication meaning: a record is considered committed once it has been written to all in-sync replicas, consumers only see records up to that point, and only such committed messages are guaranteed not to be lost. Together, offsets and the __consumer_offsets topic give Kafka fault-tolerant tracking of what each consumer group has processed.
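Committed offsets and per-partition lag are visible through the consumer-groups tool; the group name here is hypothetical:

# Shows CURRENT-OFFSET (last committed), LOG-END-OFFSET, and LAG for each partition
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group page-view-processors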
4.4 Compaction and Retention Policies
Kafka does not keep all records forever. Each topic has a cleanup policy. The default is delete, where Kafka deletes old data once it exceeds a time or size threshold (e.g., retention.ms, retention.bytes).
Under the delete policy, segments older than the retention period (or beyond the size limit) are removed. Kafka also supports log compaction, where records are retained per key: for each key, only the latest value is kept, and older duplicates are removed (a record with a null value, called a tombstone, eventually removes the key entirely). With compaction, Kafka eventually retains just the latest message per key (useful for change-log scenarios).
In practice, brokers roll segments (closing active segments based on size or time) and then apply the cleanup policy: old segments are either deleted or compacted according to the configuration. Compaction and retention keep Kafka's log storage bounded while preserving strong durability semantics, allowing either time/size-based deletion or compaction that retains the latest state per key.
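Both policies are configured per topic through cleanup.policy and the retention settings; a sketch with placeholder topic names and values:

# Deletion: keep up to 7 days or ~1 GB per partition, whichever limit is hit first
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name page-views \
  --add-config cleanup.policy=delete,retention.ms=604800000,retention.bytes=1073741824

# Compaction: keep only the latest value per key (e.g., for a change-log topic)
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name user-profiles \
  --add-config cleanup.policy=compact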
5. Data Replication and Fault Tolerance
5.1 Partition Replication
Kafka replicates each partition across multiple brokers to ensure fault tolerance and high availability. When a topic's replication factor (RF) is set to N, Kafka maintains N copies of each partition on different brokers.
Among these replicas, one is elected as the leader, and the rest are followers. The leader handles all read and write requests for that partition, while the followers replicate the leader's log asynchronously to stay in sync.
For example, if Partition P has RF=3, and it's assigned to brokers 1, 2, and 3, broker 1 might be elected leader, while brokers 2 and 3 act as followers.
This replication mechanism ensures that if the leader broker fails, one of the in-sync followers (those that have fully caught up with the leader) can be promoted to leader, thereby avoiding data loss and downtime, provided that min.insync.replicas is satisfied and proper acknowledgment settings (acks=all) are used by producers.
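A sketch of creating such a replicated topic and checking which broker leads each partition (names and addresses are placeholders):

# RF=3, with at least 2 in-sync copies required before an acks=all write succeeds
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic orders --partitions 3 --replication-factor 3 \
  --config min.insync.replicas=2

# Shows Leader, Replicas, and Isr for every partition
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders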
5.2 Leader-Follower Mechanism
Kafka uses a leader-follower model for each partition. When a producer sends data, it always goes to the leader of that partition. The leader writes it to its log.
The followers don't get the data directly. Instead, they pull (replicate) the data from the leader, just like a consumer would. They store the same log files and indexes as the leader.
This pull-based system helps followers stay up to date efficiently by fetching data in batches. Each follower is basically a passive backup of the leader. As long as a follower is in sync, it will have exactly the same data as the leader.
5.3 In-Sync Replica (ISR) Set
Kafka keeps track of In-Sync Replicas (ISR) for each partition. These are the follower replicas that have kept up with the leader's log (within a certain time limit).
If a replica lags too far behind (e.g., beyond replica.lag.time.max.ms), the leader removes it from the ISR. Only replicas in the ISR are eligible to become the next leader if the current one fails.
A message is only considered committed when it has been successfully written to all replicas in the ISR. This gives Kafka strong durability guarantees.
As Confluent puts it:
A committed message "will not be lost, as long as there is at least one in-sync replica alive."
Two key settings control this behavior:
- acks=all (on the producer): ensures the leader waits for acknowledgment from all ISR members before confirming a write.
- min.insync.replicas (on the topic): defines the minimum number of in-sync replicas required for a write to be accepted.
Together, these settings help Kafka ensure that no committed data is lost, even in the event of broker failures.
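As a minimal sketch of wiring the two settings together on an existing topic (names are placeholders):

# Topic side: require at least 2 in-sync replicas for a write to be accepted
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name orders --add-config min.insync.replicas=2

# Producer side: wait for all in-sync replicas before treating a send as successful
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic orders --producer-property acks=all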
5.4 Handling Broker Failures
Kafka is designed to handle failures without losing data. Each partition in Kafka has a leader and one or more follower replicas.
When a leader broker fails, Kafka automatically chooses a new leader from the set of in-sync replicas (ISRs) - these are follower replicas that have the latest copy of the data. This election is managed by the Kafka controller (which uses ZooKeeper or KRaft, depending on the setup). Kafka only allows a follower that is still in sync to become the new leader. This helps avoid data loss, but if no in-sync replica is available, the partition becomes temporarily unavailable until a suitable leader is available again. This is called a clean leader election, and it favors data safety over availability.
If a follower fails or falls too far behind the leader, it is removed from the ISR list. The system continues to run with the remaining in-sync replicas. When the failed follower comes back, it catches up with the missing data from the leader and rejoins the ISR. This happens in the background, so the system stays available as long as the minimum number of in-sync replicas is met.
Kafka's configuration settings like min.insync.replicas and the producer setting acks=all play an important role in handling failures safely. They ensure that a message is only considered "committed" when it is written to all in-sync replicas. If the number of in-sync replicas drops below the minimum required, Kafka will reject writes (for producers using acks=all) rather than accept under-replicated data. This approach gives Kafka strong durability guarantees while still handling broker failures gracefully.
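Whether availability may ever be traded for safety in that scenario is governed by the unclean.leader.election.enable setting (false by default); a hedged sketch of making the safe default explicit on one topic:

# Never elect an out-of-sync replica as leader, even if that means temporary downtime
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name orders \
  --add-config unclean.leader.election.enable=false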
6. Performance Optimization and Scalability
6.1 Producer Tuning
Kafka producers can be tuned to improve performance by adjusting a few key settings:
- batch.size and linger.ms control how many messages the producer groups together before sending them. Larger batches reduce network and disk overhead, which improves throughput (messages per second), but may slightly increase latency, since the producer might wait a few milliseconds to collect enough messages.
- Compression (gzip, snappy, lz4, or zstd) reduces the size of data sent over the network and stored on disk. This can significantly boost throughput without much CPU cost.
- acks controls durability. With acks=all, data is written to all in-sync replicas before success is confirmed, which provides strong durability but may add some latency.
- retries can be configured to handle temporary failures.
Together, these settings help you find the right balance between throughput, latency, and reliability based on your application's needs.
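These knobs are easy to experiment with using the bundled perf-test tool before changing application code; a sketch with made-up numbers and a placeholder topic:

# Send 1M records of 1 KB with bigger batches, a 10 ms linger, lz4 compression, and acks=all
bin/kafka-producer-perf-test.sh --topic page-views \
  --num-records 1000000 --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 \
    batch.size=65536 linger.ms=10 compression.type=lz4 acks=all retries=5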
6.2 Consumer Tuning
Kafka consumer settings also play a big role in performance:
- The fetch sizes (fetch.min.bytes and fetch.max.bytes) control how much data a consumer gets from the broker in one request. A higher fetch.min.bytes makes the consumer wait for a larger chunk of data, which reduces the number of fetch requests and improves throughput, though it may increase latency slightly.
- max.poll.records limits how many records a consumer processes in one go. A larger value improves throughput by making fewer requests, but may increase processing time and memory usage; a smaller value reduces processing delay but causes more frequent polling and overhead.
- Parallel consumption improves speed: run multiple consumer threads or instances in a group, ideally matching the number of partitions, so that every consumer is doing useful work and no partition is overloaded or left idle.
To get the best performance, tune these settings based on how fast your application can process messages and how big each message is.
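The same properties can be passed to the console consumer to get a feel for their effect; the values below are purely illustrative:

# Wait for at least 64 KB per fetch and cap each poll at 1000 records
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic page-views --group perf-check --from-beginning \
  --consumer-property fetch.min.bytes=65536 \
  --consumer-property max.poll.records=1000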
6.3 Batching and Compression (Overall)
Both producers and brokers benefit from batching. Producers batch records per partition before sending. Brokers benefit as well, since batched data is written and read in large sequential chunks and served efficiently from the page cache. Careful tuning of batch size, linger time, and compression can often yield 2–5× throughput improvements. For example, enabling LZ4 or ZSTD compression reduces message size dramatically with low CPU overhead. It's generally recommended to experiment with these settings under realistic load to find the sweet spot.
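Compression can also be enforced at the topic level, so data is stored compressed regardless of what individual producers send; a short sketch with a placeholder topic:

# Store this topic's data zstd-compressed on the brokers
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name page-views --add-config compression.type=zstd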
6.4 Partitioning Strategies
How producers assign records to partitions impacts load balance. By default, Kafka uses:
- Explicit Partition: If the producer API call specifies a partition number, Kafka writes to that partition.
- Keyed Partitioning: If no partition is given but a message key is set, Kafka hashes the key to choose a partition. This ensures all messages with the same key go to the same partition, preserving order per key.
- Round-Robin: If neither partition nor key is set, the producer cycles through partitions in a round-robin fashion (though it may batch to one partition at a time for efficiency).
Custom partitioner classes can override this logic (for example, to implement sticky partitions or complex routing). Good partitioning helps balance load across brokers and consumers. For hot keys (skew), increasing the number of partitions can distribute load further.
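Keyed partitioning is easy to observe with the console clients; a sketch assuming a user-id key and ':' as the separator (both arbitrary choices):

# Produce key:value pairs - records that share a key always land in the same partition
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic page-views \
  --property parse.key=true --property key.separator=:

# Read back the keys (print.partition additionally shows the partition on newer Kafka versions)
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic page-views \
  --from-beginning --property print.key=true --property print.partition=true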
6.5 Cluster Scaling Strategies
Kafka is designed to scale out horizontally. You can scale a cluster by adding brokers and redistributing partitions. When new brokers join, you use Kafka's partition reassignment tool (via kafka-reassign-partitions.sh) to move some partitions to the new brokers. Confluent describes this process: you define a JSON plan and execute it, and Kafka rebalances data accordingly. Conversely, you can remove brokers (scale in) by reassigning their partitions to others.
In practice, scaling a cluster often involves:
- Increasing the number of partitions in a topic to allow more parallelism (though existing data remains in the old partitions).
- Running the reassignment tool to evenly distribute partitions and leader roles across brokers.
- Monitoring broker load, disk and network I/O, and rebalancing as needed for even resource use.
For very large deployments, tools or managed services can automate this. For example, Confluent Cloud offers auto-scaling (the brokers themselves are abstracted away), so you simply request the capacity you need and the platform balances the partitions for you. In self-managed setups, well-planned partition and broker counts (often targeting 1–2M messages/sec per node) and occasional rebalances are key to maintaining performance under growth.
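A hedged sketch of that reassignment workflow, assuming a topics.json listing the topics to move and a newly added broker with id 4:

# topics.json: {"version":1,"topics":[{"topic":"page-views"}]}

# 1. Ask Kafka to propose a plan that spreads the partitions over brokers 1-4
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics.json --broker-list "1,2,3,4" --generate

# 2. Save the proposed assignment as plan.json, then execute it
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file plan.json --execute

# 3. Re-run with --verify until every reassignment reports complete
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file plan.json --verify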
7. Security and Authentication Mechanisms
7.1 Authentication (SASL, SSL/TLS)
Kafka allows you to control who can connect to it by utilising a flexible, plug-and-play security system. Think of it like a secure building that offers multiple ways to verify who's at the door - fingerprints, ID cards, passwords, or access tokens. Kafka provides several authentication options to suit various environments and security requirements.
One of the most commonly used methods is SSL/TLS certificate-based authentication. TLS (Transport Layer Security) is the modern replacement for SSL (Secure Sockets Layer), which is now outdated. TLS helps protect data by encrypting messages as they travel over the network. It also supports certificate-based authentication, where clients must present a digital certificate (like a virtual ID card), and Kafka checks if it's valid. If both client and server validate each other's certificates, it's called mutual TLS - imagine both you and the guard at the gate showing ID cards to ensure mutual trust.
Another method Kafka supports is SASL (Simple Authentication and Security Layer). SASL isn't a login system itself, but a framework that allows you to plug in different authentication mechanisms - much like an adapter for various types of keys.
Kafka supports several SASL mechanisms, including:
- PLAIN: Basic username and password
- SCRAM-SHA-256/512: A more secure challenge-response mechanism based on salted password hashes. SCRAM handles the password-based login itself, while TLS ensures the credentials are encrypted in transit.
- GSSAPI (Kerberos): Typically used in enterprise networks where Kerberos single sign-on is already in place - the same system that lets apps like SSH, LDAP, or email clients log users in without re-entering passwords.
- OAUTHBEARER: Token-based login, similar to signing in with Google
Admins configure these authentication methods by setting up listeners on Kafka brokers. Think of listeners as entry doors, each with a different security system, labelled like "SASL_PLAINTEXT" (authentication without encryption) or "SASL_SSL" (authentication over encrypted traffic). These allow Kafka to accept and validate clients using your chosen method.
Kafka can also encrypt all traffic - whether it's communication from clients to Kafka or between Kafka brokers themselves. For extra security, you can use mutual TLS, where both the client and the Kafka broker authenticate each other's certificates. This two-way ID check ensures that both sides can fully trust one another before exchanging any data - like a secure handshake where both parties show trusted credentials.
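On the client side, the chosen mechanism boils down to a handful of properties; a minimal sketch for SCRAM over TLS, where every filename, username, password, and address is a placeholder:

# client.properties - connect through a SASL_SSL listener using SCRAM credentials
cat > client.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="user1" password="user1-secret";
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit
EOF

# Any CLI tool or client application can then authenticate with this file
bin/kafka-console-producer.sh --bootstrap-server broker1:9094 \
  --topic page-views --producer.config client.properties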
7.2 Authorization in Kafka (ACLs and RBAC)
Kafka controls who can do what using something called Access Control Lists (ACLs). ACLs enable you to set permissions on items such as topics, consumer groups, or cluster-level actions. You can allow or deny actions such as READ, WRITE, or CREATE for a specific user.
For example, if you want to allow a user named user1 to read from a topic called input-topic1, you would run a command like:
bin/kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:user1 --operation READ --topic input-topic1
Admins can also define superusers - special users who are allowed to do everything and are not restricted by ACLs.
In larger organizations, tools like RBAC (Role-Based Access Control) are often used. These systems assign roles (such as "admin" or "reader") to users, and each role has a corresponding set of permissions. But even with RBAC, Kafka still uses ACLs in the background to enforce the actual rules.
By default, Kafka denies any action that doesn't have an explicit ACL allowing it. Therefore, administrators must explicitly grant the users and permissions they want to allow.
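A couple of companion commands, using the same placeholder names, to grant produce rights and review what's in place:

# Allow a producer identity to write to the topic
bin/kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:producer-app --operation WRITE --topic input-topic1

# Consumers also need READ on their consumer group
bin/kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:user1 --operation READ --group app-group-1

# List the ACLs currently applied to the topic
bin/kafka-acls.sh --bootstrap-server localhost:9092 --list --topic input-topic1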
7.3 Encryption in Kafka (In-Transit and At-Rest)
Kafka protects your data in two key ways: while it's moving (in-transit) and while it's stored (at rest).
In-Transit Encryption
To protect data while it's traveling between producers, brokers, and consumers, Kafka uses SSL/TLS (the same technology used to secure websites). By configuring keystores and truststores on Kafka, you can encrypt all network traffic and optionally verify the identity of who's connecting using certificates. The same setup also secures the communication between Kafka brokers (called inter-broker traffic).
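For reference, the broker side of that setup is a few entries in server.properties; a hedged sketch with placeholder hostnames, paths, and passwords:

# server.properties excerpt - a TLS listener with mutual (two-way) authentication
listeners=SSL://broker1:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/etc/kafka/broker1.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/etc/kafka/broker1.truststore.jks
ssl.truststore.password=changeit
ssl.client.auth=required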
At-Rest Encryption
Kafka does not encrypt the data it stores on disk by default. That means your messages saved in Kafka's log files are in plain text unless you take extra steps. To secure this, you need to use disk-level encryption or file system encryption - like encrypting the hard drive where Kafka stores data. If you're using Kafka in the cloud (like AWS or Confluent Cloud), the platform may offer encrypted storage volumes or tiered storage with encryption built in.