Kafka Explained

Components of Kafka

Producers: Producers publish messages to Kafka topics, the component described next.

Topics: A topic is where producers write their messages. Topics are the named streams in which Kafka stores data.

Figure: For each topic, the Kafka cluster maintains a partitioned log.
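
To make the idea of a partitioned log more concrete, here is a minimal in-memory sketch. It is not how Kafka is implemented: the `Topic` class and its methods are hypothetical names, and CRC32 is only a stand-in for whatever key hashing a real producer uses. It shows that each partition is an append-only list, that a message's position in that list is its offset, and that messages with the same key land in the same partition.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One empty log (a plain list) per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key, value):
        """Append a message and return (partition, offset)."""
        # Assumption: CRC32 stands in for the real key hashing.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return every message in a partition from the given offset on."""
        return self.partitions[partition][offset:]


topic = Topic("orders", num_partitions=3)
first = topic.append("customer-1", "created")
second = topic.append("customer-1", "paid")
```

Because both messages share the key `customer-1`, they end up in the same partition with consecutive offsets, so a reader of that partition sees them in the order they were written.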

Consumers: Consumers read data from one or more topics.

Figure: Read and write operations on a partition.

Consumer Groups: Kafka provides consumer groups for read operations. If several different systems need to read the same data, you can create a consumer group for each system and add consumers to it, so you decide which consumer belongs to which group. Consumer groups also let you scale out read-intensive workloads, because the partitions of a topic are divided among the consumers in a group.

Figure: Consumer groups connected to the same servers in a cluster.
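
The following sketch illustrates how consumer groups divide the work. It is a simplification under stated assumptions: `assign_partitions` is a hypothetical helper that spreads partitions round-robin, whereas Kafka's actual assignment strategies are configurable and more involved. The group and consumer names are made up for the example.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer within the group.
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-1", "billing-2"],
    "analytics": ["analytics-1"],
}

# Every group gets its own full view of the topic's partitions.
plan = {
    group: assign_partitions(partitions, members)
    for group, members in groups.items()
}
```

Each group sees all four partitions, so both systems read the same data. Inside the `billing` group the partitions are split between two consumers, which is how a group scales out reads.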

Clusters: As the name suggests, Kafka can run as a cluster. Kafka instances running on different servers communicate with each other and perform operations in parallel. A Kafka cluster is made up of multiple Kafka brokers.

Brokers: Brokers store the messages sent to topics and serve consumer requests.

Partitions: Each topic consists of partitions. Partitions are what make a topic a data store.

Replicas: Kafka supports replication to ensure fault tolerance. Depending on the needs of your project, replication can be synchronous or asynchronous.

Leaders: Each partition has a leader server. The leader handles the read and write operations for that partition and manages its replication.

Followers: Followers replicate what their leader writes and store the replicated data. If a follower is an in-sync replica, it can become the next leader if the current leader fails.
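
The relationship between leaders, followers, and in-sync replicas can be sketched as follows. This is a toy model, not Kafka's election logic: `PartitionState` and `fail_broker` are hypothetical names, and promoting the first remaining in-sync replica is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    leader: Optional[str]   # broker currently serving the partition
    replicas: List[str]     # every broker that holds a copy
    in_sync: List[str]      # replicas caught up with the leader (leader included)


def fail_broker(state, broker):
    """Return the partition state after the given broker fails."""
    in_sync = [b for b in state.in_sync if b != broker]
    leader = state.leader
    if broker == state.leader:
        # Assumption: promote the first remaining in-sync replica, if any.
        leader = in_sync[0] if in_sync else None
    return PartitionState(leader=leader, replicas=state.replicas, in_sync=in_sync)


state = PartitionState(
    leader="broker-1",
    replicas=["broker-1", "broker-2", "broker-3"],
    in_sync=["broker-1", "broker-3"],
)
after = fail_broker(state, "broker-1")
```

Here `broker-2` holds a replica but has fallen behind, so it is skipped and `broker-3` takes over as leader.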

Dependencies

A Java runtime environment must be installed on every server where you want to run Kafka. The Java runtime provides the platform for running software written in Java, and it lets that software run on any operating system that supports Java.

The other dependency is trickier: ZooKeeper, which Kafka uses for service coordination and configuration.

These components have to talk to each other to share their state. For example, a consumer has to communicate with a broker to read data from a topic, and it has to report back to the broker which operations it has performed.
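
One way to picture the consumer example is as a fetch followed by a progress report. The sketch below is a toy model under stated assumptions: `ToyBroker`, `fetch`, and `commit` are hypothetical names, and the model covers a single partition. It only illustrates that the consumer's position is shared state that has to be stored somewhere both sides can reach.

```python
class ToyBroker:
    """Toy single-partition broker that remembers each group's position."""

    def __init__(self, messages):
        self.log = list(messages)   # the partition's messages, in order
        self.committed = {}         # group name -> next offset to read

    def fetch(self, group, max_messages):
        """Return (start offset, messages) from the group's saved position."""
        start = self.committed.get(group, 0)
        return start, self.log[start:start + max_messages]

    def commit(self, group, offset):
        """Record how far the group has read."""
        self.committed[group] = offset


broker = ToyBroker(["a", "b", "c"])

# Read a batch, then report progress back to the broker.
start, batch = broker.fetch("billing", max_messages=2)
broker.commit("billing", start + len(batch))

# The next fetch resumes where the previous one stopped.
next_start, next_batch = broker.fetch("billing", max_messages=2)
```

A different group that has not committed anything would still start from offset 0, because progress is tracked per group.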

Another example: what happens if a leader in the cluster fails? ZooKeeper communicates with the other brokers before a leader election takes place, so ZooKeeper handles this kind of coordination task.