
Kafka Consumer

Consumers read data from a topic (identified by name):

We have seen how to produce data into a topic; now we need to read data from it, and for that we use consumers.

Consumers implement the pull model. That means consumers request data from the Kafka broker and then get a response back. It is not the Kafka broker pushing data to the consumers; it is a pull model.
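The pull model can be sketched with a tiny in-memory broker and consumer. This is purely illustrative (none of these names are a real Kafka API): the key point is that every transfer starts with the consumer asking.

```python
# Sketch of the pull model: the consumer initiates every request;
# the broker only responds, it never pushes on its own.
# Illustrative classes only, not a real Kafka API.

class Broker:
    def __init__(self):
        self.log = ["m0", "m1", "m2", "m3"]  # records in one partition

    def fetch(self, offset, max_records=2):
        # Respond only when asked: return records starting at `offset`.
        return self.log[offset:offset + max_records]

class Consumer:
    def __init__(self, broker):
        self.broker = broker
        self.offset = 0  # next offset to request

    def poll(self):
        # The consumer pulls, then advances its own position.
        records = self.broker.fetch(self.offset)
        self.offset += len(records)
        return records

broker = Broker()
consumer = Consumer(broker)
print(consumer.poll())  # ['m0', 'm1']
print(consumer.poll())  # ['m2', 'm3']
```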

Consumers automatically know which broker to read from:

When a consumer needs to read data from a partition, it automatically knows which broker (Kafka server) to read from.

In case of broker failures, consumers know how to recover:

If a broker fails, the consumers are again very smart: they know how to recover from this.

Data is read in order from low to high offset within each partition:

The data read from each of these partitions is read in order, from low to high offset (e.g. offset 0, 1, 2, 3, and so on), within each partition.

In the block diagram, we can see that the first consumer reads data in order from Topic A partition 0, from offset 0 all the way to offset 7. The same goes for consumer 2, which reads data in order from partition 1 and in order from partition 2. But remember, there are no ordering guarantees across partition 1 and partition 2, because they are different partitions.

The only ordering we have is within each partition.
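This per-partition ordering can be shown with a small sketch (hypothetical data, not a Kafka API): within one partition, records always come back in ascending offset order, while nothing relates offsets across partitions.

```python
# Sketch: ordering holds only within a partition.
# Each record is a (offset, value) pair; data is illustrative.
partitions = {
    0: [(0, "a0"), (1, "a1"), (2, "a2")],
    1: [(0, "b0"), (1, "b1")],
}

def read_partition(p):
    # Records are delivered in offset order, low to high, per partition.
    return [value for offset, value in sorted(partitions[p])]

print(read_partition(0))  # ['a0', 'a1', 'a2']
print(read_partition(1))  # ['b0', 'b1']
# Across partitions 0 and 1 there is no ordering guarantee:
# 'b0' may be processed before or after 'a2'.
```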


Consumer Deserializer

A deserializer indicates how to transform bytes into objects/data:

Consumers read messages, and they need to transform the bytes they receive from Kafka into objects or data.

So, we have a key in a binary format and a value also in a binary format, which correspond to the data in your Kafka message, and we need to transform them into objects that our programming language can use.

Deserializers are used on the key and the value of the message:

The consumer has to know in advance the format of your messages. If the consumer knows the key is an integer, it will use an integer deserializer to transform the key bytes back into an integer value. The same goes for the value: the consumer knows it needs a deserializer of string type.
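As a sketch, here is roughly what an integer and a string deserializer do. Kafka's own deserializers are Java classes; this Python mirrors their behaviour (a 4-byte big-endian integer for the key, UTF-8 text for the value):

```python
import struct

# Sketch of integer and string deserialization, mirroring what Kafka's
# IntegerDeserializer (4-byte big-endian) and StringDeserializer (UTF-8) do.

def int_deserializer(data: bytes) -> int:
    # Unpack 4 bytes, big-endian, into a signed int.
    return struct.unpack(">i", data)[0]

def string_deserializer(data: bytes) -> str:
    return data.decode("utf-8")

# Bytes as they would arrive from the broker (illustrative values):
key_bytes = (123).to_bytes(4, "big")
value_bytes = "hello kafka".encode("utf-8")

print(int_deserializer(key_bytes))       # 123
print(string_deserializer(value_bytes))  # hello kafka
```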

Common Deserializers:

String

Int

Float

Avro

Protobuf

The serialization/deserialization type must not change during a topic's lifecycle:

Within a topic's lifecycle, once the topic is created, you cannot change the type of data being sent by the producers. Otherwise you are going to break your consumers: a consumer expecting, for example, an integer key and a string value will fail if you change them to a float and Avro, and this will be a big problem.

If you want to change the data type of your topic, create a new topic instead.
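A small sketch of why the type change breaks things (hypothetical wire data, standard-library `struct` only): a consumer still applying an integer deserializer fails as soon as a producer starts sending a different binary format.

```python
import struct

# Sketch: a consumer deserializer bound to one type breaks when
# producers change the type mid-topic.

def int_deserializer(data: bytes) -> int:
    return struct.unpack(">i", data)[0]  # expects exactly 4 bytes

old_message = struct.pack(">i", 42)    # producer used to send 4-byte ints
new_message = struct.pack(">d", 3.14)  # producer switched to 8-byte doubles

print(int_deserializer(old_message))  # 42
try:
    int_deserializer(new_message)     # 8 bytes where 4 are expected
except struct.error as e:
    print("consumer broke:", e)
```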


Consumer Groups

All the consumers in an application read data as a consumer group:

When we want to scale with Kafka, we run many consumers in an application, and they read data as a group, called a consumer group.

Each consumer within a group reads from exclusive partitions:

Let's take the example of a Kafka topic with five partitions, and a consumer group, called consumer-group-application, that has three consumers. The consumers within the group read from exclusive partitions.

That means consumer 1 reads from partition 0 and partition 1, consumer 2 reads from partition 2 and partition 3, and consumer 3 reads from partition 4.

So, consumers 1, 2 and 3 share reads across all the partitions, and each reads from distinct partitions. In this way, the group reads the Kafka topic as a whole.
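The split above can be sketched as a range-style assignment of five partitions over three consumers. In real Kafka the group coordinator and a configurable partition assignor do this; the function and names below are illustrative only.

```python
# Sketch: range-style assignment of partitions to consumers in a group.
# Each partition goes to exactly one consumer (exclusive partitions).

def assign(partitions, consumers):
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)  # earlier consumers take one more
        assignment[c] = partitions[start:start + count]
        start += count
    return assignment

print(assign([0, 1, 2, 3, 4], ["consumer-1", "consumer-2", "consumer-3"]))
# {'consumer-1': [0, 1], 'consumer-2': [2, 3], 'consumer-3': [4]}
```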


Consumer groups - what if too many consumers?

If you have more consumers than partitions, some consumers will be inactive:

If you have more consumers in your consumer group than there are partitions, the extra consumers will be idle. In the block diagram we can see that consumer 4 is inactive: it is just a standby consumer and is not going to read from any topic partition.
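A minimal sketch of that situation, assuming three partitions and four consumers (illustrative names, not a Kafka API): partitions run out before consumers do, so the last consumer gets nothing.

```python
# Sketch: more consumers than partitions leaves the extras idle.
partitions = [0, 1, 2]
consumers = ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]

# Hand out one partition per consumer until partitions run out.
assignment = {c: [] for c in consumers}
for p, c in zip(partitions, consumers):
    assignment[c].append(p)

idle = [c for c, ps in assignment.items() if not ps]
print(assignment)  # consumer-4 ends up with no partitions
print(idle)        # ['consumer-4']
```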


Multiple consumers on one topic

In Kafka it is acceptable to have multiple consumer groups on the same topic:

It is completely acceptable to have multiple consumer groups on the same topic. Each group reads the topic independently and keeps track of its own offsets.
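A sketch of two groups reading one topic independently (the group names and `poll` helper are hypothetical, not a Kafka API): each group has its own position in the log, so both see every message.

```python
# Sketch: two consumer groups on one topic, each with its own offsets.
topic = ["m0", "m1", "m2"]
group_offsets = {"billing-app": 0, "analytics-app": 0}  # hypothetical group ids

def poll(group, max_records=2):
    # Read from this group's own position, then advance only that group.
    start = group_offsets[group]
    records = topic[start:start + max_records]
    group_offsets[group] += len(records)
    return records

print(poll("billing-app"))    # ['m0', 'm1']
print(poll("analytics-app"))  # ['m0', 'm1']  (same data, independent offset)
```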
