Getting started with Apache Kafka

Sameer Bhatt
Sep 29, 2021

This is an ultra-short, hands-on guide to get you started with Apache Kafka.
(Note that there is much more to Kafka for you to explore and learn.)

We will learn -

  1. What Kafka is and its use-cases
  2. Setting up and getting started with Kafka — producing and consuming events
  3. Setting up a Java app for producing and consuming events

What is Kafka

Quite simply, Kafka is a distributed event streaming platform. Now, you might be wondering what event streaming is. Consider the events that happen in real time across domains such as stock exchanges, banking, shipping, and IoT devices. Usually, these events need to be stored, processed, reacted to, or retrieved later. Event streaming ensures that these events flow and are processed continuously, so that the right information is at the right place at the right time.

It’s also used as a foundation for data platforms, event-driven architectures, and microservices.
Refer to https://kafka.apache.org/ for details.

Many well-known companies such as Airbnb, LinkedIn, Netflix, Shopify, and Zalando already use it; some sample use-cases are listed here —
https://kafka.apache.org/uses

Basic Entities in Kafka

There are multiple entities, but the most noteworthy are -

Producer — applications that publish events to Kafka
Consumer — applications that subscribe to and process these events
Topic & Event — An event denotes that something happened. Events are grouped into Topics: the Producer specifies which Topic an event should go to, and the Consumer subscribes to the relevant Topics to receive events. For example, an “order placed” event might carry the order ID as its key and the order details as its value. Topics are usually partitioned, i.e., spread across brokers, for scalability, and replicated to make your data fault-tolerant and highly available.
Broker — One or more broker instances run in a cluster. They are essentially servers that form the storage layer, receiving events from Producers, storing them, and letting Consumers retrieve them based on parameters such as Topic, Partition, and Offset.
ZooKeeper — coordinates the brokers; it keeps cluster metadata and handles tasks such as controller election. Refer to https://zookeeper.apache.org/ for details.

Setting it up

Download the latest version of Kafka from https://kafka.apache.org/downloads
Download the binary release and unzip it. The commands below assume you run them from the unzipped Kafka directory.

Starting up various services (use different terminal windows) -

Start ZooKeeper
It will start on localhost:2181

./bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka Broker
It will start on localhost:9092

./bin/kafka-server-start.sh config/server.properties

zookeeper.properties and server.properties are default configurations provided in the Kafka package. For this article, we will continue with the default configs.
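
For reference, here are a few of the defaults in those files (exact values can vary slightly between Kafka versions):

# config/zookeeper.properties (excerpt)
dataDir=/tmp/zookeeper
clientPort=2181

# config/server.properties (excerpt)
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181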

Note: If you forcefully terminate ZooKeeper, it might give you the following error on the next launch -
java.io.IOException: No snapshot found, but there are log entries. Something is broken!

To fix this, delete ZooKeeper’s data directory (macOS command below; on Windows, delete the equivalent tmp directory) -

rm -rf /tmp/zookeeper/*
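
If the broker then refuses to start cleanly afterwards (for example, complaining about a cluster ID mismatch), you may need to clear its data directory as well. /tmp/kafka-logs is the default log.dirs from server.properties; adjust the path if you changed it:

rm -rf /tmp/kafka-logs/*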

Create a Topic

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic my_topic

Describe a Topic

./bin/kafka-topics.sh --describe --topic my_topic --zookeeper localhost:2181
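
The output should look roughly like this (the exact columns vary a little across Kafka versions):

Topic: my_topic  PartitionCount: 1  ReplicationFactor: 1  Configs:
    Topic: my_topic  Partition: 0  Leader: 0  Replicas: 0  Isr: 0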

See all Topics

./bin/kafka-topics.sh --list --zookeeper localhost:2181

Note: In Kafka 3.0 and later, the --zookeeper option used by the three commands above was removed; pass --bootstrap-server localhost:9092 instead.

Start the Producer

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

(In newer Kafka versions, --broker-list is deprecated; use --bootstrap-server localhost:9092 instead.)

Start the Consumer

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning

That’s it! Any message produced in the producer terminal shows up in the consumer terminal.
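
A sample session might look like this (the > prompt comes from the console producer; the messages are whatever you type):

# producer terminal
> hello kafka
> my second message

# consumer terminal
hello kafka
my second message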

Some things to try

  1. Create a new topic with more partitions and a higher replication factor (note that the replication factor cannot exceed the number of brokers; see the commands after this list).
  2. Start more than one consumer instance.
  3. Produce messages and see which consumers they go to.
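
For item 1, a sketch of the commands (my_topic_2 is just an illustrative name; with the single broker started above, keep --replication-factor at 1 and increase only the partitions):

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic my_topic_2
./bin/kafka-topics.sh --describe --topic my_topic_2 --zookeeper localhost:2181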

Setting up the Java Apps

Use your IDE of choice (I used IntelliJ IDEA Community Edition) and create a new Java project (with Maven).
In the pom.xml file, add kafka-clients as a dependency —

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>

Refresh Maven and wait for the build to finish.
Head to External Libraries in the Project view and you should see kafka-clients listed there. Feel free to explore the various classes in it.

Create Producer and Consumer Java Apps

ProducerApp
Here are the high-level steps —

  1. Set the configuration in a Properties object, i.e., the broker address and the key/value serializers.
  2. Instantiate a KafkaProducer object.
  3. Create ProducerRecords and use the producer object to send them.
  4. Once done, close the producer object.

The code is pretty straightforward here —

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerApp {
    public static void main(String[] args) {
        // Broker address and serializers for the record key and value
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (int i = 0; i < 100; i++) {
                // ProducerRecord(topic, key, value): every record here shares the
                // key "My Message:" and the value is the loop counter
                producer.send(new ProducerRecord<>("my_topic", "My Message:", Integer.toString(i)));
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
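
Note that send() is asynchronous: it hands the record to a background thread and returns immediately. If you want to confirm delivery or log failures, you can pass a Callback as the second argument to send(). A minimal sketch, meant to replace the send() call inside the loop above (the printed message is just illustrative):

producer.send(
        new ProducerRecord<>("my_topic", "My Message:", Integer.toString(i)),
        (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace(); // delivery failed
            } else {
                System.out.println("Delivered to partition " + metadata.partition()
                        + " at offset " + metadata.offset());
            }
        });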

ConsumerApp
Here are the high-level steps —

  1. Set the configuration in a Properties object, i.e., the broker address, the key/value deserializers, and don’t forget the group id.
  2. Instantiate a KafkaConsumer object and subscribe to a list of topics.
    Note that you can subscribe to more than one topic.
  3. Poll the consumer to get ConsumerRecords.
  4. Iterate over the records and read the messages.
  5. Once done reading, commit the offsets so Kafka knows how far this group has read (see the note after the code).
  6. Finally, close the consumer object.

The code is pretty straightforward here as well —

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerApp {
    public static void main(String[] args) {
        // Broker address, deserializers, and the consumer group id
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", "my_group");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        List<String> topics = new ArrayList<>();
        topics.add("my_topic");
        consumer.subscribe(topics);

        try {
            while (true) {
                // Wait up to 100 ms for new records
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(String.format(
                            "Topic: %s, Partition: %d, Offset: %d, Key: %s, Value: %s",
                            record.topic(), record.partition(), record.offset(),
                            record.key(), record.value()));
                }
                // Record how far we have read
                consumer.commitAsync();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
}
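
One caveat on step 5: commitAsync() is non-blocking but does not retry on failure. A common pattern (a sketch, not part of the original code) is to keep the fast asynchronous commit inside the loop and add one synchronous commit on shutdown, so the final offsets are not lost:

try {
    while (true) {
        // ... poll and process as above ...
        consumer.commitAsync(); // non-blocking, no retries
    }
} finally {
    try {
        consumer.commitSync(); // blocking, retries until it succeeds or hits a fatal error
    } finally {
        consumer.close();
    }
}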

You can keep experimenting: add more topics, more producers, more consumers, and vary the other parameters.

Hope this brief hands-on tutorial helped you get up and running with Kafka. As mentioned earlier, there is a lot more to explore and dive deep into. Happy learning 🙂
