Facts about Kafka every business should know

Ola Puchta-Górska
SoftwareMill Tech Blog
5 min read · Mar 12, 2020


Photo by Markus Spiske on Unsplash

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds and is capable of handling trillions of events a day.

Apart from high performance, availability, and scalability, Kafka gained so much popularity because it benefits greatly from event-driven architecture. This type of architecture is a perfect fit for the heart of a system that needs to process huge amounts of data. If you’re looking for an introduction to Apache Kafka and some common use cases, read this article.

Companies that use Apache Kafka

According to HG Insights, more than 18,000 companies use Kafka, including Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, Cloudflare, and Netflix. 3,729 developers on StackShare have stated that they use Kafka. Among the top reasons companies and developers choose Kafka is that it’s a high-throughput, distributed, scalable, high-performance platform.

Want to explore particular use cases of companies that leverage Kafka? Read Who and why uses Apache Kafka?

What are the alternatives to Apache Kafka?

If you’re looking for a backbone of a distributed messaging system, there are plenty of solutions and it’s worth considering them based on their performance and how easy it is to maintain, deploy and scale them. It all depends on a particular case and your needs.

RabbitMQ is a well-known and popular message broker with many powerful features. It is written in Erlang, a programming language well adapted to such tasks. Just like Kafka, RabbitMQ requires you to deploy and manage the software, but it has a convenient built-in UI. It has a distinctly bounded flow of data: a message is created, sent to a queue, and received by its consumer. RabbitMQ pushes the message to the consumer and removes it from the queue once it has been processed and the acknowledgment has arrived. It is ideal for simple use cases; with low data traffic you get certain advantages, such as priority queues and flexible routing options. When it comes to coping with big data loads, however, RabbitMQ is inferior to Kafka. Read more.

ActiveMQ is a general-purpose message broker that supports several messaging protocols, such as AMQP, STOMP, and MQTT. In ActiveMQ, it is the producers’ responsibility to ensure that messages have been delivered, and it cannot guarantee that messages are received in the same order they were sent. It is a push-type messaging platform, where providers push messages to consumers, and there is no concept of replication. ActiveMQ would be the proper choice when exactly-once delivery is needed and messages are valuable (as in financial transactions).

Kafka delivers really great performance and supports most of the more complex architectures, but it isn’t as easy to set up as ActiveMQ or RabbitMQ. Kafka is a good choice if you need to process a huge amount of data in real time, as it is highly scalable and doesn’t slow down as new consumers are added.

Get “Start with Apache Kafka eBook”

We’ve gathered our lessons learned while consulting clients and using Kafka in commercial projects.

Apache Kafka ecosystem

The Kafka platform consists of the Kafka Producer and Consumer APIs, Kafka Streams, Kafka Connect, the Kafka REST Proxy, and the Schema Registry.

At the core of Kafka are brokers, topics, logs, and partitions. The Kafka Producer API is used by source applications to generate events and publish them to the Kafka cluster at high speed and volume. Applications can use the Kafka Consumer API to subscribe to a topic and consume messages as they are published by producers.
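The partition a keyed record lands in is derived from its key, which is what preserves per-key ordering across producers and consumers. Here is a simplified sketch of that idea; note that Kafka’s real default partitioner hashes keys with murmur2, while the `String.hashCode()` used below is just an illustrative stand-in:

```java
// Simplified sketch of keyed partition assignment in Kafka.
// Assumption: we use String.hashCode() instead of Kafka's murmur2 hash.
public class PartitionSketch {

    // Records with the same key always map to the same partition,
    // which is what gives Kafka its per-key ordering guarantee.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit to get a non-negative hash, then take the modulus.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 6);
        int p2 = partitionFor("order-42", 6);
        System.out.println(p1 == p2); // prints "true": same key, same partition
    }
}
```

Because the assignment is a pure function of the key, adding more consumers to a group never changes which partition a given key’s messages live in, only who reads them.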

Kafka Streams is the streams API used to consume messages from Kafka, perform operations on them, and produce output back into Kafka.
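A Streams application needs only a handful of mandatory settings before it can run a topology. A minimal configuration sketch might look like the following (the application id and broker address are placeholders):

```properties
# Minimal Kafka Streams configuration (values are placeholders)
application.id=my-streams-app       # also used as the consumer group id
bootstrap.servers=localhost:9092    # initial brokers to contact
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
```

The `application.id` doubles as the consumer group id, so running several instances with the same id makes them share the work of the topology.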

Kafka Connect is used for streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka.
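Connectors are defined declaratively. As an example, the `FileStreamSourceConnector` that ships with Kafka can tail a file into a topic; a sketch of its JSON definition (file path and topic name are placeholders) could look like:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "connect-demo"
  }
}
```

Posting this definition to the Connect REST API is all it takes; no custom producer code is written for the integration.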

The Kafka REST Proxy makes it easy to work with Kafka from any language by providing a RESTful HTTP service for interacting with Kafka clusters.
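For instance, producing a JSON message through the REST Proxy is a plain HTTP POST. A sketch of such a request (host, port, and topic name are placeholders; 8082 is the proxy’s default port) looks roughly like:

```
POST /topics/demo-topic HTTP/1.1
Host: localhost:8082
Content-Type: application/vnd.kafka.json.v2+json

{"records": [{"value": {"greeting": "hello"}}]}
```

This is why the REST Proxy is handy for languages or environments that have an HTTP client but no native Kafka client library.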

Schema Registry acts as a central repository for Kafka message schemas. It manages Avro schemas for Kafka records.
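An Avro schema registered for a topic’s values is just a JSON document. A hypothetical example for a page-view event (all field names here are illustrative, not from any real project) might be:

```json
{
  "type": "record",
  "name": "PageView",
  "namespace": "com.example.events",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
```

Producers and consumers then validate records against this schema, so an incompatible change is caught at publish time rather than breaking downstream consumers.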

If you want to dig deeper and learn more about Kafka here is a list of recommended resources.

Apache Kafka License

Although Confluent has introduced some changes to its license model, they didn’t affect Apache Kafka. It’s still open source and available under the Apache 2.0 license. Certain other features of the Confluent Platform are available under the Confluent Community License, which means you can access their source code and modify or redistribute it, but you cannot use it to offer a competing SaaS product. Read more about the Confluent license details.

Infrastructure for Apache Kafka

When it comes to deployment, there are a number of options to choose from. You may use bare metal and provision Kafka e.g. with Ansible scripts. There are also Kafka packages for DC/OS. If you want to use Kubernetes, you may leverage e.g. Strimzi. In most cases we would recommend choosing the environment you are already using for your other services. Just remember to spread the cluster across multiple Availability Zones to keep it available during failures. Fortunately, there is an option even if you don’t want to maintain Kafka in your own infrastructure: multiple companies offer hosted Kafka clusters, such as Confluent Cloud, Instaclustr, and Aiven.
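Multi-AZ resilience is mostly a matter of broker configuration. A sketch of the relevant settings (the zone name and exact numbers are placeholders to adjust to your cluster size) could be:

```properties
# Spreading partition replicas across availability zones (placeholder values)
broker.rack=eu-west-1a            # tag each broker with its AZ / rack
default.replication.factor=3      # keep each partition on three brokers
min.insync.replicas=2             # stay writable after losing one replica
```

With `broker.rack` set, Kafka’s rack-aware replica placement spreads the copies of each partition across zones, so a single-AZ outage doesn’t take a partition offline.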

When Apache Kafka is the way to go

To sum up, Kafka is a distributed streaming platform that offers high horizontal scalability. It also provides high throughput, which is why it’s used for real-time data processing.

If you need to make a decision whether Kafka is the best choice for your project, check out the Apache Kafka Implementation Checklist at the end of this article.

Need help with Apache Kafka?

We are a Certified Technical Partner of Confluent. Our engineering expertise with stream processing and distributed systems applications is proven in commercial projects, workshops and consulting.

Contact us


• Marketing Manager at SoftwareMill • Growbots, Estimote and Webmuses alumna