Autoscaling Kafka Streams applications with Kubernetes

Jaroslaw Kijanowski
SoftwareMill Tech Blog
Jul 15, 2019


You may already know how to deploy a Kafka Streams app with Kubernetes and configure a proper health check to enable the self-healing feature. In this post we take one step further and look for a way to scale a Kafka Streams app automatically, in a smart way.


Scale all the things!

First of all, let’s define what it means to scale a Kafka Streams application. Typically it reads from a topic, performs some calculations and transformations, perhaps with some side effects, and finally writes the result back to another topic. To read from a topic it obviously creates a consumer. Scaling such an application is mainly about running more of these consumers. The maximum number of consumers is determined by the number of partitions of the input topic. In other words, there is no point in having more consumers than partitions: if you run another consumer, it will stay idle. To run multiple consumers you simply start another instance of your Kafka Streams application with the same application.id.

Please welcome to the stage — The App

The whole thing is available on GitHub. Let’s look only at the important parts:

The App does some heavy processing in the peek method. With every message it takes a 10ms power nap.
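A minimal sketch of what that topology might look like; the repository is the source of truth, and the application.id below is an assumption based on the client-id visible in the MBean names later on:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

public class ScalingApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed from the client-id seen in the MBean section below
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ks-scaling-app-app-id");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("inScalingTopic", Consumed.with(Serdes.String(), Serdes.String()))
            .peek((key, value) -> sleep(10))              // "heavy processing": a 10ms power nap per message
            .filter((key, value) -> "done".equals(value)) // drops the random "valXYZ" messages
            .to("outScalingTopic", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    private static void sleep(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```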

Let’s give a big round of applause for Apache Kafka

Before we can get to the main topic (no pun intended), we need to prepare the boilerplate — this is starting the broker and creating the required topics:
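Something along these lines; paths and the output topic’s partition count are illustrative, the exact scripts depend on your Kafka distribution:

```bash
# Start ZooKeeper and a single Kafka broker
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# Create the input topic with three partitions, plus the output topic
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic inScalingTopic --partitions 3 --replication-factor 1
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic outScalingTopic --partitions 1 --replication-factor 1
```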

The inScalingTopic topic has three partitions. As mentioned earlier, it makes sense to have up to three consumers with the same application id.

Let’s start this app with exposed MBeans, which we will look into soon:
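The standard JMX system properties do the trick; the jar name is illustrative:

```bash
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=5555 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar ks-scaling-app.jar
```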

If we send a (not-so-)random message (in the form of “valxyz”, where “xyz” is a random integer) to the inScalingTopic topic, it will be dropped by the filter method. If we send a “done” message, we’ll receive it in the outScalingTopic topic:
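A quick way to verify this behaviour from the command line (illustrative; the repository ships its own Sender):

```bash
# Send a message that the filter drops, then one that passes through
echo "val123" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic inScalingTopic
echo "done"   | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic inScalingTopic

# Only "done" shows up on the output topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic outScalingTopic --from-beginning
```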

Let’s send 10000 messages for each of three different keys, which leaves us with 30k messages:
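The Sender could look roughly like this; the keys and the value format are assumptions based on the description above:

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class Sender {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        Random random = new Random();
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (String key : Arrays.asList("key1", "key2", "key3")) { // three hypothetical keys
                for (int i = 0; i < 10_000; i++) {
                    producer.send(new ProducerRecord<>("inScalingTopic", key, "val" + random.nextInt(1000)));
                }
                producer.send(new ProducerRecord<>("inScalingTopic", key, "done")); // end marker per key
            }
        }
    }
}
```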

In the Kafka Streams app logs we’ll notice:

Every ~120 seconds there is a log message telling us that processing of a particular key has finished. All in all it takes ~360 seconds to process the 30k messages.

Put your hands together for The MBeans

Let’s bring some MBeans into the game. We’re going to look at one particular metric: the records-lag attribute on the MBeans kafka.consumer:type=consumer-fetch-manager-metrics,client-id=ks-scaling-app-app-id-*-StreamThread-1-consumer,topic=inScalingTopic,partition=[0,1,2]. This metric tells us how far behind the consumer is with fetching messages from the inScalingTopic topic, per partition. You can use jconsole, VisualVM or jmxterm to view the value:

records-lag telling how far behind the consumer is
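With jmxterm, querying one partition’s lag could look like the session below; this is a sketch, the jar name is illustrative, and the * in the client-id stands for the instance’s UUID, which you have to fill in from the actual bean name:

```bash
java -jar jmxterm-uber.jar --url localhost:5555
$> get -b kafka.consumer:type=consumer-fetch-manager-metrics,client-id=ks-scaling-app-app-id-*-StreamThread-1-consumer,topic=inScalingTopic,partition=0 records-lag
```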

Now we can scale our application manually. Simply run two more instances of it, each with a different port for the MBean server:
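For example (same jar, same application.id, different JMX ports):

```bash
java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5556 \
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false \
     -jar ks-scaling-app.jar &
java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5557 \
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false \
     -jar ks-scaling-app.jar &
```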

You should notice that each app has its own partition assigned:

Running the Sender again reveals that each instance works simultaneously, picking up messages from its one assigned partition. In the case where Kafka’s partitioner routes each of the three keys to a different partition, we enjoy a fully parallelised stream pipeline:

The Sender was done at 23:05:26,113 and all three instances finished their job after ~120 seconds.

Keep in mind that the partitioner does not have to route each key to a different partition. It can happen that partition 0 receives 10k messages with one key, partition 1 gets 20k messages with the other two keys, and partition 2 gets none at all. The more messages you send, the better the distribution. For Kafka, these 30k messages are dust in the wind.

To sum up the first part with a one line TL;DR:

Scaling your Kafka Streams application is driven by the records-lag metric and is a matter of running up to as many instances as the input topic has partitions.

Let’s give a warm welcome to Kubernetes!

Now comes the interesting part. We want to scale the Kafka Streams application automatically. For this we’re going to export the required JMX metrics and let Prometheus pull them for further analysis. Having set up such a monitoring infrastructure, we can configure a HorizontalPodAutoscaler to scale our Kafka Streams app up and down based on a records-lag threshold.

First let’s configure The App to be deployed by K8s, together with a sidecar exporting the required metrics. For this to happen, we need to dockerise the application in build.gradle.

The docker plugin is configured to run the application with an MBean server exposed on port 5555. The image is pushed to Google’s Container Registry, a private Docker registry hosted in the cloud.
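The post doesn’t show the exact build.gradle, but with, say, the com.palantir.docker plugin the relevant part could look like this; the plugin choice, main class and GCR project are assumptions:

```groovy
plugins {
    id 'application'
    id 'com.palantir.docker' version '0.22.1'
}

// Expose an MBean server on port 5555 when the app starts
applicationDefaultJvmArgs = [
        '-Dcom.sun.management.jmxremote',
        '-Dcom.sun.management.jmxremote.port=5555',
        '-Dcom.sun.management.jmxremote.authenticate=false',
        '-Dcom.sun.management.jmxremote.ssl=false'
]
mainClassName = 'com.softwaremill.ScalingApp' // hypothetical main class

docker {
    // image pushed to Google Container Registry; the project id is illustrative
    name "eu.gcr.io/my-gcp-project/ks-scaling-app:${project.version}"
    files tasks.distTar.outputs
}
```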

Then we need to create a deployment.yaml file letting Kubernetes know what we want to have running:
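A sketch of such a deployment; the image paths, labels and exporter image are illustrative, and the two containers are described right below:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ks-scaling-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ks-scaling-app
  template:
    metadata:
      labels:
        app: ks-scaling-app
    spec:
      containers:
        - name: ks-scaling-app
          image: eu.gcr.io/my-gcp-project/ks-scaling-app:latest  # pulled from GCR (path illustrative)
          env:
            - name: BOOTSTRAP_SERVER_CONFIG
              value: "my-kafka:9092"
          ports:
            - containerPort: 5555  # JMX, read by the sidecar
        - name: prometheus-jmx-exporter
          image: sscaling/jmx-prometheus-exporter  # one of the exporter images on Docker Hub
          ports:
            - containerPort: 5556  # MBean values in Prometheus format
```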

We expect two containers to be up: our main application, pulled from GCR, and prometheus-jmx-exporter, an app hosted on Docker Hub that exposes MBean values on port 5556 in Prometheus format.

Note that our app will receive one environment variable, BOOTSTRAP_SERVER_CONFIG, set to my-kafka:9092. This assumes that you already have a Kubernetes cluster hosting a full-blown Kafka setup, either on the Google Cloud Platform, on Amazon Elastic Kubernetes Service or in a self-hosted environment.

Finally we add Prometheus and Grafana pods through the prometheus-operator Helm chart.

We also need prometheus-adapter, added as a Helm chart. It will connect to Prometheus on port 9090 to collect metrics and expose them to Kubernetes as custom metrics.
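With Helm 2, which was current when this post was written, that could look like this; the release names are illustrative, and the Prometheus service name depends on your release name:

```bash
helm install --name prom stable/prometheus-operator
# Point the adapter at the Prometheus service created by the operator chart
helm install --name adapter stable/prometheus-adapter \
  --set prometheus.url=http://prom-prometheus-operator-prometheus.default.svc,prometheus.port=9090
```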

By default the configuration parameter rules.default is set to true and is therefore omitted. We could customize metrics by choosing just a subset of them or assigning aliases with the rules.custom parameter. Leaving the default causes all custom metrics to be served by Kubernetes at /apis/custom.metrics.k8s.io/v1beta1:
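You can check what the adapter serves with a raw API call:

```bash
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
```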

Once everything is up and running we should see the following pods:
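Pod names will differ depending on your release names, so take the comment below as a rough guide:

```bash
kubectl get pods
# Expect: the ks-scaling-app pod with 2/2 containers (app + exporter sidecar),
# plus the prometheus-operator, prometheus-adapter, Prometheus and Grafana pods.
```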

Now for the final bits. We already know that the 30k messages could be processed up to three times faster if only we had three instances of our app running. What we want is to scale up automatically once the records-lag metric reaches a configurable threshold.

Horizontal Pod Autoscaler to the rescue!

We need to configure an HPA based on a custom metric available in Kubernetes at /apis/custom.metrics.k8s.io/v1beta1:
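A sketch using the autoscaling/v2beta1 API that was current at the time; the exact metric name depends on how the adapter exposes the exporter’s records-lag metric, so treat it as a placeholder:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: ks-scaling-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ks-scaling-app
  minReplicas: 1
  maxReplicas: 3  # the input topic has three partitions
  metrics:
    - type: Pods
      pods:
        metricName: kafka_consumer_consumer_fetch_manager_metrics_records_lag  # placeholder name
        targetAverageValue: "10000"
```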

As long as any records-lag metric is above 10 000, Kubernetes will spawn another instance of our app. The maximum number of replicas is set to 3, since this is the number of partitions the input topic has. Sending 100 000 messages resulted in the application scaling up automatically and, later on, scaling back down. This is reflected in the Grafana dashboard below:

Kubernetes All The Things!

All our recent projects at SoftwareMill have been orchestrated with Kubernetes: be it a Kafka cluster through Strimzi, standalone applications, connectors or Kafka Streams applications. You just declare the what and don’t care about the how. You don’t even care if some of your services die in the middle of the night. With the Horizontal Pod Autoscaler you can now read even moar Medium posts, resting assured that your apps scale automatically based on a metric.

This post was written together with Grzegorz Kocur, our Kubernetes expert at SoftwareMill.

Looking for Scala and Java Experts?

Contact us!

We will make technology work for your business. See the projects we have successfully delivered.
