Kafka Streams, Jigsaw and Docker walk into a bar

Jaroslaw Kijanowski
SoftwareMill Tech Blog
6 min read · Jul 18, 2019


We typically build Docker images of our applications to let Kubernetes orchestrate and maintain them. Kafka Streams applications are no different in that matter. I’ve already described how we monitor our Kafka Streams applications and let Kubernetes scale them in an automated fashion based on a given metric.

Photo by Clem Onojeghuo on Unsplash

There is one more thing we can do when handing our devops team a Docker image: include a native launcher together with an embedded, tailored Java Runtime Environment, not only to save time and space, but also to increase security by shipping only the bits our app really uses.
I know, it’s 2019 and I’ve seen GraalVM in action. This post, however, is about an oldie called Jigsaw, shipped with the JDK since version 9. Since I’m not going to explain Java modularity here, I’ll just leave links to an official quickstart and an introduction, as well as a guide to Java 9 modularity, for the curious ones. I’ve also found this step-by-step guide on how to migrate an application to Jigsaw very informative.

This post is rather a recipe for creating a Jigsaw module of your Kafka Streams application.

The Good Parts

First let’s prepare a Kafka Streams application and build a Docker image. The project is on GitHub. The main method uses a StreamsBuilder to create a stream that is started later on:
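Something along these lines; consider it a minimal sketch, with topic names, serdes and the broker address being my assumptions (the original code is in the repository):

package ks.jlink;

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class App {

    private static final Logger log = LoggerFactory.getLogger(App.class);

    public static void main(String[] args) {
        log.info("starting Kafka Streams Application");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ks-jlink-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // a trivial topology: pipe records from an input topic to an output topic
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        // start the streams instance and close it cleanly on shutdown
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}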

The gradle build file is not fancy at all:
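It could look more or less like this sketch (dependency versions are assumptions; slf4j 1.8.x is chosen deliberately, which will pay off later):

plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.apache.kafka:kafka-streams:2.3.0'
    implementation 'org.slf4j:slf4j-api:1.8.0-beta4'
    implementation 'org.slf4j:slf4j-simple:1.8.0-beta4'
}

// bundle our classes and all runtime dependencies into one executable jar
task fatJar(type: Jar) {
    manifest {
        attributes 'Main-Class': 'ks.jlink.App'
    }
    archiveClassifier = 'all'
    from { configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) } }
    with jar
}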

The fatJar task is responsible for building a self-executing jar with all dependencies. Its size is around 24MB.

Next, we build a Docker image with a multistage Docker build. Enjoy the simple Dockerfile, because the Boogie Man is just around the corner ☠️
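A sketch of Dockerfile-regular, with paths as assumptions (the base images match the docker images output below):

FROM gradle:5.5.0-jdk12 AS build
COPY --chown=gradle:gradle . /home/gradle/project
WORKDIR /home/gradle/project
RUN gradle fatJar

FROM openjdk:12-alpine
# only the self-executing jar makes it into the final image
COPY --from=build /home/gradle/project/build/libs/*-all.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]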

Let’s build the image and check its size:

$ docker build -t regular-ks:1.0 -f Dockerfile-regular .
.
Successfully built 9f865cdedefc
Successfully tagged regular-ks:1.0
$ docker images|grep alpine
alpine 3.10 4d90542f0623 2 weeks ago 5.58MB
openjdk 12-alpine 0c68e7c5b7a0 4 months ago 339MB
$ docker images|grep regular
regular-ks 1.0 9f865cdedefc 9 seconds ago 364MB

A plain Alpine Linux image weighs about 6MB. The jdk12 image already weighs 339MB. Finally, our image is a whopping 364MB.

Ship It! Out of sight, out of love.

Let’s see if this process can be improved. Project Jigsaw does to our Java code what a build system like Maven or Gradle does to the class path: it lets us declare dependencies and build an isolated module. With that information, the jlink tool is able to remove parts of the JRE that are never used. These dependencies are specified in a file named module-info.java placed in the source root directory:
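For this application it could look like the following sketch. The org.slf4j module name comes from the modularised slf4j jars; the Kafka names are my assumption of the automatic module names derived from the jar file names, since Kafka itself is not modularised:

module ks.jlink {
    requires org.slf4j;
    // assumed automatic module names for the non-modular Kafka jars
    requires kafka.streams;
    requires kafka.clients;
}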

In our case we only need a logging framework and the Kafka Streams libraries. Let’s adjust the build file. We don’t use the fatJar task any longer and have to add the --module-path compiler argument specifying the location of our modular application.
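A sketch of the relevant changes, with directories chosen as an assumption to match the run command below (the application jar goes to modlibs, the dependencies to build/libs):

// compile against the module path instead of the class path
compileJava {
    doFirst {
        options.compilerArgs = ['--module-path', classpath.asPath]
        classpath = files()
    }
}

// keep the application module apart from its dependencies
jar {
    destinationDirectory = file('modlibs')
}

// copy all runtime dependencies into a directory for the --module-path argument
task libs(type: Copy) {
    from configurations.runtimeClasspath
    into "$buildDir/libs"
}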

The libs task is just a helper to copy all dependencies into a directory later referenced by the --module-path argument.

We can build and run the modular version of our app:

$ ./gradlew clean build jar libs
$ java --module-path build/libs:modlibs -m ks.jlink/ks.jlink.App

Our modules are placed in two locations: build/libs holds all the dependencies, while modlibs stores our application module. But wait, are all our dependencies Jigsaw modules? Unfortunately not, and here it gets ugly 🙈

The Bad Parts

When you look into the slf4j jars, you’ll notice a META-INF/versions/9/module-info.class file. The slf4j library has been properly modularised since version 1.8.0, so there’s nothing we have to do here.
On the other hand, our core dependency org.apache.kafka:kafka-streams:2.3.0 is not a Jigsaw module, nor are its dependencies. When put on the module path, they become automatic modules requiring access to the whole Java Runtime Environment. This prevents us from building a tailored JRE.

Let’s try anyway, get into trouble and eventually fix it. Again, jlink is a tool shipped with the JDK which lets us bundle the application with all its dependencies, together with a stripped-down JRE and a native launcher script:
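A sketch of a hand-rolled jlink task, with module path and flags as assumptions chosen to match the commands in this post:

task jlink(type: Exec, dependsOn: ['jar', 'libs']) {
    // jlink refuses to write into an existing output directory
    doFirst { delete 'dist' }
    commandLine 'jlink',
            '--module-path', "modlibs:build/libs:${System.properties['java.home']}/jmods",
            '--add-modules', 'ks.jlink',
            '--launcher', 'start=ks.jlink/ks.jlink.App',
            '--compress', '2',
            '--strip-debug',
            '--no-header-files',
            '--no-man-pages',
            '--output', 'dist'
}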

A brief introduction to jlink and all the particular command line arguments are explained very well in this blog post.

Running this task ends up in a build failure:

$ ./gradlew jlink
> Task :jlink FAILED
Error: automatic module cannot be used with jlink: com.fasterxml.jackson.core from file:///…/kafka-streams-jlink/build/libs/jackson-core-2.9.9.jar

There is no point in building a tailored JRE with automatic modules, since they require the whole runtime anyway. We need to convert all automatic modules. The jdeps tool can be used to generate a module-info.java file for each jar. Then the jar has to be rebuilt and made available as a real module. I’ve provided a shell script which does all of this for every automatic module. Let’s introduce it into the build script:
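The script itself is provided with the project; wiring it into the build could look like this sketch (script name and argument are my assumptions):

task convertAutomaticModules(type: Exec, dependsOn: 'libs') {
    // for every non-modular jar in build/libs the script runs jdeps to
    // generate a module-info.java, compiles it and patches it into the jar
    commandLine 'sh', './convertAutomaticModules.sh', "$buildDir/libs"
}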

And execute:

$ ./gradlew clean libs convertAutomaticModules jar jlink
$ ./dist/bin/start
[main] INFO ks.jlink.App — starting Kafka Streams Application

Perfect! The dist directory contains all files required to launch our application and the native launcher is located in bin.

It turns out, there is a gradle jlink plugin we can use for this purpose:
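A sketch of the configuration (plugin version and main-class wiring are assumptions):

plugins {
    id 'application'
    id 'org.beryx.jlink' version '2.10.4'
}

// module/main-class, as the application plugin expects it in modular builds
mainClassName = 'ks.jlink/ks.jlink.App'

jlink {
    launcher {
        name = 'start'
    }
    // the same flags as before: a compressed, stripped-down runtime image
    options = ['--compress', '2', '--strip-debug', '--no-header-files', '--no-man-pages']
}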

We no longer need the jlink, libs and convertAutomaticModules tasks and the jar task can be simplified. The jlink plugin is configured to create a launcher called start and further options are provided to create a compressed and small image. Let’s run it:

$ ./gradlew jlink

BUILD SUCCESSFUL in 34s
8 actionable tasks: 8 executed
$ ./build/image/bin/start
[main] INFO ks.jlink.App — starting Kafka Streams Application

The Badass JLink Gradle Plugin is indeed a real badass! A lot of magic happens behind the scenes, which basically replaces the custom shell script.

There is light at the end of the tunnel… 🚂

All that’s left is to create a Docker image and send it to the devops team. Again, a multistage Docker build should do the trick:
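A sketch of Dockerfile-jlinked, under the same assumptions as before:

FROM gradle:5.5.0-jdk12 AS build
COPY --chown=gradle:gradle . /home/gradle/project
WORKDIR /home/gradle/project
RUN gradle jlink

FROM alpine:3.10
# only the self-contained runtime image produced by the plugin is shipped
COPY --from=build /home/gradle/project/build/image /app
ENTRYPOINT ["/app/bin/start"]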

$ docker build -t jlinked-ks:2.0 -f Dockerfile-jlinked .

Successfully built 7dc7f361f4f7
Successfully tagged jlinked-ks:2.0
$ docker run 7dc7f361f4f7
/app/bin/start: line 3: ./java: not found

It couldn’t have worked out of the box, could it?

The file is there. Change the ENTRYPOINT to ["ls", "-al", "/app/bin/"], rebuild and run the image, and the listing will show that both the launcher script and the java binary are in place.

To keep it short: the JVM produced by jlink in the first gradle:5.5.0-jdk12 container is linked against glibc and does not work in the second, musl-based alpine:3.10 container. As reported in this issue and explained in this blog post, we need to adjust the build script for the second image, as the AdoptOpenJDK project does in its Dockerfile:
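A sketch of the adjusted second stage (the glibc package version and URLs are my assumptions): since Alpine is based on musl, we install sgerrand’s glibc compatibility package, just like the AdoptOpenJDK alpine images do:

FROM alpine:3.10
# the jlink runtime is linked against glibc, so provide it on musl-based Alpine
RUN apk add --no-cache ca-certificates wget \
 && wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub \
 && wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.29-r0/glibc-2.29-r0.apk \
 && apk add glibc-2.29-r0.apk \
 && rm glibc-2.29-r0.apk
COPY --from=build /home/gradle/project/build/image /app
ENTRYPOINT ["/app/bin/start"]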

This is it!

$ docker build -t jlinked-ks:2.0 -f Dockerfile-jlinked .
Successfully built 0ee98987d21c
Successfully tagged jlinked-ks:2.0
$ docker images | grep jlinked
jlinked-ks 2.0 0ee98987d21c 4 minutes ago 74MB
$ docker run 0ee98987d21c
[main] INFO ks.jlink.App - starting Kafka Streams Application

Running an instance of this image and accessing a Kafka cluster running on the Docker host machine is a topic for another blog post. Most importantly, we’ve reduced the image size from 364MB to 74MB.

Shrinking the deliverables is an optimisation technique rather than a business feature clients explicitly pay for. Used at scale, it may pay off in lower bills for image storage and bandwidth usage. On the other hand, Docker images are built from layers. When bundling a fat jar, every application reuses common layers containing the operating system and the Java Runtime Environment, and adds only one more, rather small, layer with the fat jar itself. With the Jigsaw and jlink based approach, every application has its own image containing the application’s jar file, modular dependencies and a tailored JRE. At some point these per-application-tailored layers may grow larger than the fat-jar layers. To make things worse, every change in the code produces a new 74MB self-contained bundle, compared to a roughly 24MB fat jar, so we have to tidy up old images more frequently.

As we’ve seen, building a modular Kafka Streams application is possible. Until all dependencies are Jigsaw-enabled, intermediate steps executing custom code are required. You also have to repeat yourself when declaring these dependencies: once in the build system and again in the module info file. Feel free to share your thoughts on modular Kafka Streams applications, especially what requirements drove you to this decision.

Looking for Scala and Java Experts?

Contact us!

We will make technology work for your business. See the projects we have successfully delivered.
