Event sourcing using Kafka

Published in

SoftwareMill Tech Blog

10 min readMar 5, 2018

Kawka, pronounced /ˈkaf.ka/ in polish — Western jackdaw

When building an event sourced system, there’s a couple of options available when it comes to persistence. First is EventStore, a mature, battle-proven implementation. Alternatively, you can use akka-persistence to leverage the scalability of Cassandra together with the performance of the actor model. Yet another possibility is using the good-old relational database, combining the traditional CRUD approach with events and taking advantage of transactions.

In addition to these (and probably many more) possibilities, thanks to a couple of new features introduced lately, it’s now pretty straightforward to do event sourcing on top of Kafka as well. Let’s see how.

What is event sourcing?

There’s a number of great introductory articles, so this is going to be a very brief introduction. With event sourcing, instead of storing the “current” state of the entities that are used in our system, we store a stream of events that relate to these entities. Each event is a fact, it describes a state change that occurred to the entity (past tense!). As we all know, facts are indisputable and immutable.

Having a stream of such events it’s possible to find out what’s the current state of an entity by folding all events relating to that entity; note, however, that it’s not possible the other way round — when storing the “current” state only, we discard a lot of valuable historical information.

Event sourcing can peacefully co-exist with more traditional ways of storing state. A system typically handles a number of entity types (e.g. users, orders, products, …), and it’s quite possible that event sourcing is beneficial for only some of them. It’s important to remember that it’s not an all-or-nothing choice, but an additional possibility when it comes to choosing how state is managed in our application.

Storing events in Kafka

The first problem to solve is how to store events in Kafka? There are three possible strategies:

store all events for all entity types in a single topic (with multiple partitions)
topic-per-entity-type, e.g. separate topics for all user-related events, all product-related events, etc.
topic-per-entity, e.g. a separate topic for each single user and each single product

Apart from low-arity entities, the third strategy (topic-per-entity) is not feasible. If each new user in the system would require the creation of a topic, we would end up with an unbounded number of topics. Any kind of aggregations would also be very hard, such as indexing all users in a search engine, as it would require consuming a large amount of topics, which in addition wouldn’t all be known upfront.

Hence, we can choose between 1. and 2. Both have their pros and cons: with a single topic, it’s easier to get a global view of all events. On the other hand, with topic-per-entity-type, it’s possible to partition and scale each entity type stream separately. The choice between the two depends on the use-case.

It is also possible to have both, at the cost of additional storage: derive entity-type topics from the all-events topic.

In the remainder of the article we’ll assume we’re working with a single entity type and a single topic, however it’s easy to generalise to multiple topics or entity types.

(EDIT: as Chris Hunt noted on twitter, there’s a very good article by Martin Kleppmann dealing in depth with how to assign events to topics and partitions).

Basic event-sourcing storage operations

The most basic operation that we would expect from a storage supporting event sourcing is reading the “current” (folded) state of a particular entity. Typically, each entity has some kind of an id. Hence, given that id, our storage system should return its current state.

The event log is the primary source of truth: the current state can always be derived from the stream of events for a particular entity. In order to do that, the storage engine needs a pure (side-effect free) function, taking the event and current state and returning the modified state: Event => State => State. Given such a function and an initial state value, the current state is a fold over the stream of events. (The state-modification function needs to be pure so that it can be freely applied multiple times to the same events.)

A naive implementation of the “read current state” operation in Kafka would stream all of the events from the topic, filter them to include only the events for the given id and fold them using the given function. If there’s a large number of events (and over time, the number of events only grows), this can be a slow and resource-consuming operation. Even if the result would be cached in-memory in a service node, it would still need to be periodically re-created, for example due to node failures or cache eviction.

Hence, we need a better way. That’s where kafka-streams and state stores come into play. Kafka-streams applications run across a cluster of nodes, which jointly consume some topics. Each node is assigned a number of partitions of the consumed topics, just as with a regular Kafka consumer. However, kafka-streams provides higher-level operations on the data, allowing much easier creation of derivative streams.

One such operation in kafka-streams is folding a stream into a local store. Each local store contains data only from the partitions that are consumed by a given node. There are two local store implementations available out of the box: an in-memory one and a RocksDB-based one.

Coming back to event sourcing, we can fold the stream of events into the state store, keeping locally the “current state” of each entity from the partitions assigned to the node. If we are using the RocksDB implementation of the state store, we are only limited by disk space as to how many entities can be tracked on a single node.

Here’s how folding events into a local store looks like using the Java API (serde stands for serializer/deserializer):

For a full example, check out the orders microservices example by Confluent.

(EDIT: as Sergei Egorov and Nikita Salnikov noticed on Twitter, for an event-sourcing setup you’ll probably want to change the default Kafka retention settings, so that netiher time-based or size-based limits are in effect, and optionally enable compaction.)

Looking up the current state

We have created a state store containing the current states of all entities coming from partitions assigned to the node, but how to query it? If the query is local (same node), then it’s quite straightforward:

But what if we want to query for data which is present on another node? And how do we find out which node it is? Here, another feature recently introduced to Kafka comes in: interactive queries. Using them, it’s possible to query Kafka’s metadata and find out which node processes the topic partition for a given id(this uses the topic partitioner behind the scenes):

Then it’s a matter of forwarding the request to the appropriate node. Note that how the inter-node communication is handled and implemented — is it REST, akka-remote or any other way — is outside the scope of kafka-streams. Kafka just allows accessing the state store, and gives information on which host a state store for a given id is present.

Fail-over

State stores look nice, but what happens if a node fails? Re-creating the local state store for a partition might also be an expensive operation. It can cause increased latencies or failed requests for a long period of time due to kafka-streams re-balancing (after a node is added or removed).

That’s why by default persistent state stores are logged: that is, all changes to the store are additionally written to a changelog-topic. This topic is compacted (we only need the latest entry for each id, without the history of changes, as the history is kept in the events) and hence as small as possible. Thanks to that, re-creating the store on another node can be much faster.

But that still might cause latencies on re-balancing. To reduce them even further, kafka-streams has an option to keep a number of standby replicas (num.standby.replicas) for each storage. These replicas apply all the updates from the changelog topics as they come in, and are ready to start serving as the primary state store for a given partition as soon as the current one fails.

Consistency

Using the default settings, Kafka provides at-least-once delivery. That is, in case of node failures, some messages might be delivered multiple times. It is possible, for example, that an event is applied to a state store twice, if the system failed after the state store changelog was written, but before the offset for that particular event was committed. That might not be a problem: our state-updating function (Event => State => State) might cope well with such situations. But it doesn’t have to; in that case, we can leverage Kafka’s exactly-once guarantees. These exactly-once guarantees only apply when reading and writing Kafka topics, but that’s all that we are doing here: updating the state store’s changelog and committing offsets are all Kafka topic writes behind the scenes, and these can be done transactionally.

Hence, if our state-update function requires that, we can turn on exactly-once stream processing using a single configuration option: processing.guarantee. This causes a performance penalty, but — nothing comes for free.

Listening for events

Now that we have the basics covered — querying and updating the “current state” of each entity — what about running side-effects? At some point, this will be necessary, for example to:

send notification e-mails
index entities in a search engine
call external services via REST (or SOAP, CORBA, etc. 😉 )

All of these tasks are in some way blocking and involve I/O (as is the nature of side-effects), so it’s probably not a good idea to execute them as part of the state-updating logic: that could cause an increased rate of failures in the “main” event loop and create a performance bottleneck.

Moreover, the state-updating logic function (Event => State => State) can be run multiple times (in case of failures or restarts), and most often we want to minimise the number of cases where side-effects for a given event are run multiple times.

Luckily, as we are working with Kafka topics, we have quite a lot of flexibility. The streams stage which updates the state store can emit the events unchanged (or, if needed, modified) and this resulting stream/topic (in Kafka, a topic and stream are equivalent) can be consumed in an arbitrary way. Moreover, it’s possible to consume it either before or after the state-updating stage. Finally, we also have control if we want to run the side-effects at-least-once or at-most-once. At-least-once can be achieved by committing the offset of the consumed event-topic only after the side-effects complete successfully. Conversely, at-most-once, by committing the offsets before running the side-effects.

As to how the side-effects are run, there’s a number of options, depending on the use-case. First of all, we can define a Kafka-streams stage, which runs the side-effects for each event as part of the stream processing function. That’s quite easy to setup, however isn’t a very flexible solution when it comes to retries, offset management and executing side-effects for many events concurrently. In such more advanced cases, it might be more suitable to define the processing using e.g. reactive-kafka or other “direct” Kafka topic consumer.

There’s also a possibility that one event triggers other events — for example an “order” event might trigger “prepare for shipment” and “notify customer” events. This can also be implemented using a kafka-streams stage.

Finally, if we’d like to store the events or some data extracted from the events in a database or search engine, such as ElasticSearch or PostgreSQL, we might use a Kafka Connect connector which will handle all of the topic-consuming details for us.

Creating views and projections

Usually the requirements of a system go beyond querying and handling only individual entity streams. Aggregations, combining a number of event streams, also need to be supported. Such aggregated streams are often called projections and when folded, can be used to create data views. Is it possible to implement this using Kafka?

Again, yes! Remember that at the basic level we are just dealing with a Kafka topic storing our events: and hence, we have all the power of “raw” Kafka consumers/producers, kafka-streams combinator and even KSQL to define the projections. For example, using kafka-streams we can filter the stream, map, group by key, aggregate in time or session windows etc using either code or the SQL-like KSQL.

Such streams can be persistently stored and made available for querying using state stores and interactive queries, just like we did with individual entity streams.

Going further

As the system evolves, to prevent the event stream from growing indefinitely, some form of compaction or storing “current state” snapshots might come handy. That way, we could store only a number of the recent snapshots and the events that occured after them.

While there is no direct support from Kafka for snapshots, as is the case in some of the other event-sourcing systems, it’s definitely possible to add this kind of functionality using some of the already mentioned mechanisms, such as streams, consumers, state stores, etc.

Summing up

While Kafka wasn’t originally designed with event sourcing in mind, it’s design as a data streaming engine with replicated topics, partitioning, state stores and streaming APIs is very flexible. Hence, it’s possible to implement an event sourcing system on top of Kafka without much effort. Moreover, as behind-the-scenes there’s always a Kafka topic, we get the additional flexibility of being able to work either with high-level streaming APIs or low-level consumers.

You might be interested in our developing real-time pipelines ebook, based on experiences from consulting on and implementing Apache Kafka in modern and scalable systems.

Like this post and interested in learning more? Follow us on Medium! Need help with your Cassandra, Kafka or Scala projects? Just contact us here.