Cats Concurrency basics with Ref and Deferred

Concurrent access and referential transparency

Krzysztof Grajek
SoftwareMill Tech Blog

--

Ray Dumas @ Flickr.com

Ref and Deferred are the basic building blocks of structures in FP used in concurrent fashion. Especially when used with tagless final abstraction those two, when building the business logic, can get us both: concurrent access and referential transparency and we can use them to construct more advanced structures like counters and state machines.

Before we dive deep into Ref and Deferred it is useful to know that concurrency in Cats is based around Java AtomicReference and we will start our journey here.

Atomic Reference

AtomicReference is one of the elements from java.util.concurrent.atomic package and in Oracle docs we can read that java.util.concurrent.atomic is:

A small toolkit of classes that support lock-free thread-safe programming on single variables. In essence, the classes in this package extend the notion of volatile values, fields, and array elements to those that also provide an atomic conditional update operation..

Instances of classes AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference each provide access and updates to a single variable of the corresponding type.

AtomicReference is with us since Java 1.5 and it’s used to get better performance than synchronisation (which usually but not always is the case).

When you have to share some data between threads you have to protect access to that piece of data. The simplest example would be to increment some int: i = i + 1. Our example consist of actually 3 operations, first we would read the value of i then we add 1 to that value, and at the end we assign newly calculated value to the i again. With multithreaded applications we can have a situation where each thread will execute those 3 steps between other thread’s steps and the final value of i will not be possible to predict.

Normally the word synchronised or lock class mechanism would appear in your head, but with atomic.* you no longer need to worry about explicit synchronisation, and you can relay on provided utility atomic types where the check if the operation is done in one step is automatically included.

Let’s take the AtomicInteger.incrementAndGet as an example:

With compareAndSet operation we either update our data or fail but we never make the thread wait. This way if the compareAndSet in incrementAndGet fails we just try to redo the whole operation again, fetching the current value of our data with get() at the beginning. On the other hand with synchronized-like mechanisms there is no limit on how many statements you want to execute while the lock is acquired but that block will never fail and may make the calling thread wait — providing the possibility to deadlock or decreasing performance.

Now, with the basics in our head let’s move on to our first concurrency mega-star.

Ref

Cats Ref is very similar to the above mentioned Java atomic reference, the main differences are that Ref is used with tagless final abstraction F , it always contains a value and the value contained in Ref — of A type is always immutable.

Ref[F[_], A] is purely functional mutable reference:

  • concurrent
  • lock free
  • and always contains a value

It is created by giving an initial value and every operation on is wrapped in F eg. cats.effect.IO.

If we look closely into companion object for Cats Ref we can see that our F needs to fulfil a requirement of being a Sync.

The above method is just an example of many operations available on our Ref, it is used to construct Ref with an initial value. Sync gives us the ability to suspend any side effects with its delay method for every operation on Ref.

Ref is a pretty simple construct, we can focus mainly on its get, set and of methods to understand how it works.

get and set approach

Let’s say that we have an object (for the purposes of this blog post we will name it Shared) which needs to be updated by multiple threads and we use our get and set methods to do that, constructing utility method to help us on the way:

Our Shared object can be constructed by applying its previous state and new value to build a new instance — the Shared can be actually anything we want — a simple List, Map or whatever we want to access concurrently in a safe manner. I have just invented Shared(prev: Shared, msg: String) for the purposes of this tutorial.

F in our example above was replaced by concrete IO from Cats effect but bear in mind that Ref is polymorphic in F and can be used with other libraries.

With our monadic IO we flatMap over each step and set the value stored in our Ref to a desired value — or… wait, maybe we don’t.

With this approach, when modifyShared will be called concurrently we can lose updates! This is because, we can have a situation when eg. two threads can read the value with get and each one of them will execute set concurrently. Methods get and set are not called atomically together.

Atomic update

Of course we can improve on the above example and use other available methods from Ref. To execute get and set together we can use update.

This will solve our issue with updating the value, but simple update has its drawbacks. If we want to read the value just right after updating it, in a similar fashion we were using get and set, we can end up with stale data being read, let’s say our Ref will hold a reference to a simple Int:

modify to the rescue

We can improve slightly on that by using modify which will do the same as update, but will return the updated value back to us for later use.

As you can see, this is almost exactly the same implementation as in AtomicInteger.incrementAndGet example I have shown at the beginning, only in Scala. You can clearly see that Ref is based on AtomicReference to do its work, too.

Ref limitations

You have probably already noticed that in case if updating the value fails the function passed to update/ modify needs to be run nondeterministically and maybe run multiple times. The good news is though, that this solution turns out in general to be much faster than standard locking and synchronisation mechanism and it’s much safer as this solution cannot deadlock.

Once we know how simple Ref works we can move onto another Cats Concurrent class: Deferred.

Deferred

In contrast to Ref, Deferred is:

  • created empty
  • can be completed once
  • and once set it cannot be modified or become empty again.

Those properties make the Deferred simple and pretty interesting at the same time.

Deferred is for purely functional synchronisation, when we call get on empty Deferred we block until the value is available. As per documentation from the class itself the blocking:

blocking mentioned is semantic only, no actual threads are blocked by the implementation

The same call of get on non-empty Deferred will return immediately the value stored.

The other method - complete — will fill up the value if the instance is empty and when called on non-empty Deferred it will return failure (failed IO in case of IO).

The important thing to note here is that Deferred requires F to be Concurrent which means that it can be cancellable.

Good example of using Deferred is when one part of your application needs to wait for another. Example below is taken from the great talk by Fabio Labella given at Scala Italy 2019 — Composable Concurrency with Ref + Deferred available at Vimeo

In the above example we have a producer and consumer and we want the producer to wait for the consumer setup to finish before writing the messages, otherwise whatever we would write in producer will be lost. To overcome this problem we can use shared Deferred instance and block on get until the done Deferred instance will be filled up with value on the consumer side (the value in this case is simple Unit () ).

Of course the above solution is not without its issues, when the consumer setup never terminates we are stuck on waiting and the producer cannot send any messages. To overcome this, we can timeout on get as well as use Either[Throwable, Unit] or some other construct instead of simple Unit inside our Deferred object.

Deferred is pretty simple but with conjunction with Ref it can be used to build more complex data structures like eg. semaphores.

For more information I encourage you to visit the Cats documentation itself where you can find out more about Cats concurrency and the data structures it provides.

--

--