Cats Concurrency basics with Ref and Deferred
Concurrent access and referential transparency
Ref and Deferred are the basic building blocks of structures in FP used in concurrent fashion. Especially when used with tagless final abstraction those two, when building the business logic, can get us both: concurrent access and referential transparency and we can use them to construct more advanced structures like counters and state machines.
Before we dive deep into Ref and Deferred it is useful to know that concurrency in Cats is based around Java AtomicReference
and we will start our journey here.
Atomic Reference
AtomicReference
is one of the elements from java.util.concurrent.atomic
package and in Oracle docs we can read that java.util.concurrent.atomic
is:
A small toolkit of classes that support lock-free thread-safe programming on single variables. In essence, the classes in this package extend the notion of
volatile
values, fields, and array elements to those that also provide an atomic conditional update operation..…
Instances of classes
AtomicBoolean
,AtomicInteger
,AtomicLong
, andAtomicReference
each provide access and updates to a single variable of the corresponding type.
AtomicReference
is with us since Java 1.5 and it’s used to get better performance than synchronisation (which usually but not always is the case).
When you have to share some data between threads you have to protect access to that piece of data. The simplest example would be to increment some int: i = i + 1
. Our example consist of actually 3 operations, first we would read the value of i
then we add 1
to that value, and at the end we assign newly calculated value to the i
again. With multithreaded applications we can have a situation where each thread will execute those 3 steps between other thread’s steps and the final value of i
will not be possible to predict.
Normally the word synchronised
or lock class mechanism would appear in your head, but with atomic.*
you no longer need to worry about explicit synchronisation, and you can relay on provided utility atomic types where the check if the operation is done in one step is automatically included.
Let’s take the AtomicInteger.incrementAndGet
as an example:
With compareAndSet
operation we either update our data or fail but we never make the thread wait. This way if the compareAndSet
in incrementAndGet
fails we just try to redo the whole operation again, fetching the current value of our data with get()
at the beginning. On the other hand with synchronized
-like mechanisms there is no limit on how many statements you want to execute while the lock is acquired but that block will never fail and may make the calling thread wait — providing the possibility to deadlock or decreasing performance.
Now, with the basics in our head let’s move on to our first concurrency mega-star.
Ref
Cats Ref
is very similar to the above mentioned Java atomic reference, the main differences are that Ref
is used with tagless final abstraction F
, it always contains a value and the value contained in Ref
— of A
type is always immutable.
Ref[F[_], A]
is purely functional mutable reference:
- concurrent
- lock free
- and always contains a value
It is created by giving an initial value and every operation on is wrapped in F
eg. cats.effect.IO
.
If we look closely into companion object for Cats Ref
we can see that our F
needs to fulfil a requirement of being a Sync
.
The above method is just an example of many operations available on our Ref
, it is used to construct Ref
with an initial value. Sync
gives us the ability to suspend any side effects with its delay
method for every operation on Ref
.
Ref
is a pretty simple construct, we can focus mainly on its get
, set
and of
methods to understand how it works.
get
and set
approach
Let’s say that we have an object (for the purposes of this blog post we will name it Shared
) which needs to be updated by multiple threads and we use our get
and set
methods to do that, constructing utility method to help us on the way:
Our Shared
object can be constructed by applying its previous state and new value to build a new instance — the Shared
can be actually anything we want — a simple List, Map or whatever we want to access concurrently in a safe manner. I have just invented Shared(prev: Shared, msg: String)
for the purposes of this tutorial.
F
in our example above was replaced by concrete IO
from Cats effect but bear in mind that Ref
is polymorphic in F
and can be used with other libraries.
With our monadic IO
we flatMap over each step and set the value stored in our Ref
to a desired value — or… wait, maybe we don’t.
With this approach, when modifyShared
will be called concurrently we can lose updates! This is because, we can have a situation when eg. two threads can read the value with get
and each one of them will execute set
concurrently. Methods get
and set
are not called atomically together.
Atomic update
Of course we can improve on the above example and use other available methods from Ref
. To execute get
and set
together we can use update
.
This will solve our issue with updating the value, but simple update
has its drawbacks. If we want to read the value just right after updating it, in a similar fashion we were using get
and set
, we can end up with stale data being read, let’s say our Ref
will hold a reference to a simple Int
:
modify
to the rescue
We can improve slightly on that by using modify
which will do the same as update, but will return the updated value back to us for later use.
As you can see, this is almost exactly the same implementation as in AtomicInteger.incrementAndGet
example I have shown at the beginning, only in Scala. You can clearly see that Ref
is based on AtomicReference
to do its work, too.
Ref limitations
You have probably already noticed that in case if updating the value fails the function passed to update
/ modify
needs to be run nondeterministically and maybe run multiple times. The good news is though, that this solution turns out in general to be much faster than standard locking and synchronisation mechanism and it’s much safer as this solution cannot deadlock.
Once we know how simple Ref
works we can move onto another Cats Concurrent class: Deferred
.
Deferred
In contrast to Ref
, Deferred
is:
- created empty
- can be completed once
- and once set it cannot be modified or become empty again.
Those properties make the Deferred
simple and pretty interesting at the same time.
Deferred
is for purely functional synchronisation, when we call get
on empty Deferred
we block until the value is available. As per documentation from the class itself the blocking:
blocking mentioned is semantic only, no actual threads are blocked by the implementation
The same call of get
on non-empty Deferred
will return immediately the value stored.
The other method - complete
— will fill up the value if the instance is empty and when called on non-empty Deferred
it will return failure (failed IO in case of IO).
The important thing to note here is that Deferred requires F
to be Concurrent
which means that it can be cancellable.
Good example of using Deferred
is when one part of your application needs to wait for another. Example below is taken from the great talk by Fabio Labella given at Scala Italy 2019 — Composable Concurrency with Ref + Deferred available at Vimeo
In the above example we have a producer and consumer and we want the producer to wait for the consumer setup to finish before writing the messages, otherwise whatever we would write in producer will be lost. To overcome this problem we can use shared Deferred
instance and block on get
until the done
Deferred
instance will be filled up with value on the consumer side (the value in this case is simple Unit
()
).
Of course the above solution is not without its issues, when the consumer setup never terminates we are stuck on waiting and the producer cannot send any messages. To overcome this, we can timeout on get
as well as use Either[Throwable, Unit]
or some other construct instead of simple Unit
inside our Deferred
object.
Deferred
is pretty simple but with conjunction with Ref
it can be used to build more complex data structures like eg. semaphores.
For more information I encourage you to visit the Cats documentation itself where you can find out more about Cats concurrency and the data structures it provides.