Are you sure your AnyVals don’t instantiate?

Mikołaj Koziarkiewicz
SoftwareMill Tech Blog
9 min read · Oct 17, 2018


Photo by Matthew Henry from Burst

Say you’ve got a domain entity modeled by something like this:

This is mostly OK, but it lacks a bit of type precision: we’d like To Reason About™ the various field types in a more granular fashion.

There are currently two major schools of thought on enriching the types appropriately. One is using tags:

Another, gaining more popularity recently, is using value classes to encapsulate and encode the various field entities.

You can find a good introduction here. Following a value-class-based encoding, our example is represented by:
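As an illustration only, assuming a hypothetical `User` entity with a name and an email (these names are invented for this sketch and are not the post's original example), the raw and the value-class encodings might look like this:

```scala
object DomainEncoding {
  // Raw encoding: both fields are plain Strings, so they are easy to mix up.
  final case class RawUser(name: String, email: String)

  // Value-class encoding: each field gets its own wrapper type.
  // (Name, Email and User are hypothetical names used for illustration.)
  final case class Name(value: String) extends AnyVal
  final case class Email(value: String) extends AnyVal
  final case class User(name: Name, email: Email)

  val user: User = User(Name("Ada"), Email("ada@example.com"))

  // Swapping the arguments no longer type-checks:
  // User(Email("ada@example.com"), Name("Ada"))  // does not compile
}
```

The cost is the extra `.value` unwrapping whenever the underlying string is needed.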

Theoretically, value classes allow type isolation without actual instantiation. In fact, they’ve been gaining a bit of ground recently; you can see them used as e.g. the canonical encoding for octopus.

So, let’s talk theory — and practice. But first, a quick recap.

Value classes — why?

Value classes are classes with exactly one value attribute. Equivalently:

and:

They have the following two — closely related — advantages:

  • barring some exceptions, value classes are automatically inlined — as their singular value, with the appropriate type;
  • due to the above, any method that’s on the value class is converted into a static method called on the singular value — again, with some exceptions.
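A minimal sketch of both points, assuming a hypothetical `wordCount` method added to a `Text` wrapper purely for demonstration:

```scala
object ValueClassBasics {
  // A value class: extends AnyVal and wraps exactly one val.
  final case class Text(value: String) extends AnyVal {
    // Hypothetical method for illustration. When no boxing is forced,
    // the compiler turns this into (roughly) a static extension method
    // in the companion that takes the underlying String.
    def wordCount: Int = value.split(" ").length
  }

  // In the happy path this is represented as a plain String at runtime.
  val text: Text = Text("value classes are neat")

  // Compiled as a static call on the underlying String, not a virtual call.
  val words: Int = text.wordCount
}
```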

Other than the annoying unwrapping of the singular field, they seem perfect for our purpose.

Value classes — in practice

So, in the context of instantiation, what are the exceptions? Quoting the documentation:

A value class is actually instantiated when:

1. a value class is treated as another type.

2. a value class is assigned to an array.

3. doing runtime type tests, such as pattern matching.
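The three cases can be sketched as follows. The `Distance`/`Meter` names follow the style of the example in the Scala documentation; the `describe` helper is an assumption added for this sketch:

```scala
object InstantiationCases {
  // A universal trait, so that a value class can extend it.
  trait Distance extends Any
  final case class Meter(value: Double) extends AnyVal with Distance

  // 1. Treated as another type: the parameters are typed as Distance,
  //    so callers must pass real Meter instances.
  def add(a: Distance, b: Distance): Distance =
    Meter(a.asInstanceOf[Meter].value + b.asInstanceOf[Meter].value)

  // 2. Assigned to an array: the array holds Meter instances, not raw Doubles.
  val meters: Array[Meter] = Array(Meter(1.0), Meter(2.0))

  // 3. Runtime type test: matching on Any needs an actual Meter object.
  def describe(x: Any): String = x match {
    case Meter(v) => s"$v m"
    case _        => "unknown"
  }
}
```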

At first glance, that doesn’t seem so bad.

In the real world however, it is deceptively easy to fall into the trap of instantiation.

A good (and slightly frightening) overview of various instantiation cases is presented in the first section of this blog entry by Stephen Campbell.

However, I want to focus on something way more critical to software development in Scala. That’s right — it’s JSON deserialization.

Setting up

We’re going to be using Scala 2.12 and Oracle’s JDK for Java 8.

Let’s start with a typical setup for an Akka HTTP project. Here’s the boilerplate.

As you can see we’re still missing the domain itself, which we’re keeping as simple as possible:

And, finally, the encoding. We’re using circe in this case, which provides a handy-dandy semi-auto decoder for value classes.

For the actual Message, the decoder will be fully manual, to ensure that any problems can be fully isolated to the decoding of Text itself.

Here’s everything we need:

Let’s try running Main, connecting jvisualvm, and running a couple of POST requests (using HTTPie):

That doesn’t look good:

Clearly, Text is being instantiated.

Isolating the problem

Let’s cut out the middleman and just try out decoding without any sort of HTTP connectivity:

Let’s try it, and…​

domain.Message       16  16 (0.0%)  1 (0.0%)
domain.Message$Text  16  16 (0.0%)  1 (0.0%)

Yup, same thing.

This looks bad, but it may still be some sort of fluke, or a quirk of how jvisualvm operates. We can eliminate that possibility by delving deeper into the internals, starting with running the async-profiler, now newly integrated into IDEA in the EAP version. This way, we’ll be able to examine the relevant callpoints.

While there aren’t any outright instantiations visible (<init> calls), we can indeed see that the tDecoder is invoked, along with several macro methods.

This is, of course, expected. The decoding methods are there to be used after all. So, that avenue of investigation is a bust — apart from knowing that “our” decoders are indeed invoked.

Let’s turn to the bytecode of Transcoding (which is quite substantial, so I’ll spare you the overview).

When searching for references to Text, immediately, we find this:

This is a synthetic method accepting an Object (1), casting it into Text (2), and retrieving the value of the field (3) in order to initialize Message (4).

The method, in turn, is called here:

The previous method is being invoked at (1). Note one of the final lines, defining the local variable c. We’re now looking at the bytecode of the anonymous function in Decoder.instance for Message, i.e.:

Apparently something here is coercing instantiation.

Attempting a workaround

NOTE: the following section engages in heavily exploratory analysis and is non-essential for the understanding of the gist of this post — feel free to skip to “It gets worse!” if strapped for time. Otherwise, read on.

Tweaking the decoder

On a whim, let’s try to omit the unwrapping decoder for Text:

…​and:

no Text! OK, so it’s possible this has to do with the decoder for Text? Let’s test it by itself:

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

What about just invoking the decoder?

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

Same thing!

Either will not save you

As a sanity check, let’s simulate our deserialization directly, without a decoder:

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

Evidently even creating an Either will cause Text to instantiate! We can see it in the bytecode here:

The telltale NEW/DUP/load-arg/INVOKESPECIAL constructor sequence appears at the very end.
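The same effect can be observed without a profiler. The sketch below is an assumption-laden stand-in for the original check: it stores a `Text` in a `Right` and then asks for the runtime class of whatever object the `Either` actually holds (the `classNameOf` helper is hypothetical):

```scala
object EitherBoxing {
  final case class Text(value: String) extends AnyVal

  // Right.apply is parametrized, so the Text has to be passed as an object.
  val decoded: Either[String, Text] = Right(Text("blah"))

  // Typed as Any => String so the stored object is inspected as-is,
  // without being unwrapped to its underlying String first.
  private val classNameOf: Any => String = _.getClass.getName

  // If Text were fully inlined, this would report java.lang.String.
  val storedClassName: String = decoded.fold(classNameOf, classNameOf)
}
```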

…​and neither will the decoder

It’s also possible that the Decoder itself can coerce instantiation.

Here’s the relevant verification code:

The above may seem intimidating but it’s really not that complex:

  1. The decoder works by processing case classes through a shapeless Generic.
    * A Generic is something that converts to and from a type (parameter A, in our case Text), and its HList representation (the Lazy is there to prevent implicit resolution errors in some scenarios, and has otherwise no bearing on how the decoder is generated).
    * An HList is a typed list representing (simplified) the fields of the type. In our case, we just have one field, which means the representation is String :: HNil (HNil being the same as Nil for "normal" lists).
    * in our case, A is Text and R is String.
  2. if the required Generic is found (provided both for case classes and value classes out of the box by shapeless)…​
  3. the “raw” value of the relevant JSON (in our case, the String) is parsed in, used to create an HList of unwrapped :: HNil…​
  4. …​and converted via Generic to the target type, i.e. our Text in this case.

A possible emulation of this process would be:
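As a rough sketch that avoids the shapeless dependency, the version below assumes a hand-written `MiniGeneric` trait (a hypothetical stand-in for shapeless's `Generic`) and simplifies the representation to a bare `String` rather than `String :: HNil`:

```scala
object GenericEmulation {
  final case class Text(value: String) extends AnyVal

  // Hypothetical stand-in for Generic: converts between A and its representation R.
  trait MiniGeneric[A, R] {
    def to(a: A): R
    def from(r: R): A
  }

  object MiniGeneric {
    implicit val textGeneric: MiniGeneric[Text, String] =
      new MiniGeneric[Text, String] {
        def to(a: Text): String = a.value
        def from(r: String): Text = Text(r)
      }
  }

  // Mirrors steps 3 and 4: take the raw value and convert it via the generic.
  def decodeVia[A, R](raw: R)(implicit gen: MiniGeneric[A, R]): A =
    gen.from(raw)

  // Reports the runtime class of the object that comes back from the generic.
  def decodedClassName[A, R](raw: R)(implicit gen: MiniGeneric[A, R]): String =
    (gen.from(raw): Any).getClass.getName
}
```

Because `from` is called through the parametrized trait, its result has to travel as an object, which is where the `Text` instance comes from.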

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

Yup, still on our way to minimal problem reduction.

If you’re getting confused by now (“do AnyVals actually work?”), rest assured that e.g. the following:

import domain.Message.Text

object MainSanityCheckForRaw extends App {
  val text = Text("blah")
  System.in.read()
  println(text.value)
}

does not produce any instances of Text. Everything gets translated into static calls.

More workarounds?

“OK” — you now say — “you’ve manually created a decoder for Message itself”:

“What happens if you go in the other direction, and replace it with a semi-auto one?”

Running MainDecode again:

domain.Message       16  16 (0.0%)  1 (0.0%)
domain.Message$Text  16  16 (0.0%)  1 (0.0%)

Same story, probably for the same reasons.

It gets worse!

By now, you may have noticed that the constant element is not any API that we’re using, but something related to the value classes being “packed” into a parametrized type.

We need to confirm the hypothesis by avoiding circe, shapeless etc. completely.

Implicit resolution

First, we’ll create a custom implicit resolution hierarchy. Let’s say that we have a type class for calculating the length of the string representation of a given class, called StringMeasure:

We can now see what happens if we implement an instance for Text, resolving the "base" case via implicits:
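A minimal sketch of such a hierarchy, assuming a single `length` method on the type class and a hypothetical `measure` entry point (only the `StringMeasure` name comes from the text above):

```scala
object ImplicitMeasure {
  final case class Text(value: String) extends AnyVal

  // Type class: length of the string representation of an A.
  trait StringMeasure[A] {
    def length(a: A): Int
  }

  object StringMeasure {
    // The "base" case.
    implicit val forString: StringMeasure[String] = new StringMeasure[String] {
      def length(a: String): Int = a.length
    }

    // The Text instance, built from the base case via implicit resolution.
    implicit def forText(implicit base: StringMeasure[String]): StringMeasure[Text] =
      new StringMeasure[Text] {
        def length(a: Text): Int = base.length(a.value)
      }
  }

  // Parametrized entry point: `a` is passed as an object after erasure.
  def measure[A](a: A)(implicit m: StringMeasure[A]): Int = m.length(a)
}
```

Calling `ImplicitMeasure.measure(Text("blah"))` goes through the parametrized `measure` and `StringMeasure[A]#length`, both of which see the argument as an object.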

You’ll probably not be very surprised that we get:

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

So, it turns out that any kind of parametrized implicit resolution, even when it targets only the value class in question, causes instantiation.

Simply put, anything that uses implicit resolution to automatically derive generic functionality for its domain will cause value classes to be instantiated.

But do we really need implicit resolution to trigger our now-favorite edge condition?

Culprit — type parameters

As a final example, let’s see what happens when we pass our value class through a simple, parametrized method:

domain.Message$Text	16	16 (0.0%)	1 (0.0%)

And, to be clear, if we switch getThing to def getThing(text: Text): Text = text, the value class doesn’t instantiate.
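A sketch of the two variants, assuming the parametrized form is a plain identity method; the `runtimeClassName` helper is hypothetical and only there to make the boxing observable without a profiler:

```scala
object ParametrizedPassThrough {
  final case class Text(value: String) extends AnyVal

  // Parametrized: T erases to Object, so a Text argument must be an instance.
  def getThing[T](t: T): T = t

  // Monomorphic: compiled against the underlying String, no instance needed.
  def getThingMono(text: Text): Text = text

  // Reports what a parametrized method actually receives at runtime.
  def runtimeClassName[T](t: T): String = (t: Any).getClass.getName
}
```

If the value class were inlined at the parametrized call site, `runtimeClassName(Text("blah"))` would report `java.lang.String`; instead it reports the wrapper class.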

Bonus — does specialization help?

MainSanityCheckForDef$Number	16	16 (0.0%)	1 (0.0%)

Nope. So no hope for value classes containing primitives.
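A sketch of what such a check might look like, assuming a `Number` value class wrapping an `Int` and a method specialized for `Int` (the exact original listing is not known; `runtimeClassName` is a hypothetical helper):

```scala
object SpecializationCheck {
  final case class Number(value: Int) extends AnyVal

  // Specialized for Int, but the type argument at the call site is Number,
  // not Int, so the generic (Object-taking) variant is the one selected.
  def getThing[@specialized(Int) T](t: T): T = t

  // Reports the runtime class the parametrized method receives.
  def runtimeClassName[@specialized(Int) T](t: T): String =
    (t: Any).getClass.getName
}
```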

Finally — the “WHY”

So we see now that the problem is indeed parametrization. It causes instantiation in implicit resolution of circe, in creating a shapeless Generic, etc. etc. etc.

But why?

Well, unfortunately the exact implementation of value classes doesn’t appear to be a subject of the Scala Language Specification (only some minimal conditions are mentioned). So, again, we turn to the documentation referenced at the start. As a reminder:

A value class is actually instantiated when:

1. a value class is treated as another type.

2. a value class is assigned to an array.

3. doing runtime type tests, such as pattern matching.

The most likely condition here appears to be the first — the type parameter counts as “another type” (there’s additional proof of this in some specs for Dotty; we’ll get to that at the conclusion of the post).

Unfortunately, the SIP fails to provide relevant examples, so I’m not 100% certain what actually triggers the conditions.

My guess boils down to the following: due to type erasure on the JVM, when generating bytecode for type parameters, you have two options:

  • either you create polymorphic duplicates of all relevant methods for your types (as in specialization),
  • or you must pass everything as java.lang.Object.

Since we’re not doing the former, the latter must happen — so our value class therefore gets instantiated.
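This guess can be partially checked with Java reflection. The sketch below (method names `mono` and `generic` are assumptions for illustration) compares the erased parameter types of a monomorphic and a parametrized method:

```scala
object ErasedSignatures {
  final case class Text(value: String) extends AnyVal

  def mono(text: Text): Text = text
  def generic[T](t: T): T = t

  // Looks up a public method of this object by name and returns the
  // JVM-level type of its first parameter.
  private def firstParamType(methodName: String): String =
    getClass.getMethods
      .find(_.getName == methodName)
      .get
      .getParameterTypes
      .head
      .getName

  // The value class is erased to its underlying type here.
  val monoParam: String = firstParamType("mono")

  // The type parameter is erased to Object here.
  val genericParam: String = firstParamType("generic")
}
```

A `String` cannot be passed where the caller statically promises a `Text` typed as `Object`, so the compiler wraps it in a real `Text` instance at the call site.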

What now?

You may be wondering if it’s time to panic now. And the answer is, of course — not really.

If you’re running this kind of domain encoding and not seeing any performance problems, you should be absolutely fine for now.

Instantiation is quite cheap on modern VMs, the biggest problem being potentially longer GC pauses on traffic spikes.

Doublechecking

Of course, if you have potential performance bottlenecks, it would pay off to check your code. Unfortunately, I can’t offer you anything other than old-fashioned, manual heap analysis via jvisualvm or other tools.

Theoretically, it would be possible to create a plugin that checks whether value classes are instantiated during tests, probably using sbt-jmh or similar as a base. Apart from actually coaxing JMH or another benchmark/profiler to provide the right information, you probably would also need to devise some heuristic to detect value classes via structure, as they appear to be identical to normal classes otherwise; the only differences are the caller patterns and additional static methods.

Reducing noise

For existing projects, especially if your value classes have few or no methods, consider removing the AnyVal qualifier and checking whether there’s any performance hit.

After all, you’re not fooling the compiler. Why fool yourself? Or that poor maintainer who inherits the codebase after you (and who may well be you)?

For new projects

If you are, however, creating a new project, consider that, again:

Anything that uses parametrized resolution, like:

  • automatic generation of JSON encoders/decoders,
  • any sort of generic converters,
  • indeed anything that generates ADT hierarchies, with implicits or otherwise,

will instantiate your value classes (barring some very specific compiler optimization like e.g. here).

This is something to weigh when judging the viability of value classes for domain modelling.

Remember the remark in the previous section about dropping the AnyVal qualifier? It also applies whenever you’re using value classes solely as DTOs/domain objects (with no member methods etc.). You don’t want to foster cargo-cult programming, do you?

As I wrote at the very start, value classes comprise only one of the “slightly-better-typed” current standards for modelling, the other being tagged types. These are also not ideal (e.g. if the tag implementation is covariant, you can get away with inserting raw types in the relevant fields), but they also have their advantages. A good primer on using tagged types was written by Marcin Rzeźnicki and can be found on the Iterators blog here.

Dotty FTW

Finally, to end on a brighter note, Dotty is considering the introduction of Opaque Types. This appears to be intended as a replacement of value classes.

One of the stated main goals of opaque types is indeed avoiding the instantiation issues that plague value classes. In fact, this section of the SIP enumerates virtually identical instantiation cases to the ones we discussed here!

The SIP is still pending, so let’s keep our fingers crossed that it will make it to Scala 3 or later. In the meantime, armed with the new information on domain class modelling in Scala from this post, happy coding!
