Tapir codecs get an update

Adam Warski
SoftwareMill Tech Blog
Apr 3, 2020

Encoding and decoding values is one of the core concepts in a library like Tapir, which is used to describe HTTP endpoints. This involves specifying how the values of query parameters, headers, path, body, etc. map to high-level values, and back. That’s why the concept of a Codec is so central to Tapir.

Tapir by Hanna Fałkowska

The previous design of Tapir’s codecs had some limitations:

  • codecs were not composable; you couldn’t combine two codecs into one
  • mapping a codec and mapping an endpoint input/output was handled by different mechanisms, and the latter was more constrained than the former
  • all codecs were fixed to originate from a small set of predefined “base types”
  • the order of type parameters was confusing, as it was different to what you might expect when working with FunctionX, etc.
  • in some cases, codecs can be automatically derived; when derivation failed, it was often unclear why, as the error message wasn’t detailed enough
  • there were 3 different codec types, to be used in different contexts (Codec, CodecForMany, CodecForOptional)

That’s why we’ve decided to improve the situation and give Codec a facelift, although “rewrite” might be closer to the scale of the changes. The new & improved codec implementation is available from Tapir 0.13.0.

The good news is that in most cases you’ll be using the new codecs in exactly the same way as the old ones. Code changes should be minimal.

That is, codecs are still looked up as implicit values. There are no syntax changes when it comes to describing endpoints. E.g. query[Int]("limit") still describes a required query parameter which must be parseable as a number, and header[Option[String]](HeaderNames.Authorization) still describes an optional authorization header value.

What might require code changes, however, are cases when you had custom codecs and custom codec formats. And even if changes are required, quite often they’ll be restricted to type signatures.

Mappings

What’s in the new codec, then? Let’s first introduce a new concept, which is even simpler than a codec: a Mapping[L, H]. A mapping is parametrised with two type parameters: the type of the low-level values L, and the type of the high-level value H.

The Mapping[L, H] trait defines a bi-directional mapping between values of type L and values of type H. All mappings in Tapir need to be bi-directional, as all endpoints can be interpreted as clients or servers, and in many cases the same input/output description can be used as an input, or as an output.

Low-level values of type L can be decoded to a higher-level value of type H. The decoding can fail; this is represented by a result of type DecodeResult.Failure. Failures might occur due to format errors, wrong arity, exceptions, or validation errors. Validators can be added through the validate method.

High-level values of type H can be encoded as a low-level value of type L.

Hence, a mapping consists of three components:

  • a decoding function, L => DecodeResult[H]
  • an encoding function, H => L
  • a validator Validator[H]
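
To make the three components concrete, here is a minimal, self-contained sketch of the idea. It is a simplified model, not Tapir’s actual source: the names SimpleMapping, Result, Value and Failed are hypothetical stand-ins for Mapping and DecodeResult, and the validator is reduced to a function returning a list of error messages.

```scala
// Simplified model of a bi-directional mapping; NOT Tapir's real API.
// Requires Scala 2.13+ (for String.toIntOption).
object MappingSketch {
  // Hypothetical stand-in for Tapir's DecodeResult.
  sealed trait Result[+T]
  case class Value[T](v: T) extends Result[T]
  case class Failed(original: String, reason: String) extends Result[Nothing]

  // L is the low-level type, H the high-level type.
  trait SimpleMapping[L, H] {
    def rawDecode(l: L): Result[H]   // decoding function, may fail
    def encode(h: H): L              // encoding function, always succeeds
    def validator: H => List[String] // validation errors; empty means valid

    // Full decoding: raw decode first, then run the validator on the result.
    def decode(l: L): Result[H] = rawDecode(l) match {
      case Value(h) =>
        validator(h) match {
          case Nil    => Value(h)
          case errors => Failed(l.toString, errors.mkString("; "))
        }
      case f: Failed => f
    }
  }

  // Example: a String <-> Int mapping that only accepts positive numbers.
  val positiveInt: SimpleMapping[String, Int] = new SimpleMapping[String, Int] {
    def rawDecode(l: String): Result[Int] = l.toIntOption match {
      case Some(i) => Value(i)
      case None    => Failed(l, "not an integer")
    }
    def encode(h: Int): String = h.toString
    def validator: Int => List[String] =
      i => if (i > 0) Nil else List("must be positive")
  }
}
```

In this sketch a format error ("abc") and a validation error ("-1") both surface as a failed decode, while encoding never fails, which mirrors the asymmetry between the two directions described above.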

Codecs

A Codec[L, H, CF] is a Mapping[L, H], with additional meta-data: an optional schema and the format of the low-level value.

There are built-in codecs for most common types such as String, Int etc. Hence, we have:

Codec.int: Codec[String, Int, TextPlain]
Codec.offsetTime: Codec[String, OffsetTime, TextPlain]
Codec.byteArray: Codec[Array[Byte], Array[Byte], OctetStream]
Codec.cookieCodec: Codec[String, List[Cookie], TextPlain]

There are also some built-in higher-level combinators, which create more complex codecs from simpler ones; take a look at the sources to see what it takes to implement one:

Codec.listHead[T, U, CF <: CodecFormat](
  c: Codec[T, U, CF]): Codec[List[T], U, CF]

Codecs are usually defined as implicit values and resolved implicitly when they are referenced. However, they can also be provided explicitly as needed.

For example, a query[Int]("quantity") specifies an input parameter which corresponds to the quantity query parameter and will be mapped as an Int. There’s an implicit Codec[List[String], Int, TextPlain] value that is referenced by the query method.

In this example, the low-level value is a List[String], as a given query parameter can be absent, or have one or many values. The high-level value is an Int. The codec will verify that there’s exactly one query parameter with the given name, and parse it as an int. If any of this fails, a failure will be reported.

In a server setting, if the value cannot be parsed as an int, a decoding failure is reported, and the endpoint won’t match the request, or a 400 Bad Request response is returned (depending on configuration).
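
The decoding steps for a required integer query parameter can be sketched roughly as follows. This is an illustration only: the Outcome type and the decodeSingleInt/encodeSingleInt functions are hypothetical names, loosely modelled on the kinds of failures mentioned earlier (wrong arity, format errors), and are not how Tapir implements its codecs.

```scala
// Sketch of decoding a required Int from the raw values of a query parameter.
// All names are hypothetical; requires Scala 2.13+ (for String.toIntOption).
object QueryDecodeSketch {
  sealed trait Outcome[+T]
  case class Decoded[T](v: T) extends Outcome[T]
  case object Missing extends Outcome[Nothing]                      // no value given
  case class Multiple(values: List[String]) extends Outcome[Nothing] // too many values
  case class Invalid(original: String) extends Outcome[Nothing]      // not an integer

  // Low-level value: all raw values of the parameter. High-level value: an Int.
  def decodeSingleInt(values: List[String]): Outcome[Int] = values match {
    case Nil => Missing
    case single :: Nil =>
      single.toIntOption match {
        case Some(i) => Decoded(i)
        case None    => Invalid(single)
      }
    case many => Multiple(many)
  }

  // Encoding goes the other way and always yields exactly one raw value.
  def encodeSingleInt(i: Int): List[String] = List(i.toString)
}
```

A server interpreter would then translate any non-Decoded outcome into either "endpoint doesn’t match" or a 400 Bad Request, depending on configuration.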

Codec meta-data: formats

Codecs contain an additional type parameter, which specifies the codec format. Each format corresponds to a media type, which describes the low-level format of the raw value (to which the codec encodes). Some built-in formats include text/plain, application/json and multipart/form-data. Custom formats can be added by creating an implementation of the CodecFormat trait.

Thanks to codecs being parametrised by codec formats, it is possible to have a Codec[String, MyCaseClass, TextPlain] which specifies how to serialise a case class to plain text, and a different Codec[String, MyCaseClass, Json], which specifies how to serialise a case class to json. Both can be implicitly available without implicit resolution conflicts.
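
Why the extra type parameter avoids implicit conflicts can be seen in a small standalone model. The Enc type class, the PlainText/JsonText markers and the textBody/jsonBodyOf methods below are hypothetical and only mimic the shape of the design; the JSON rendering is deliberately naive (no escaping).

```scala
// Standalone model of format-tagged implicits; NOT Tapir's real types.
object FormatSketch {
  // Phantom "format" markers, playing the role of codec formats.
  sealed trait Format
  trait PlainText extends Format
  trait JsonText extends Format

  // An encoder for H, tagged with the format F it produces.
  trait Enc[H, F <: Format] { def encode(h: H): String }

  case class User(name: String)
  object User {
    // Two implicit encoders for the same type; they differ only in the format tag.
    implicit val asText: Enc[User, PlainText] = new Enc[User, PlainText] {
      def encode(u: User): String = u.name
    }
    implicit val asJson: Enc[User, JsonText] = new Enc[User, JsonText] {
      def encode(u: User): String = s"""{"name":"${u.name}"}"""
    }
  }

  // Each method asks for a specific format, so exactly one implicit matches.
  def textBody[H](h: H)(implicit e: Enc[H, PlainText]): String = e.encode(h)
  def jsonBodyOf[H](h: H)(implicit e: Enc[H, JsonText]): String = e.encode(h)
}
```

Because the format is part of the requested type, the compiler never has to choose between the two User encoders; each call site selects one unambiguously.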

Different codec formats can be used in different contexts. When defining a path, query or header parameter, only a codec with the TextPlain media type can be used. However, for bodies, any media type is allowed. For example, the input/output described by jsonBody[T] requires a json codec.

Codec meta-data: schemas

Schemas are unchanged compared to previous Tapir versions. A schema describes how the high-level value is encoded when sent over the network. This information is used when generating documentation.

The schema is left unchanged when mapping a codec or an input/output, as the underlying representation of the value doesn’t change. However, schemas can be changed for individual inputs/outputs using the .schema(Schema) method.

Tapir family by Zofia Warska

Composing mappings & codecs

Both Mappings and Codecs have a number of .map functions, which allow extending a given codec to support more complex types. For example, we have:

val zonedDateTime: Codec[String, ZonedDateTime, TextPlain] =
  offsetDateTime.map(_.toZonedDateTime)(_.toOffsetDateTime)

Quite importantly, each Codec is also a Mapping! Hence, any logic included in a codec, which decodes/encodes values, can be re-used to enrich another codec or mapping.

The following mapping methods are available:

def map[HH](codec: Mapping[H, HH]): Codec[L, HH, CF]
def map[HH](f: H => HH)(g: HH => H): Codec[L, HH, CF]
def mapDecode[HH](f: H => DecodeResult[HH])(
  g: HH => H): Codec[L, HH, CF]
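
How such methods compose can be sketched with a tiny standalone model. Here Bidi is a hypothetical stand-in for a mapping, Either[String, H] stands in for DecodeResult[H], and the method taking another mapping is named andThen rather than overloading map, purely to keep the sketch simple.

```scala
// Standalone sketch of composing bi-directional mappings; NOT Tapir's real API.
// Requires Scala 2.13+ (for String.toIntOption).
object ComposeSketch {
  case class Bidi[L, H](decode: L => Either[String, H], encode: H => L) {
    // Total mapping in both directions.
    def map[HH](f: H => HH)(g: HH => H): Bidi[L, HH] =
      Bidi[L, HH](l => decode(l).map(f), hh => encode(g(hh)))

    // The extra decoding step may fail; encoding still always succeeds.
    def mapDecode[HH](f: H => Either[String, HH])(g: HH => H): Bidi[L, HH] =
      Bidi[L, HH](l => decode(l).flatMap(f), hh => encode(g(hh)))

    // Chain with another mapping whose low-level type is this one's high-level type.
    def andThen[HH](next: Bidi[H, HH]): Bidi[L, HH] =
      mapDecode(next.decode)(next.encode)
  }

  val int: Bidi[String, Int] =
    Bidi[String, Int](s => s.toIntOption.toRight(s"not an int: $s"), _.toString)

  case class Port(value: Int)

  // Re-use the Int logic and add a range check on top of it.
  val port: Bidi[String, Port] = int.mapDecode[Port] { i =>
    if (i >= 0 && i <= 65535) Right(Port(i)) else Left(s"out of range: $i")
  }(_.value)
}
```

The String-to-Int parsing is written once and re-used by port, which only adds its own range check; this is the kind of re-use the .map family is meant to enable.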

Mapping inputs & outputs

Mappings can also be used to map endpoint inputs and outputs. Hence, the same logic that is used to create codecs can now be used to customise input/output descriptions.

When adding support for a custom type, you now have a choice. If the type is used multiple times in multiple contexts, you should create a new implicit codec, for example:

import scala.util.{Failure, Success}

case class MyId(...)

// assumes MyId.parse returns a scala.util.Try[MyId]
def decode(s: String): DecodeResult[MyId] = MyId.parse(s) match {
  case Success(v) => DecodeResult.Value(v)
  case Failure(f) => DecodeResult.Error(s, f)
}
def encode(id: MyId): String = id.toString

implicit val myIdCodec: Codec[String, MyId, TextPlain] =
  Codec.string.mapDecode(decode)(encode)

// When describing an endpoint, we can now use:
query[MyId]("my_id")

Or, if the custom type is used only once, you can map a query input description which corresponds to an existing type:

// decode can fail, so mapDecode is used rather than map
query[String]("my_id").mapDecode(decode)(encode)
// or if the codec/mapping is defined (doesn't have to be implicit)
query[String]("my_id").map(myIdCodec)

Note that when mapping inputs/outputs, the additional meta-data (schema & format) of a Codec isn’t taken into account. If you’d like to customise these on a per-endpoint basis, you’ll have to use the .schema method.

Deriving codecs

For complex types (the dominant use-case is case classes), codecs can often be automatically derived. To make derivation failures easier to diagnose, the sttp.tapir.jsonBody[T] method has been moved from the main package to json-implementation packages, such as sttp.tapir.json.circe.jsonBody[T] (similarly for json4s/spray/upickle).

Thanks to that, the jsonBody method can directly require all the other implicit components that are needed to create a codec. Hence, whereas before users got an opaque error saying that an implicit codec instance can’t be found, now the error will be more precise, e.g.:

[error] BooksExample.scala:31:17: could not find implicit value for evidence parameter of type io.circe.Encoder[BooksExample.Book]
[error] jsonBody[Book]
[error] ^

Migration

In most cases, if you do get compilation errors with Tapir 0.13, you’ll need to change Codec[H, CF, R] into:

Codec[R, H, CF]

The main difference is the parameter ordering: it follows how you type functions, with additional information (the format) at the end.

If you’ve been explicitly using CodecForMany or CodecForOptional, replace these with Codec[List[_], _, _] or Codec[Option[_], _, _].

Try it out!

New Tapir codecs & mappings are:

  • composable: codecs & mappings can be chained using .map
  • universal: can be used both for mapping codecs and inputs/outputs
  • flexible: can be defined for any two types
  • intuitive: the type parameters follow the intuition of the decoding function
  • regular: same type is used for multiple, single and optional contexts

Give the new Tapir codecs a try! If you encounter any issues, please report them on GitHub, or ask on Gitter. Likewise, if you have any suggestions for improvements, please let us know!

Software engineer, Functional Programming and Scala enthusiast, SoftwareMill co-founder