GraphQL DataLoader in Spring Boot — singleton or request scoped?

Jaroslaw Kijanowski
SoftwareMill Tech Blog
Oct 15, 2019



The DataLoader is a very handy pattern for solving the N+1 problem, which arises when a query result contains a field that has to be resolved with N further queries. For example, you ask for an animal and the response contains the ids of the 4 (or N) countries where it lives. To get the names of these countries, you then have to make 4 (N) additional queries.

Although this could be solved with a batched backend API, we’re back to square one if the result contains a list of elements and each element needs an additional call for a requested field. Going back to the example, we ask not for just one animal, but for a list (1 call). For each animal, we’ll end up with another query for the countries (N calls).
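To make the call counts concrete, here is a minimal sketch in plain Java. The CountingCountryService class, its method names and the "country-N" values are hypothetical stand-ins made up for illustration; they are not taken from the project. It only counts how often the backend would be hit in the two situations described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a remote country service that counts its calls.
class CountingCountryService {
    int calls = 0;

    // One remote call per country id.
    String findName(int id) {
        calls++;
        return "country-" + id;
    }

    // One remote call for a whole list of ids (the batched API).
    List<String> findNames(List<Integer> ids) {
        calls++;
        List<String> names = new ArrayList<>();
        for (int id : ids) {
            names.add("country-" + id);
        }
        return names;
    }
}

class NPlusOneDemo {
    // Naive resolution: one call for every country id of every animal.
    static int callsPerId(List<List<Integer>> animals) {
        CountingCountryService service = new CountingCountryService();
        for (List<Integer> countryIds : animals) {
            for (int id : countryIds) {
                service.findName(id);
            }
        }
        return service.calls;
    }

    // Batched API: better, but still one call for every animal in the list.
    static int callsPerAnimal(List<List<Integer>> animals) {
        CountingCountryService service = new CountingCountryService();
        for (List<Integer> countryIds : animals) {
            service.findNames(countryIds);
        }
        return service.calls;
    }

    public static void main(String[] args) {
        // Each inner list holds the country ids of one animal.
        List<List<Integer>> animals =
                List.of(List.of(1, 2, 3, 4), List.of(2, 3), List.of(1, 4));
        System.out.println("one call per id:     " + callsPerId(animals));
        System.out.println("one call per animal: " + callsPerAnimal(animals));
    }
}
```

With a batched API the number of country calls still grows with the number of animals, which is the gap the data loader closes.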

I’ve explained that problem and its solution in a previous post.

Now a very important thing to consider is the scope of the DataLoader. We have at least two options. We can have a request-scoped loader or a global one, also known as a singleton.

Request-scoped DataLoaderRegistry

When the registry is request-scoped, the results are batched and cached per request, which is typically what we want. This means that if we query for animals, get two of them in the response and use the data loader pattern to query for their countries, then:

  • there will be only one query sent to the country service,
  • a unique set of parameters will be passed.
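The two properties in the list can be illustrated with a stripped-down loader. This is a simplified sketch, assuming a synchronous batch function; MiniLoader is a hypothetical class and not the java-dataloader API, which offers far more (async batch loaders, caching, statistics).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Simplified stand-in for a data loader; not the java-dataloader API.
class MiniLoader<K, V> {
    private final Function<List<K>, List<V>> batchFunction;
    // Insertion-ordered map, so each distinct key is queued exactly once.
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    MiniLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Queues the key; a repeated key reuses the future that is already queued.
    CompletableFuture<V> load(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Sends all queued keys in a single batch call and completes the futures.
    void dispatch() {
        List<K> keys = new ArrayList<>(pending.keySet());
        if (keys.isEmpty()) {
            return;
        }
        List<V> values = batchFunction.apply(keys);
        for (int i = 0; i < keys.size(); i++) {
            pending.get(keys.get(i)).complete(values.get(i));
        }
        pending.clear();
    }
}

class BatchingDemo {
    // Returns every key list the (hypothetical) country service received.
    static List<List<Integer>> run() {
        List<List<Integer>> batches = new ArrayList<>();
        MiniLoader<Integer, String> loader = new MiniLoader<Integer, String>(ids -> {
            batches.add(ids);
            List<String> names = new ArrayList<>();
            for (Integer id : ids) {
                names.add("country-" + id);
            }
            return names;
        });
        // Two animals with overlapping country ids.
        for (int id : List.of(1, 2, 3)) {
            loader.load(id);
        }
        for (int id : List.of(2, 3, 4)) {
            loader.load(id);
        }
        loader.dispatch();
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both animals ask for countries 2 and 3, yet the batch function is invoked once and sees each id only once.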

In practice, each request creates a new DataLoaderRegistry.

It has to be configured with the countries batch loader and put into a context to make it available in animalCountriesFetcher() in the GraphQLDataFetchers class. Let’s try it out by running the whole application with the request profile.

Running the same query again calls both the Country and the Animal service again, because nothing is cached across requests.

Application-scoped DataLoaderRegistry

In contrast to a request-scoped registry, the singleton version is created once, in the GraphQLProvider class.

It is configured with the countries batch loader in exactly the same way as the request-scoped version. Having such a singleton, we can inject it into a GraphQLInvocation class.

The cache is now shared across all web requests.

Although the animals query is run several times, the countries service is called only once, since the required countries have been fetched previously and are stored in the cache.
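The difference between the two scopes boils down to where the cache lives. The following sketch is an assumption-laden simplification: CachingLoader and ScopeDemo are hypothetical classes, batching is left out for brevity, and the counter tracks individual backend lookups rather than batch calls.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified caching loader; a stand-in, not the java-dataloader API.
class CachingLoader {
    private final Map<Integer, String> cache = new HashMap<>();
    private final AtomicInteger backendLookups;

    CachingLoader(AtomicInteger backendLookups) {
        this.backendLookups = backendLookups;
    }

    // Hits the (simulated) backend only when the id is not cached yet.
    String load(int countryId) {
        return cache.computeIfAbsent(countryId, id -> {
            backendLookups.incrementAndGet();
            return "country-" + id;
        });
    }
}

class ScopeDemo {
    // Request scope: every request gets a fresh loader with an empty cache.
    static int requestScopedLookups(int requests, List<Integer> countryIds) {
        AtomicInteger lookups = new AtomicInteger();
        for (int r = 0; r < requests; r++) {
            CachingLoader loader = new CachingLoader(lookups);
            for (int id : countryIds) {
                loader.load(id);
            }
        }
        return lookups.get();
    }

    // Application scope: one loader, and therefore one cache, for all requests.
    static int singletonLookups(int requests, List<Integer> countryIds) {
        AtomicInteger lookups = new AtomicInteger();
        CachingLoader loader = new CachingLoader(lookups);
        for (int r = 0; r < requests; r++) {
            for (int id : countryIds) {
                loader.load(id);
            }
        }
        return lookups.get();
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2);
        System.out.println("request-scoped: " + requestScopedLookups(3, ids));
        System.out.println("singleton:      " + singletonLookups(3, ids));
    }
}
```

For three identical requests the request-scoped variant goes to the backend in each of them, while the shared loader only does so during the first one.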

When application-scoped and when request-scoped?

Having seen it in action, you can imagine the answer: it depends. If our system has to return different results for the same query depending on the user requesting them, then we need a request-scoped registry. If, however, our application returns timetables for buses, then it doesn’t matter who is requesting the data. A query for a particular line at a given bus stop should always return the same schedule, until the schedule changes.

The default cache implementation uses an in-memory HashMap without any expiry mechanism. Once fetched, the data stays there until the server is restarted or runs out of memory.

Alternative Cache implementation

To overcome the issues mentioned above, we can use a cache implementation that supports eviction based on time and size. A good candidate is Guava’s cache. The data loader can be configured to use any cache implementation that is wrapped in an org.dataloader.CacheMap, so a custom cache can be plugged in.

The custom cache has a maximum size and an expiry time. These are configuration parameters passed in from the method that defines the registry.
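What eviction by size and by time means can be sketched with the JDK alone. To be clear, this is not the Guava-backed cache from the project and it does not implement org.dataloader.CacheMap; ExpiringCache is a hypothetical, non-thread-safe class, and the injected clock is an assumption made so the behaviour can be checked without sleeping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical size- and time-bounded cache; not thread-safe, not Guava.
class ExpiringCache<K, V> {
    // A cached value together with the moment it stops being valid.
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;

        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<K, Timed<V>> entries;

    ExpiringCache(int maxSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Insertion-ordered map that drops its oldest entry above maxSize.
        this.entries = new LinkedHashMap<K, Timed<V>>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached value, or null if it is absent or has expired.
    V get(K key) {
        Timed<V> timed = entries.get(key);
        if (timed == null) {
            return null;
        }
        if (clock.getAsLong() >= timed.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return timed.value;
    }

    void put(K key, V value) {
        entries.put(key, new Timed<>(value, clock.getAsLong() + ttlMillis));
    }
}

class ExpiringCacheDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, advanced by hand
        ExpiringCache<Integer, String> cache =
                new ExpiringCache<>(2, 15_000L, () -> now[0]);
        cache.put(1, "country-1");
        System.out.println("fresh:   " + cache.get(1));
        now[0] = 15_000L; // 15 seconds later
        System.out.println("expired: " + cache.get(1));
    }
}
```

A production setup would rather rely on a battle-tested library such as Guava, as the article does; the sketch only shows the two eviction rules in isolation.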

The configuration is injected from values in the application.yml file.

Let’s make sure it works as expected.

The first query required the loader to fetch the data, but the subsequent query was served from the cache. Finally, the third query was run after 15 seconds, the configured interval for time-based eviction. Since the cache was empty by then, the loader had to re-fetch the data.

Summary

When using the data loader pattern, we need to decide whether the registry is request-scoped or globally available. In the case of a global one, we have to think carefully about the caching strategy. The default one may not be sufficient or may even lead to OutOfMemoryErrors.

The complete project is available on GitHub.


Java consultant having experience with the Kafka ecosystem, Cassandra as well as GCP and AWS cloud providers. https://pl.linkedin.com/in/jaroslawkijanowski