Take Spring Boot, GraphQL and gRPC micro-services. Solve the N+1 query issue with DataLoader

Jaroslaw Kijanowski
SoftwareMill Tech Blog
Oct 3, 2019


Photo by Mario Rodriguez on Unsplash

A fairly common approach to building a micro-services architecture in the Java world is to define boundaries between functionalities and write a Spring Boot application for each of them. Micro-services typically do not communicate with each other synchronously, but rather exchange messages, a.k.a. events. We also find a gateway in front of our architecture which, besides implementing edge functions like authentication and authorisation, rate limiting, caching, metrics collection and request logging, is primarily responsible for request routing, API composition and protocol translation.

The request routing and API composition part is what I want to tackle in this post. In one of our projects we decided to move away from exposing REST endpoints towards a more lightweight approach: gRPC. For the client, though, we expose a GraphQL endpoint to decouple our backend development from the constantly evolving frontend requirements. No more writing tailored REST services and implementing routing rules on the gateway. In the end it’s not a graphite reactor, what could possibly go wro…

Backend first. Schema first.

We’ll develop two micro-services based on Spring Boot and expose their functionality via gRPC. The grpc-spring-boot-starter from LogNet and the protobuf-gradle-plugin are a good fit for this.

Let’s start with the schema.
Here’s the contract for the Animal service:

And a similar one for the Country service:

The Animal service exposes one method: GetAnimals. It accepts one optional parameter, an AnimalRequest, which holds the id of the animal to be fetched. If no id is given, all animals are returned. An animal has a name, a color and a list of ids of the countries it lives in.

The Country service also exposes one method: GetCountries. A list of country ids can be passed to retrieve a collection of country names. Why does the CountryRequest take a list of ids? We’ll need that kind of functionality later, when using the DataLoader to fetch countries as a batch instead of one by one. In short, this is crucial to solving the N+1 query problem explained later, and the documentation states it as:

Batching requires batched backing APIs

With the contracts in place, we can generate the model and service definition files required by gRPC.
This is where the com.google.protobuf Gradle plugin kicks in:

Running the generateProto task creates the required gRPC classes in the src/main/protoGen directory, which is included as a Java srcDir.

Finally, we can implement the two services. Instead of grabbing data from a database, we’ll use an in-memory List and Map.

Depending on the input parameter, the Animal service returns either one animal or the whole list:
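A hypothetical, plain-Java sketch of that branching logic follows. It leaves out the gRPC plumbing (generated stubs, the response observer) and replaces the protobuf-generated message with a record; the `Animal` record, the `getAnimals` method and the sample data are illustrative assumptions, not the project’s code.

```java
import java.util.List;
import java.util.Optional;

public class AnimalServiceSketch {

    // Hypothetical stand-in for the protobuf-generated Animal message:
    // a name, a color and the ids of the countries the animal lives in.
    public record Animal(long id, String name, String color, List<Long> countryIds) {}

    // Hypothetical in-memory data used instead of a database.
    private static final List<Animal> ANIMALS = List.of(
            new Animal(1, "animal-1", "brown", List.of(1L, 2L)),
            new Animal(2, "animal-2", "white", List.of(1L, 3L)),
            new Animal(3, "animal-3", "green", List.of(2L)));

    // Mirrors the GetAnimals semantics: with an id, return only the matching
    // animal (an empty list if there is none); without an id, return all animals.
    public static List<Animal> getAnimals(Optional<Long> id) {
        if (id.isEmpty()) {
            return ANIMALS;
        }
        long wanted = id.get();
        return ANIMALS.stream()
                .filter(animal -> animal.id() == wanted)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(getAnimals(Optional.of(1L)));
        System.out.println(getAnimals(Optional.empty()).size());
    }
}
```

The `Optional` keeps the "no id given" case explicit. With a plain proto3 scalar field an unset id reads as `0`, so a real implementation would need its own convention for "absent", for example treating `0` as no id.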

The Country service returns all countries for the given ids:
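In the same hypothetical spirit, the Country service boils down to looking up every requested id in an in-memory map. The part worth noticing is the signature: it accepts a whole list of ids, which is the batched backing API the DataLoader relies on later. The `Country` record, the ids assigned to the countries and the decision to skip unknown ids are assumptions made for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class CountryServiceSketch {

    // Hypothetical stand-in for the protobuf-generated Country message.
    public record Country(long id, String name) {}

    // Hypothetical in-memory data keyed by country id.
    private static final Map<Long, Country> COUNTRIES = Map.of(
            1L, new Country(1, "Australia"),
            2L, new Country(2, "Brazil"),
            3L, new Country(3, "Fiji"));

    // Batched API: a single call resolves the whole list of ids.
    // Unknown ids are silently skipped instead of failing the request.
    public static List<Country> getCountries(List<Long> ids) {
        return ids.stream()
                .map(COUNTRIES::get)
                .filter(Objects::nonNull)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(getCountries(List.of(1L, 3L)));
    }
}
```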

Let’s start both services:

The Gateway — GraphQL, DataLoader, gRPC and a rainbow puking unicorn

The Gateway is a Spring Boot application built on the already familiar grpc-spring-boot-starter and on another starter for GraphQL. The two proto files can be copied over to generate the gRPC-related files.

Let’s define the GraphQL schema for the frontend:

We’ll expose an animals query to fetch either an animal by its id or the whole inventory. The Animal type is particularly interesting since it contains a list of countries. An animal received from the backend does not carry countries, only their ids, so for every animal we have to make another gRPC call to fetch the countries by their ids.

If there were only one animal, we would need just one additional query for the countries, since we’ve smartly designed our gRPC GetCountries endpoint to accept a list of ids. This design decision solves the N+1 query issue, because we do not have to fetch N countries one by one. We end up with only two queries: one for the animal and one for all of its countries. The catch is the “if” at the beginning of this paragraph.

When we receive several animals, the data fetcher for the countries field calls the GetCountries endpoint once for every animal. N+1 is back in its entirety. 🙀
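The call count is easy to simulate. The hypothetical sketch below replaces both gRPC clients with plain methods and a counter: one GetAnimals call returns N animals, and a naive countries fetcher then calls GetCountries once per animal. That gives 1 + N backend calls, or just 2 when there is a single animal. All names are illustrative only.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class NaiveCountriesFetching {

    public record Animal(String name, List<Long> countryIds) {}

    // Counts how often the simulated GetCountries endpoint is hit.
    static final AtomicInteger getCountriesCalls = new AtomicInteger();

    // Hypothetical stand-in for a blocking gRPC GetCountries call.
    static List<String> getCountries(List<Long> ids) {
        getCountriesCalls.incrementAndGet();
        return ids.stream().map(id -> "country-" + id).toList();
    }

    // The naive countries data fetcher: it runs once per animal,
    // and every run triggers its own backend call.
    static List<String> countriesOf(Animal animal) {
        return getCountries(animal.countryIds());
    }

    // Total backend calls needed to resolve the countries of all given animals.
    public static int backendCallsFor(List<Animal> animals) {
        getCountriesCalls.set(0);
        int getAnimalsCalls = 1; // the single GetAnimals call that returned `animals`
        animals.forEach(NaiveCountriesFetching::countriesOf);
        return getAnimalsCalls + getCountriesCalls.get();
    }

    public static void main(String[] args) {
        List<Animal> animals = List.of(
                new Animal("animal-1", List.of(1L, 2L)),
                new Animal("animal-2", List.of(1L, 3L)),
                new Animal("animal-3", List.of(2L)));
        System.out.println(backendCallsFor(animals)); // prints 4, i.e. 1 + 3
    }
}
```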

We need a mechanism that aggregates the calls to GetCountries and, when triggered, issues only one call and distributes the received countries back to the particular callers. On top of that, we want a cache. It makes no sense to ask for the same country several times just because its id shows up in several calls, for example when two animals live in the same country. Finally, it should make coffee as well, but since this is just a nice-to-have, we’ll pick DataLoader. Although it cannot meet the coffee requirement, it’s awesome at handling the other two.
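To make those two requirements concrete, here is a deliberately tiny loader written from scratch. It is a hypothetical illustration of the idea, not the java-dataloader API. `load` only records a key and hands out a future. Repeated keys share one cached future. `dispatch` sends all pending unique keys in a single batch call and completes every waiting future. In the real setup the GraphQL engine decides when to dispatch; here `main` calls it explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class MiniBatchLoader<K, V> {

    // Batch function: receives unique keys and must return one value per key, in key order.
    private final Function<List<K>, List<V>> batchFunction;
    // One future per key; a repeated key reuses the cached future instead of reaching the backend.
    private final Map<K, CompletableFuture<V>> cache = new HashMap<>();
    // Keys requested since the last dispatch, deduplicated, in request order.
    private final Set<K> pending = new LinkedHashSet<>();

    public MiniBatchLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Records the key and returns a future; nothing is fetched yet.
    public CompletableFuture<V> load(K key) {
        return cache.computeIfAbsent(key, k -> {
            pending.add(k);
            return new CompletableFuture<>();
        });
    }

    // One future for several keys, completing once all of them are loaded.
    public CompletableFuture<List<V>> loadMany(List<K> keys) {
        List<CompletableFuture<V>> futures = keys.stream().map(this::load).toList();
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> futures.stream().map(CompletableFuture::join).toList());
    }

    // Issues a single batch call for all pending keys and completes their futures.
    public void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(pending);
        pending.clear();
        List<V> values = batchFunction.apply(keys);
        for (int i = 0; i < keys.size(); i++) {
            cache.get(keys.get(i)).complete(values.get(i));
        }
    }

    public static void main(String[] args) {
        List<List<Long>> batches = new ArrayList<>();
        MiniBatchLoader<Long, String> loader = new MiniBatchLoader<>(ids -> {
            batches.add(ids);
            return ids.stream().map(id -> "country-" + id).toList();
        });
        // Three animals ask for their countries; nothing is sent yet.
        CompletableFuture<List<String>> first = loader.loadMany(List.of(1L, 2L));
        CompletableFuture<List<String>> second = loader.loadMany(List.of(1L, 3L));
        CompletableFuture<List<String>> third = loader.loadMany(List.of(2L));
        loader.dispatch();
        System.out.println(batches); // prints [[1, 2, 3]]: one call, unique ids
        System.out.println(first.join() + " " + second.join() + " " + third.join());
    }
}
```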

Sounds like a piece of cake,
but there’s a croco in the lake!

As mentioned, for the GraphQL functionality we’re going to use a Spring Boot starter project. But there are at least two of them:

graphql-java-spring, whose documentation is quite alright except for the part about embedding the DataLoader, which is simply not there.

graphql-spring-boot, which might have worked out of the box, but whose documentation somehow did not appeal to me.

Which one would you choose? The one with fewer stars and less activity but more promising docs, like me? If you are still in doubt, there are some more arguments in the first project’s community chat.

Let’s get our hands dirty with graphql-java-spring and fight the croco!
Since we started with a GraphQL schema, we need to generate the model files. The io.github.kobylynskyi.graphql.codegen Gradle plugin does this job nicely:

The GraphQL-related model files are generated by the graphqlCodegen task and are available under src/main/graphqlGen.

The most important things we need to implement are data fetchers, sometimes called resolvers. A data fetcher is bound to a particular query or field of the GraphQL schema and is responsible for fetching the data, typically by calling other services. The documentation explains this concept very well, as well as how results are mapped to POJOs.

In our case we need two fetchers: one for the animals, wired to the animals query, and another one for the countries, bound to the Animal.countries field:

The animals fetcher is a gRPC client calling the GetAnimals method with or without an id. It returns a collection of animals:
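A hypothetical sketch of that fetcher follows, with the gRPC client abstracted behind a plain `Function` and the GraphQL field arguments passed in as a map. The argument name `id` is an assumption, since the schema is not shown here. The value is parsed from its string form because GraphQL serializes `ID` values as strings, which also covers an `Int` argument.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class AnimalsFetcherSketch {

    public record Animal(long id, String name, List<Long> countryIds) {}

    // Hypothetical GetAnimals client: an empty Optional means "fetch all animals".
    private final Function<Optional<Long>, List<Animal>> getAnimalsClient;

    public AnimalsFetcherSketch(Function<Optional<Long>, List<Animal>> getAnimalsClient) {
        this.getAnimalsClient = getAnimalsClient;
    }

    // Resolves the animals query from its optional "id" argument.
    public List<Animal> get(Map<String, Object> arguments) {
        Optional<Long> id = Optional.ofNullable(arguments.get("id"))
                .map(Object::toString)
                .map(Long::parseLong);
        return getAnimalsClient.apply(id);
    }

    public static void main(String[] args) {
        List<Animal> all = List.of(
                new Animal(1, "animal-1", List.of(1L, 2L)),
                new Animal(2, "animal-2", List.of(1L, 3L)));
        // Fake client: filter by id if one is given, otherwise return everything.
        AnimalsFetcherSketch fetcher = new AnimalsFetcherSketch(id -> id
                .map(wanted -> all.stream().filter(a -> a.id() == wanted).toList())
                .orElse(all));
        System.out.println(fetcher.get(Map.of("id", "2")));
        System.out.println(fetcher.get(Map.of()).size());
    }
}
```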

The country fetcher is more interesting. It extracts the country ids from the source object, which is an animal. Then, instead of naively making a gRPC call to the Country service, it delegates the loading of the countries to the DataLoader registered for the countries field:
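A hypothetical sketch of that delegation, independent of graphql-java, follows. The fetcher reads the ids from its source animal and hands them to a `loadMany` function, which stands in for the DataLoader and returns a `CompletableFuture`. graphql-java accepts such a future as a fetcher result, and that is what gives the loader a chance to collect the ids of every animal before the batch goes out. The records and the constructor parameter are assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class CountriesFetcherSketch {

    public record Animal(String name, List<Long> countryIds) {}

    public record Country(long id, String name) {}

    // Hypothetical stand-in for the DataLoader: queues the ids and returns
    // a future that completes once the loader dispatches its batch.
    private final Function<List<Long>, CompletableFuture<List<Country>>> loadMany;

    public CountriesFetcherSketch(Function<List<Long>, CompletableFuture<List<Country>>> loadMany) {
        this.loadMany = loadMany;
    }

    // Resolves Animal.countries: read the ids from the parent object (the source)
    // and hand them to the loader instead of calling GetCountries directly.
    public CompletableFuture<List<Country>> get(Animal source) {
        return loadMany.apply(source.countryIds());
    }

    public static void main(String[] args) {
        // Fake loader that completes immediately; a real DataLoader waits for dispatch.
        CountriesFetcherSketch fetcher = new CountriesFetcherSketch(ids ->
                CompletableFuture.completedFuture(
                        ids.stream().map(id -> new Country(id, "country-" + id)).toList()));
        Animal animal = new Animal("animal-1", List.of(1L, 2L));
        System.out.println(fetcher.get(animal).join());
    }
}
```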

The countryBatchLoader is again a simple gRPC client calling the GetCountries method:
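One detail matters regardless of the library: a batch function has to return exactly one value per key, in the same order as the keys. java-dataloader’s `BatchLoader` has the shape `CompletionStage<List<V>> load(List<K> keys)`, and values are matched to keys by position. A gRPC response makes no such promise by itself, so the hypothetical sketch below re-orders the countries by id and fills gaps with `null`. The client `Function` and the record are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountryBatchLoaderSketch {

    public record Country(long id, String name) {}

    // Hypothetical blocking GetCountries client. Its answer is not guaranteed
    // to follow the order of the requested ids or to contain all of them.
    private final Function<List<Long>, List<Country>> getCountriesClient;

    public CountryBatchLoaderSketch(Function<List<Long>, List<Country>> getCountriesClient) {
        this.getCountriesClient = getCountriesClient;
    }

    // Batch load function: the i-th value belongs to the i-th key, or is null
    // when the backend returned nothing for that id. Runs off the calling thread.
    public CompletionStage<List<Country>> load(List<Long> ids) {
        return CompletableFuture.supplyAsync(() -> {
            Map<Long, Country> byId = getCountriesClient.apply(ids).stream()
                    .collect(Collectors.toMap(Country::id, Function.identity(), (first, duplicate) -> first));
            // Stream.toList() permits null elements, so missing ids stay in place as null.
            return ids.stream().map(byId::get).toList();
        });
    }

    public static void main(String[] args) {
        // Fake client answering in a different order and without id 2.
        CountryBatchLoaderSketch loader = new CountryBatchLoaderSketch(ids -> List.of(
                new Country(3, "country-3"), new Country(1, "country-1")));
        System.out.println(loader.load(List.of(1L, 2L, 3L)).toCompletableFuture().join());
    }
}
```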

The key point is that when there are several animals and the countries field has to be resolved multiple times, there will be only one gRPC call to the Country service, and only unique ids will be sent over the wire. The DataLoader knows which countries to return to which caller.

The last thing to do is to create a custom GraphQLInvocation component that uses the DataLoader. This has already been introduced by this PR, but no new version of the starter has been released yet. Version 1.0 does not include it, so we add it ourselves by creating the CustomGraphQLInvocation class:

Each request creates a new DataLoaderRegistry, which is typically what we want: the registry is scoped to a single request. What this means for the cache, and how to implement the registry as a singleton in case we need caching across multiple requests, I explain in another post.

Let’s see it in action. Start the Gateway:

Now run this query:

I’m using GraphiQL, an in-browser IDE for exploring GraphQL. It can be embedded into the Gateway through the graphiql-spring-boot-starter and is then available at http://localhost:8081/graphiql.

We received the first animal with its two countries. The logs of the Animal and Country services confirm this:

Now let’s query for all the animals:

Although Australia is displayed three times and Fiji and Brazil twice, the Country service has been called once with a unique set of ids:

Running the same query again results in the Animal and Country services being called once more. As already mentioned, this is because every web request creates a fresh DataLoaderRegistry. We could, however, refactor the CustomGraphQLInvocation class and inject a DataLoaderRegistry singleton bean. Then the cache would work across all web requests. An example is given in this post.
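The difference between the two scopes can be illustrated with a hypothetical counter. In the sketch below, `CachingLookup` stands in for a loader’s cache and each loop iteration plays one web request. A fresh cache per request sends the same ids to the Country service every time, while a single shared instance fetches each id only once. To stay short, it counts per id rather than per batch call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class LoaderCacheScopeSketch {

    // Counts simulated lookups that reach the Country service.
    static final AtomicInteger backendCalls = new AtomicInteger();

    // Hypothetical stand-in for a loader's cache: an id is fetched at most once per instance.
    static final class CachingLookup {
        private final Map<Long, String> cache = new HashMap<>();

        String get(long id) {
            return cache.computeIfAbsent(id, key -> {
                backendCalls.incrementAndGet();
                return "country-" + key;
            });
        }
    }

    // Simulates one web request resolving the given country ids.
    static void handleRequest(CachingLookup lookup, List<Long> ids) {
        ids.forEach(lookup::get);
    }

    // Per-request scope: every request gets a fresh cache.
    public static int callsWithPerRequestCache(int requests, List<Long> ids) {
        backendCalls.set(0);
        for (int i = 0; i < requests; i++) {
            handleRequest(new CachingLookup(), ids);
        }
        return backendCalls.get();
    }

    // Singleton scope: one cache shared by all requests.
    public static int callsWithSharedCache(int requests, List<Long> ids) {
        backendCalls.set(0);
        CachingLookup shared = new CachingLookup();
        for (int i = 0; i < requests; i++) {
            handleRequest(shared, ids);
        }
        return backendCalls.get();
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);
        System.out.println(callsWithPerRequestCache(2, ids)); // prints 6
        System.out.println(callsWithSharedCache(2, ids));     // prints 3
    }
}
```

A shared cache trades backend calls for freshness. Without some form of eviction, its entries never expire and a changed country is never read again.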

Summary

The DataLoader pattern, or tool, solves the N+1 query problem nicely. It’s also well documented on the Internet, with lots of examples, especially for the JavaScript ecosystem. You may find some examples for Java as well, but what I was missing is a complete Spring Boot micro-services example, where this functionality really shines.

As already mentioned, I’d also give the second GraphQL Spring Boot starter a try, since it looks like it works without all the additional stuff. Then again, once a new version of graphql-java-spring is released, the CustomGraphQLInvocation class can be removed.

Last but not least, this sample project is available on GitHub!


Java consultant with experience in the Kafka ecosystem, Cassandra, and the GCP and AWS cloud providers. https://pl.linkedin.com/in/jaroslawkijanowski