Progressively Improving N+1 Queries in GraphQL
In this post, we’ll look at the N+1 query problem that often arises in GraphQL and explore how to address it using DataLoader. Rather than diving too deeply into how to implement DataLoader step-by-step, we’ll focus on why you should consider using it. For reference, the final code discussed here is available on my GitHub.
The N+1 Problem
Let’s start with a well-known scenario: imagine we have `Post` and `Comment` models with a one-to-many relationship. The GraphQL schema might look like this:
```graphql
type Comment {
  id: Int!
  content: String!
}

type Post {
  id: Int!
  title: String!
  content: String!
  comments: [Comment!]!
}

type Query {
  posts: [Post!]!
}
```
```typescript
const postResolver: Resolvers = {
  Post: {
    comments: async (parent, __, context) => {
      return await context.prisma.comment.findMany({
        where: { postId: parent.id },
      })
    },
  },
  Query: {
    posts: async (_, __, context) => {
      const posts = await context.prisma.post.findMany({
        take: 20,
      })
      return posts
    },
  },
}
```
When we implement a resolver that fetches `posts`, and the client doesn’t request `comments`, there’s usually no issue. The `posts` field might result in just a single SQL query to the database. In this situation there’s no performance problem. (You might still be over-fetching columns you don’t need, but that’s a separate concern from the N+1 issue.)
```graphql
query ExampleQuery {
  posts {
    id
    comments {
      id
    }
  }
}
```
But what happens if the client’s query also includes the `comments` field? Because GraphQL resolves fields in a nested manner, once `posts` is resolved, it proceeds to resolve the `comments` for each individual `Post`. If you have N posts, that means N additional queries for the comments, for a total of N+1 queries. This leads to significant network overhead and can quickly overload the database under heavier traffic.
Even if you don’t feel the performance hit in early testing or low-traffic scenarios, this N+1 pattern becomes a serious bottleneck as the application scales. It’s also hard to troubleshoot because each comment request is processed independently by the database, which might lead to inefficient query planning on the DB side.
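To make the cost concrete, here’s a minimal standalone simulation. The `fakeDb` object and its query counter are made up for illustration (this is not a real Prisma client), but the resolution order mirrors the naive resolvers above:

```typescript
// Standalone sketch (no real database): a fake client that just counts
// queries, to show how a per-post comments resolver turns 1 query into N+1.
type FakeComment = { id: number; postId: number }

let queryCount = 0

const fakeDb = {
  findPosts(take: number): { id: number }[] {
    queryCount++
    return Array.from({ length: take }, (_, i) => ({ id: i + 1 }))
  },
  findCommentsByPostId(postId: number): FakeComment[] {
    queryCount++
    return [{ id: postId * 100, postId }]
  },
}

// Naive resolution order: one query for the posts, then one per post.
const posts = fakeDb.findPosts(20)
for (const post of posts) {
  fakeDb.findCommentsByPostId(post.id)
}

console.log(queryCount) // 21 queries for 20 posts: the N+1 pattern
```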
Using Prefetch to Avoid N+1
```typescript
const postResolver: Resolvers = {
  Query: {
    posts: async (_, __, context) => {
      const posts = await context.prisma.post.findMany({
        take: 20,
        include: { comments: true },
      })
      return posts
    },
  },
}
```
A common solution is to eagerly load related data upfront. For example, you can modify your `posts` resolver to also fetch all related comments in a single batch query. With this approach, you’ll typically see only two queries in your logs:

- One query to fetch the `Post` records.
- One query that uses a `WHERE ... IN` clause to fetch the associated `Comment` records for those posts.

This solves the N+1 problem, but it isn’t perfect. If the client doesn’t request `comments`, you’re still fetching them anyway, resulting in unnecessary over-fetching.
Leveraging the `info` Object
Wouldn’t it be great if we could decide at runtime whether to fetch comments based on whether the client requested them? GraphQL’s `info` object, available in every resolver, contains all sorts of execution details, including which fields the client has requested.
```typescript
import graphqlFields from 'graphql-fields'

const postResolver: Resolvers = {
  Query: {
    posts: async (_, __, context, info) => {
      const requestedFields = graphqlFields(info)
      const hasRequestedCommentsField =
        Object.keys(requestedFields).includes('comments')
      const posts = await context.prisma.post.findMany({
        take: 20,
        ...(hasRequestedCommentsField && { include: { comments: true } }),
      })
      return posts
    },
  },
}
```
By using a library like `graphql-fields`, we can parse the `info` object and check whether the client requested `comments`. Based on that, we can conditionally add `comments` to the query or skip them.
```graphql
query ExampleQuery {
  posts {
    id
  }
}

# SELECT `public`.`Post`.`id`, `public`.`Post`.`title`, `public`.`Post`.`content`, `public`.`Post`.`createdAt`, `public`.`Post`.`userId` FROM `public`.`Post` WHERE 1=1 ORDER BY `public`.`Post`.`id` ASC LIMIT ? OFFSET ? [20,0]
```
```graphql
query ExampleQuery {
  posts {
    id
    comments {
      id
    }
  }
}

# SELECT `public`.`Post`.`id`, `public`.`Post`.`title`, `public`.`Post`.`content`, `public`.`Post`.`createdAt`, `public`.`Post`.`userId` FROM `public`.`Post` WHERE 1=1 ORDER BY `public`.`Post`.`id` ASC LIMIT ? OFFSET ? [20,0]
# SELECT `public`.`Comment`.`id`, `public`.`Comment`.`content`, `public`.`Comment`.`userId`, `public`.`Comment`.`postId`, `public`.`Comment`.`createdAt` FROM `public`.`Comment` WHERE `public`.`Comment`.`postId` IN (?,?,?,?,?,?,?,?,?,?) [1,2,3,4,5,6,7,8,9,10]
```
With this approach, if `comments` wasn’t requested, no extra query runs. It’s a definite improvement, and for APIs without very deep relationships it might be sufficient. However, as relationships become more complex, maintaining these conditional loads via `info` gets cumbersome. And if we need the same resource in different resolvers, relying on `info` quickly becomes unmanageable.
Introducing DataLoader
```graphql
type Comment {
  id: Int!
  content: String!
}

type Post {
  id: Int!
  title: String!
  content: String!
  comments: [Comment!]!
}

type Query {
  latestPosts: [Post!]!
  bestPosts: [Post!]!
}
```
Now let’s consider a slightly more complex scenario. Suppose we split our queries for `Post` resources into two separate resolvers: one that fetches recent posts and another that fetches popular posts.
```typescript
const postResolver: Resolvers = {
  Query: {
    latestPosts: async (_, __, context, info) => {
      const requestedFields = graphqlFields(info)
      const hasRequestedCommentsField =
        Object.keys(requestedFields).includes('comments')
      const posts = await context.prisma.post.findMany({
        take: 20,
        orderBy: { createdAt: 'desc' },
        ...(hasRequestedCommentsField && { include: { comments: true } }),
      })
      return posts
    },
    bestPosts: async (_, __, context, info) => {
      const requestedFields = graphqlFields(info)
      const hasRequestedCommentsField =
        Object.keys(requestedFields).includes('comments')
      const posts = await context.prisma.post.findMany({
        take: 3,
        orderBy: { views: 'desc' },
        ...(hasRequestedCommentsField && { include: { comments: true } }),
      })
      return posts
    },
  },
}
```
Without DataLoader, we might end up with duplicate queries, for example:
- One query to fetch the 20 latest `Post`s.
- One query to fetch their associated `Comment`s.
- One query to fetch the top 3 `Post`s.
- Another query to fetch their `Comment`s.
In this setup, we’re running two separate `IN` queries for comments, even though we could theoretically batch them into a single query. This is a common scenario in GraphQL: the client can ask for multiple different sets of resources in one go, and it’s up to us to serve all of them efficiently with minimal overhead.
Using DataLoader to Batch Queries
DataLoader is a small yet powerful library designed to solve these batching issues by leveraging the event loop. It’s not GraphQL-specific, but it’s often used alongside GraphQL servers and ORMs. The source code for DataLoader is actually quite short (around 400 lines, including comments). Its main idea is that keys passed to `load()` are collected and resolved together in a single batch function call, reducing the number of queries executed.
When creating a new `DataLoader` instance, you must define a “batch function” that receives an array of keys. This function is responsible for returning the data that corresponds to those keys. For example, if you have a list of `Post` IDs, the batch function fetches all the corresponding `Comment`s in one go and then maps them back to their respective `Post`s in the correct order.
```typescript
import DataLoader from 'dataloader'

const postByIdLoader = new DataLoader(batchPostById)
postByIdLoader.load(3)
postByIdLoader.load(6)
postByIdLoader.load(9)

async function batchPostById(keys: readonly number[]) {
  console.log(keys) // [3, 6, 9]
  return keys.map((key) => ({ id: key })) // must return one result per key
}

const postByIdLoader2 = new DataLoader(batchPostById2)
postByIdLoader2.load('a')
postByIdLoader2.load('b')
postByIdLoader2.load('c')

async function batchPostById2(keys: readonly string[]) {
  console.log(keys) // ['a', 'b', 'c']
  return keys.map((key) => ({ id: key }))
}
```
Let’s revisit the first example scenario. Instead of resolving comments for each post individually, we’ll create a DataLoader instance with a batch function that takes an array of post IDs, queries all matching comments at once, and then distributes the results back to each post.
```typescript
const postResolver: Resolvers = {
  Post: {
    comments: async (parent, __, context) => {
      return await context.prisma.comment.findMany({
        where: { postId: parent.id },
      })
    },
  },
  Query: {
    posts: async (_, __, context) => {
      const posts = await context.prisma.post.findMany({
        take: 20,
      })
      return posts
    },
  },
}
```
```typescript
async function batchCommentByPostId(postIds: number[]) {
  const comments = await prisma.comment.findMany({
    where: { postId: { in: postIds } },
  })
  return postIds.map((postId) =>
    comments.filter((comment) => comment.postId === postId),
  )
}
```
The DataLoader docs recommend creating a new instance per request. This ensures that caching doesn’t leak data between different clients’ requests. In Apollo Server, you can initialize a DataLoader in the `context` function so that each incoming request gets its own instance; Apollo’s documentation also suggests this approach.
```typescript
const server = new ApolloServer({
  schema: makeExecutableSchema({ typeDefs, resolvers }),
  context: ({ req, res }) => {
    return {
      req,
      res,
      loaders: { commentByPostIdLoader: new DataLoader(batchCommentByPostId) },
    }
  },
})
```
Once DataLoader is set up, we can remove the complex `info`-based logic and simply call `commentByPostIdLoader.load(post.id)` in the `comments` resolver. If the user queries 23 posts in total (from various sources), then once all the `comments` resolvers have been invoked, those 23 keys are batched together and a single query fetches all their comments. This eliminates the redundant queries entirely.
```typescript
const postResolver: Resolvers = {
  Post: {
    comments: async (parent, _, context) => {
      const { loaders } = context
      return loaders.commentByPostIdLoader.load(parent.id)
    },
  },
  Query: {
    latestPosts: async (_, __, context) => {
      const posts = await context.prisma.post.findMany({
        take: 20,
        orderBy: { createdAt: 'desc' },
      })
      return posts
    },
    bestPosts: async (_, __, context) => {
      const posts = await context.prisma.post.findMany({
        take: 3,
        orderBy: { views: 'desc' },
      })
      return posts
    },
  },
}
```
What Happens Inside DataLoader?
```graphql
query ExampleQuery {
  latestPosts {
    id
    comments {
      id
    }
  }
  bestPosts {
    id
    comments {
      id
    }
  }
}

# SELECT `public`.`Post`.`id`, `public`.`Post`.`title`, `public`.`Post`.`content`, `public`.`Post`.`createdAt`, `public`.`Post`.`authorId`, `public`.`Post`.`views` FROM `public`.`Post` WHERE 1=1 ORDER BY `public`.`Post`.`createdAt` DESC LIMIT ? OFFSET ? [20,0]
# SELECT `public`.`Comment`.`id`, `public`.`Comment`.`content`, `public`.`Comment`.`userId`, `public`.`Comment`.`postId`, `public`.`Comment`.`createdAt` FROM `public`.`Comment` WHERE `public`.`Comment`.`postId` IN (?,?,?,?,?,?,?,?,?,?) [6,7,10,3,5,8,9,1,2,4]
# SELECT `public`.`Post`.`id`, `public`.`Post`.`title`, `public`.`Post`.`content`, `public`.`Post`.`createdAt`, `public`.`Post`.`authorId`, `public`.`Post`.`views` FROM `public`.`Post` WHERE 1=1 ORDER BY `public`.`Post`.`views` DESC LIMIT ? OFFSET ? [3,0]
```
For those who are curious, here’s a quick look at what happens under the hood:
- When you first call `load()`, DataLoader creates an internal `_batch` object and schedules a batch dispatch for when the event loop is free.
- Each subsequent `load(key)` call adds its key to the `_batch` object.
- When the call stack clears, DataLoader executes its queued batch function (the one you defined when creating the instance) with all the accumulated keys.
- The batch function returns a list of results, and DataLoader resolves all the pending promises simultaneously.
- If the same key is requested multiple times, DataLoader returns the cached result from a previous load, preventing duplicate queries.
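The steps above can be sketched as a toy loader. This is a simplified stand-in for the real library, not its actual implementation: no error handling, no cache eviction, and a plain `queueMicrotask` in place of DataLoader’s more elaborate scheduling.

```typescript
// Toy sketch of DataLoader's core trick: collect keys synchronously,
// then dispatch one batch call once the current call stack clears.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>

class TinyLoader<K, V> {
  private batch: { key: K; resolve: (value: V) => void }[] = []
  private scheduled = false
  private cache = new Map<K, Promise<V>>()

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    const cached = this.cache.get(key)
    if (cached) return cached // same key twice → same promise, one fetch
    const promise = new Promise<V>((resolve) => {
      this.batch.push({ key, resolve })
      if (!this.scheduled) {
        this.scheduled = true
        // Dispatch after the current call stack clears.
        queueMicrotask(() => this.dispatch())
      }
    })
    this.cache.set(key, promise)
    return promise
  }

  private async dispatch() {
    const batch = this.batch
    this.batch = []
    this.scheduled = false
    const values = await this.batchFn(batch.map((item) => item.key))
    batch.forEach((item, i) => item.resolve(values[i]))
  }
}

// Usage: three loads in one tick become a single batch call.
let calls = 0
const loader = new TinyLoader<number, string>(async (keys) => {
  calls++
  return keys.map((k) => `comments-for-post-${k}`)
})

Promise.all([loader.load(3), loader.load(6), loader.load(3)]).then((results) => {
  console.log(calls, results) // one batch call; duplicate key 3 hits the cache
})
```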
Wrapping Up
We’ve explored several ways to address the N+1 problem in GraphQL, from a straightforward eager-loading strategy, to a more dynamic approach using `info`, and finally to a robust batching solution with DataLoader.
I was inspired to write this because our team recently needed DataLoader, and explaining how and why it’s useful to my colleagues made me question how well I understood it myself. I hope this article helps others facing similar challenges.