Efficient Pagination with GraphQL Dataloader ⏳
After a while of use, we started to notice that Dataloader doesn’t have a built-in API for pagination. Then, how do we solve it?
Since its release in 2015, GraphQL, which was developed by Facebook, is predicted to be the most advanced technology that will replace the role of REST API to communicate between client and server. According to the official documentation, GraphQL is a query language for the API and runtime for fulfilling user requests. GraphQL provides a complete and understandable description and documentation of data in the API, giving clients the freedom to only request the data they need, making it easier to develop the API over time, making it a powerful development tool.
Dataloader itself, according to its official documentation, is called “a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching”. Dataloader is also a library developed by Facebook. Although it can be used in other conditions, Dataloader is generally used in conjunction with GraphQL to handle multiple requests to the database (N+1 problem).
After a while of use, we started to notice that Dataloader doesn’t have a built-in API for pagination. This is how we solve this.
Hack Time ✨
ERD
We will build an e-commerce platform where each merchants
will have one
locations
(one-to-one) and will have many products
(one-to-many). The ERD
scheme can be seen as follows.

GraphQL Schema
Here is a pre-designed GraphQL schema (without pagination).
#typeDefs.graphql
type Query{
allMerchants: [Merchant!]!
}
type Merchant {
id: ID!
name: String!
products: [Product!]!
location: Location!
}
type Product{
id: ID!
name: String!
}
type Location {
id: ID!
latitude: Float!
longitude: Float!
}
GraphQL Instance and Dataloader
To initiate a Dataloader instance on every request and also for convenience, I place it in the GraphQL context.
// graphql.ts
import 'graphql-import-node';
import {makeExecutableSchema} from 'graphql-tools';
import {ApolloServer} from 'apollo-server-express';
import Knex from 'knex';
import Dataloader from 'dataloader';
import config from 'config';
import resolvers from './resolvers';
import * as typeDefs from './typeDefs.graphql';
const loaders = () => ({
getLocationsByMerchantId: new Dataloader(async merchantIds => {
const locations = await Knex(config.get('database'))('locations')
.whereIn('merchantId', merchantIds);
return merchantIds.map(merchantId => locations.find(location => location.merchantId === merchantId));
}),
getProductsByMerchantId: new Dataloader(async merchantIds => {
const products = await Knex(config.get('database'))('products')
.whereIn('merchantId', merchantIds);
return merchantIds.map(merchantId => products.filter(product => product.merchantId === merchantId));
})
});
const graphQLServer = new ApolloServer({
schema: makeExecutableSchema({
typeDefs: [typeDefs],
resolvers
}),
context: ({req}) => ({
currentUser: req.currentUser, // handled by middleware
loaders: loaders()
})
});
export default graphQLServer;
GraphQL Resolver
And also, the resolver itself (without pagination).
// resolvers.ts
import Knex from 'knex';
import {IGraphQLSchemaContext, Merchant} from '../../types';
export default {
Query: {
allMerchants() {
return Knex(config.get('database'))('merchants');
},
},
Merchant: {
products({id: merchantId}: Merchant, _: Record<string, any>, {loaders}: IGraphQLSchemaContext) {
return loaders.getProductsByMerchantId.load(merchantId);
},
location({id: merchantId}: Merchant, _: Record<string, any>, {loaders}: IGraphQLSchemaContext) {
return loaders.getLocationsByMerchantId.load(merchantId);
}
}
};
The code snippet above is a typical example of using a Dataloader. We have a
schema relation where the locations
and products
fields return all data
related to merchants
.
The problem arises when a merchant has a lot of products
. Let’s say merchant A
has 1000 products. Returning 1000 products at once is not a good idea. Besides
having problems with memory consumption, the bandwidth to be taken will also be
large.
Pagination
A common technique for handling large data is to use pagination. Generally, to
use pagination, the client must send some supporting input variables such as
limit
, offset
, orderBy
, sortBy
, and search
.
}
enum SortByEnum {
ASC,
DESC
}
input PaginationInput {
limit: Int = 10
offset: Int = 0
orderBy: String = "name"
sortBy: SortByEnum = "ASC"
search: String
}
After defining the pagination input, then, we have to pass the pagination argument to the Dataloader. We can do this together with the argument id of the load method. However, in my opinion, this is “messy” and breaks the principle of using arguments in the Dataloader. So, instead of calling the Dataloader instance, I call a function that has a pagination argument by returning the Dataloader instance.
The next challenge is how we do one query to the database and get all the data
we need. This is the time where UNION
shines. UNION
is great for the use of
intersecting data by returning only unique data. It is effectively reduce
latency and resource overhead. An example of using union on Knex
is as
follows.
// graphql.ts
// ...other codes
enum PaginationSortByEnum {
ASC = 'asc',
DESC = 'desc'
}
interface IPagination {
limit: number
offset: number
orderBy: string
sortBy: PaginationSortByEnum
search: string
}
const loaders = () => {
let productsPagination: IPagination;
const productsByMerchantIdLoader = new Dataloader(async merchantIds => {
const connection = Knex(config.get('database'));
const table = connection('products');
const {limit, offset, orderBy, sortBy, search} = productsPagination;
if(search)
table.where('name', 'LIKE', `%${search}%`)
const products = await connection.union(
merchantIds.map(id =>
table
.where('merchantId', id)
.orderBy(orderBy, sortBy)
.limit(limit)
.offset(offset)
),
true //option for wrap queries
);
return merchantIds.map(merchantId => products.filter(product => product.merchantId === merchantId));
});
return {
getLocationsByMerchantId: new Dataloader(async merchantIds => {
const locations = await Knex(config.get('database'))('locations')
.whereIn('merchantId', merchantIds);
return merchantIds.map(merchantId => locations.find(location => location.merchantId === merchantId));
}),
getProductsByMerchantId(pagination: IPagination){
productsPagination = pagination;
return productsByMerchantIdLoader;
}
};
};
// ...other codes
Then update the resolvers as well.
// resolvers.ts
// ...other codes
export default {
// ...other codes
Merchant: {
products({id: merchantId}: Merchant, {pagination: input}: {pagination: IPagination}, {loaders}: IGraphQLSchemaContext) {
const paginationCollector = {
limit: input?.limit || 10,
offset: input?.offset || 0,
sortBy: input?.sortBy || PaginationSortByEnum.ASC,
orderBy: input?.orderBy || "createdAt",
search: input?.search || ''
}
return loaders.getProductsByMerchantId(paginationCollector).load(merchantId);
},
// ...other codes
}
};
To understand the following snippet is a simulation of a query and the result.
For example we are querying 2 merchant_id
:
merchantIds = ["0713c0b4-a068-4909-b851-a4fcdad4d111", "3ec5d4f6-1459-4fcc-990a-3791bad34843"]
.
>> (SELECT * FROM products WHERE `merchantId` = "0713c0b4-a068-4909-b851-a4fcdad4d111" ORDER BY `name` ASC LIMIT 0,10) UNION (SELECT * FROM products WHERE `merchantId` = "3ec5d4f6-1459-4fcc-990a-3791bad34843" ORDER BY `name` ASC LIMIT 0,10)
<< [{"id":"98bfa6a5-5558-4d21-9c6c-db5d6267acc4","name":"Kecap Bango 400ml","merchantId":"0713c0b4-a068-4909-b851-a4fcdad4d111"}, ..., {"id":"9d44ca99-6f87-41c0-bf70-bc50db3d2707","name":"Apple iPhone 11 Pro","merchantId":"3ec5d4f6-1459-4fcc-990a-3791bad34843"}, ...]
All data we need is returned by the database, the only task is map the products to its merchant.
1 request, 1 query. That is how we win!