Efficient Pagination with GraphQL Dataloader ⏳

Published on
·
Time to read
8 min read
Post category
graphql

After a while of use, we started to notice that Dataloader doesn’t have a built-in API for pagination. Then, how do we solve it?

Since its release in 2015, GraphQL, which was developed by Facebook, is predicted to be the most advanced technology that will replace the role of REST API to communicate between client and server. According to the official documentation, GraphQL is a query language for the API and runtime for fulfilling user requests. GraphQL provides a complete and understandable description and documentation of data in the API, giving clients the freedom to only request the data they need, making it easier to develop the API over time, making it a powerful development tool.

Dataloader itself, according to its official documentation, is called “a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching”. Dataloader is also a library developed by Facebook. Although it can be used in other conditions, Dataloader is generally used in conjunction with GraphQL to handle multiple requests to the database (N+1 problem).

After a while of use, we started to notice that Dataloader doesn’t have a built-in API for pagination. This is how we solve this.

Hack Time ✨

ERD

We will build an e-commerce platform where each merchants will have one locations (one-to-one) and will have many products (one-to-many). The ERD scheme can be seen as follows.

ERD schema

GraphQL Schema

Here is a pre-designed GraphQL schema (without pagination).

#typeDefs.graphql

type Query{
  allMerchants: [Merchant!]!
}

type Merchant {
  id: ID!
  name: String!
  products: [Product!]!
  location: Location!
}

type Product{
  id: ID!
  name: String!
}

type Location {
  id: ID!
  latitude: Float!
  longitude: Float!
}

GraphQL Instance and Dataloader

To initiate a Dataloader instance on every request and also for convenience, I place it in the GraphQL context.

// graphql.ts

import 'graphql-import-node';
import {makeExecutableSchema} from 'graphql-tools';
import {ApolloServer} from 'apollo-server-express';
import Knex from 'knex';
import Dataloader from 'dataloader';
import config from 'config';
import resolvers from './resolvers';
import * as typeDefs from './typeDefs.graphql';

const loaders = () => ({
  getLocationsByMerchantId: new Dataloader(async merchantIds => {
    const locations = await Knex(config.get('database'))('locations')
      .whereIn('merchantId', merchantIds);
    return merchantIds.map(merchantId => locations.find(location => location.merchantId === merchantId));
  }),
  getProductsByMerchantId: new Dataloader(async merchantIds => {
    const products = await Knex(config.get('database'))('products')
      .whereIn('merchantId', merchantIds);
    return merchantIds.map(merchantId => products.filter(product => product.merchantId === merchantId));
  })
});

const graphQLServer = new ApolloServer({
  schema: makeExecutableSchema({
    typeDefs: [typeDefs],
    resolvers
  }),
  context: ({req}) => ({
    currentUser: req.currentUser, // handled by middleware
    loaders: loaders()
  })
});

export default graphQLServer;

GraphQL Resolver

And also, the resolver itself (without pagination).

// resolvers.ts

import Knex from 'knex';
import {IGraphQLSchemaContext, Merchant} from '../../types';

export default {
  Query: {
    allMerchants() {
      return Knex(config.get('database'))('merchants');
    },
  },
  Merchant: {
    products({id: merchantId}: Merchant, _: Record<string, any>, {loaders}: IGraphQLSchemaContext) {
      return loaders.getProductsByMerchantId.load(merchantId);
    },
    location({id: merchantId}: Merchant, _: Record<string, any>, {loaders}: IGraphQLSchemaContext) {
      return loaders.getLocationsByMerchantId.load(merchantId);
    }
  }
};

The code snippet above is a typical example of using a Dataloader. We have a schema relation where the locations and products fields return all data related to merchants.

The problem arises when a merchant has a lot of products. Let’s say merchant A has 1000 products. Returning 1000 products at once is not a good idea. Besides having problems with memory consumption, the bandwidth to be taken will also be large.

Pagination

A common technique for handling large data is to use pagination. Generally, to use pagination, the client must send some supporting input variables such as limit, offset, orderBy, sortBy, and search.


}

enum SortByEnum {
    ASC,
    DESC
}

input PaginationInput {
  limit: Int = 10
  offset: Int = 0
  orderBy: String = "name"
  sortBy: SortByEnum = "ASC"
  search: String
}

After defining the pagination input, then, we have to pass the pagination argument to the Dataloader. We can do this together with the argument id of the load method. However, in my opinion, this is “messy” and breaks the principle of using arguments in the Dataloader. So, instead of calling the Dataloader instance, I call a function that has a pagination argument by returning the Dataloader instance.

The next challenge is how we do one query to the database and get all the data we need. This is the time where UNION shines. UNION is great for the use of intersecting data by returning only unique data. It is effectively reduce latency and resource overhead. An example of using union on Knex is as follows.

// graphql.ts

// ...other codes

enum PaginationSortByEnum {
  ASC = 'asc',
  DESC = 'desc'
}

interface IPagination {
  limit: number
  offset: number
  orderBy: string
  sortBy: PaginationSortByEnum
  search: string
}

const loaders = () => {
  let productsPagination: IPagination;
  const productsByMerchantIdLoader = new Dataloader(async merchantIds => {
    const connection = Knex(config.get('database'));
    const table = connection('products');
    const {limit, offset, orderBy, sortBy, search} = productsPagination;
    if(search)
        table.where('name', 'LIKE', `%${search}%`)
    const products = await connection.union(
        merchantIds.map(id =>
            table
                .where('merchantId', id)
                .orderBy(orderBy, sortBy)
                .limit(limit)
                .offset(offset)
        ),
        true //option for wrap queries
    );
    return merchantIds.map(merchantId => products.filter(product => product.merchantId === merchantId));
  });

  return {
    getLocationsByMerchantId: new Dataloader(async merchantIds => {
      const locations = await Knex(config.get('database'))('locations')
        .whereIn('merchantId', merchantIds);
      return merchantIds.map(merchantId => locations.find(location => location.merchantId === merchantId));
    }),
    getProductsByMerchantId(pagination: IPagination){
      productsPagination = pagination;
      return productsByMerchantIdLoader;
    }
  };
};

// ...other codes

Then update the resolvers as well.

// resolvers.ts

// ...other codes

export default {
  // ...other codes
  Merchant: {
    products({id: merchantId}: Merchant, {pagination: input}: {pagination: IPagination}, {loaders}: IGraphQLSchemaContext) {
      const paginationCollector = {
        limit: input?.limit || 10,
        offset: input?.offset || 0,
        sortBy: input?.sortBy || PaginationSortByEnum.ASC,
        orderBy: input?.orderBy || "createdAt",
        search: input?.search || ''
      }
      return loaders.getProductsByMerchantId(paginationCollector).load(merchantId);
    },
    // ...other codes
  }
};

To understand the following snippet is a simulation of a query and the result. For example we are querying 2 merchant_id: merchantIds = ["0713c0b4-a068-4909-b851-a4fcdad4d111", "3ec5d4f6-1459-4fcc-990a-3791bad34843"].

>> (SELECT * FROM products WHERE `merchantId` = "0713c0b4-a068-4909-b851-a4fcdad4d111" ORDER BY `name` ASC LIMIT 0,10) UNION (SELECT * FROM products WHERE `merchantId` = "3ec5d4f6-1459-4fcc-990a-3791bad34843" ORDER BY `name` ASC LIMIT 0,10)
<< [{"id":"98bfa6a5-5558-4d21-9c6c-db5d6267acc4","name":"Kecap Bango 400ml","merchantId":"0713c0b4-a068-4909-b851-a4fcdad4d111"}, ..., {"id":"9d44ca99-6f87-41c0-bf70-bc50db3d2707","name":"Apple iPhone 11 Pro","merchantId":"3ec5d4f6-1459-4fcc-990a-3791bad34843"}, ...]

All data we need is returned by the database, the only task is map the products to its merchant.

1 request, 1 query. That is how we win!