Pagination in gRPC

Looking into optimizing one of my APIs, I recently stumbled upon the following resource: Common design patterns. This contains a lot of insights on how to design proto files in order to make gRPC APIs more idiomatic or more efficient. Within this document, I found the pagination part interesting and decided to write an article on how to implement it.

A disclaimer

In this article I assume that you already know how to create a simple server in gRPC. I will not show all the boilerplate, you can check it in here.

Finally, this article contains data from Packt. They do not sponsor this article in any way. I'm not getting any money promoting these books (except mine), I simply needed some interesting data for this article.

An explanation

Before starting the implementation, let's first understand what we are implementing.

What

Pagination is a mechanism which allows the consumer of the API to get a subset of the available resources. This is done in order to limit the payload size returned by the API and thus make the API response faster.

This is generally implemented with the combination of page_size and page_token fields. The former tells how big is the subset we want, and the latter act as an index from which we are going to get the next subset.

Let's see an example of such a pagination. We have the following data:

{
  "books": [
    {
      "name": "gRPC Go for Professionals",
      "description": "In recent years, the popularity of microservice architecture has surged, bringing forth a new set of requirements.",
      "authors": [
        "Clément Jean"
      ],
      "published": "2023-07-01T00:00:00Z",
      "pages": 260,
      "isbn": "9781837638840"
    },
    {
      "name": "Full-Stack Web Development with Go",
      "description": "Go is a modern programming language with capabilities to enable high-performance app development.",
      "authors": [
        "Nanik Tolaram",
        "Nick Glynn"
      ],
      "published": "2023-02-01T00:00:00Z",
      "pages": 302,
      "isbn": "9781803234199"
    },
    {
      "name": "Domain-Driven Design with Golang",
      "description": "Domain-driven design (DDD) is one of the most sought-after skills in the industry.",
      "authors": [
        "Matthew Boyle"
      ],
      "published": "2022-12-01T00:00:00Z",
      "pages": 204,
      "isbn": "9781804613450"
    },
    {
      "name": "Building Modern CLI Applications in Go",
      "description": "Although graphical user interfaces (GUIs) are intuitive and user-friendly, nothing beats a command-line interface",
      "authors": [
        "Marian Montagnino"
      ],
      "published": "2023-03-01T00:00:00Z",
      "pages": 406,
      "isbn": "9781804611654"
    },
    {
      "name": "Functional Programming in Go",
      "description": "While Go is a multi-paradigm language that gives you the option to choose whichever paradigm works best",
      "authors": [
        "Dylan Meeus"
      ],
      "published": "2023-03-01T00:00:00Z",
      "pages": 248,
      "isbn": "9781801811163"
    },
    {
      "name": "Event-Driven Architecture in Golang",
      "description": "Event-driven architecture in Golang is an approach used to develop applications that shares state changes asynchronously, internally, and externally using messages.",
      "authors": [
        "Michael Stack"
      ],
      "published": "2022-11-01T00:00:00Z",
      "pages": 384,
      "isbn": "9781803238012"
    },
    {
      "name": "Test-Driven Development in Go",
      "description": "Experienced developers understand the importance of designing a comprehensive testing strategy to ensure efficient shipping and maintaining services in production.",
      "authors": [
        "Adelina Simion"
      ],
      "published": "2023-04-01T00:00:00Z",
      "pages": 342,
      "isbn": "9781803247878"
    },
    {
      "name": "Mastering Go",
      "description": "Mastering Go is the essential guide to putting Go to work on real production systems.",
      "authors": [
        "Mihalis Tsoukalos"
      ],
      "published": "2021-08-01T00:00:00Z",
      "pages": 682,
      "isbn": "9781801079310"
    },
    {
      "name": "Network Automation with Go",
      "description": "Go’s built-in first-class concurrency mechanisms make it an ideal choice for long-lived low-bandwidth I/O operations, which are typical requirements of network automation and network operations applications.",
      "authors": [
        "Nicolas Leiva"
      ],
      "published": "2023-01-01T00:00:00Z",
      "pages": 442,
      "isbn": "9781800560925"
    },
    {
      "name": "Microservices with Go",
      "description": "This book covers the key benefits and common issues of microservices, helping you understand the problems microservice architecture helps to solve, the issues it usually introduces, and the ways to tackle them.",
      "authors": [
        "Alexander Shuiskov"
      ],
      "published": "2022-11-01T00:00:00Z",
      "pages": 328,
      "isbn": "9781804617007"
    },
    {
      "name": "Effective Concurrency in Go",
      "description": "The Go language has been gaining momentum due to its treatment of concurrency as a core language feature, making concurrent programming more accessible than ever.",
      "authors": [
        "Burak Serdar"
      ],
      "published": "2023-04-01T00:00:00Z",
      "pages": 212,
      "isbn": "9781804619070"
    }
  ]
}

As expected, if we start by requesting subset of size 2, we will get the first two books (gRPC Go for Professionals and Full-Stack Web Development with Go). On top of that result, a page token will be returned to us. If we now use this token and request a subset of size 2 we will get the 2 following books (Domain-Driven Design with Golang and Building Modern CLI Applications in Go).

This is pretty much it. It is simple to grasp and also simple to implement.

The setup

In this article:

  • I will be using Postgres to store our books' data. You can find the initialization script here
  • I will run Postgres and the gRPC with Docker Compose. You can find the YAML file here

Protobuf

If you check the List Pagination section, you will see that we have the following protobuf schema:

rpc ListBooks(ListBooksRequest) returns (ListBooksResponse);

message ListBooksRequest {
  string parent = 1;
  int32 page_size = 2;
  string page_token = 3;
}

message ListBooksResponse {
  repeated Book books = 1;
  string next_page_token = 2;
}

This code is mostly correct but we are going to remove the parent field. If you are interested in knowing what this is used for, check the List Sub-Collections section.

So we now have:

message ListBooksRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message ListBooksResponse {
  repeated Book books = 1;
  string next_page_token = 2;
}

service BookStoreService {
  rpc ListBooks(ListBooksRequest) returns (ListBooksResponse);
}

The last thing that we need to do is defining the Book message:

import "google/protobuf/timestamp.proto";

message Book {
  string name = 1;
  string description = 2;
  repeated string authors = 3;
  google.protobuf.Timestamp published = 4;
  uint32 pages = 5;
  string isbn = 6;
}

There is nothing fancy here. We simply laid out all the information needed to represent our books.

ListBooks

Let's get started with an empty implementation for ListBooks:

func (s *server) ListBooks(ctx context.Context, req *pb.ListBooksRequest) (*pb.ListBooksResponse, error) {
}

The first step in every endpoint implementation is to validate arguments. In our case we will validate the page_size. We mostly need to check that the page_size is not too big because it defeats the purpose of pagination, and if no page_size is provided we are going to set it to a default value.

const (
  defaultPageSize = 10
  maxPageSize     = 30
)

func validatePageSize(req *pb.ListBooksRequest) error {
  if req.PageSize > maxPageSize {
    msg := fmt.Sprintf(
      "expected page size between 0 and %d, got %d",
      maxPageSize, req.PageSize,
    )
    return errors.New(msg)
  } else if req.PageSize == 0 { // no page_size provided
    req.PageSize = defaultPageSize
  }

  return nil
}

This means that we can now do the following in ListBooks:

if err := validatePageSize(req); err != nil {
  return nil, status.New(codes.InvalidArgument, err.Error()).Err()
}

Next, we need to validate page_token. In this implementation, I decided to use ULIDs. This is because they are short ids and they are lexicographically sortable. The sortability part is interesting because we will sort the books by their IDs which are ULIDs.

Fortunately for us oklog provides an ULID implementation for us to verify if a ULID is valid or not. In ListBooks, we can simply do:

if _, err := ulid.Parse(req.PageToken); len(req.PageToken) != 0 && err != nil {
  msg := fmt.Sprintf("expected valid ULID, got error %v", err)
  return nil, status.New(codes.InvalidArgument, msg).Err()
}

Notice that the page_token is optional (len(req.PageToken) != 0). When we do not provide one we will start from the beginning of the dataset.

Then, we need to generate the SQL query in order to get the subsets. We need to create the following SQL:

SELECT *
FROM book
WHERE id > page_token
LIMIT page_size
ORDER id ASC

Obviously, because the page_token is optional, the where clause is optional too.

Using GORM, we can easily create the request by writing the following:

query := s.db.Table("book").Limit(int(req.PageSize)).Order("id ASC")

if len(req.PageToken) != 0 {
  query = query.Where("id > ?", req.PageToken)
}

Now that we have the query, we can simply execute it and map the result into our Protobuf Book model:

var queryRes = []Book{}
query.Scan(&queryRes) // execute query

if len(queryRes) == 0 {
  // short circuit if not results
  return &pb.ListBooksResponse{}, nil
}

books := utils.Map(queryRes, mapBookToBookPb)

Finally, we can get the ID (ULID) of the last item in subset (queryRes) and this will represent the page_token from where a subsequent request need to start getting new result.

lastItemIdx := len(queryRes) - 1
nextPageToken := queryRes[lastItemIdx].ID

if len(queryRes) < int(req.PageSize) {
  // no more pages
  nextPageToken = ""
}

return &pb.ListBooksResponse{
  Books:         books,
  NextPageToken: nextPageToken,
}, nil

And we now have pagination! Let's go ahead and test it.

Testing

The first thing we can test is the case where the consumer doesn't provide a page_token and page_size. This should return 10 results (see defaultPageSize) from the beginning of the data.

$ grpcurl -plaintext \
          -proto proto/store.proto \
          -d '{}' \
          0.0.0.0:50051 BookStoreService.ListBooks

{
  "books": [
    {
      "name": "Full-Stack Web Development with Go",
      ...
    },
    {
      "name": "Domain-Driven Design with Golang",
      ...
    },
    {
      "name": "Building Modern CLI Applications in Go",
      ...
    },
    {
      "name": "Functional Programming in Go",
      ...
    },
    {
      "name": "Event-Driven Architecture in Golang",
      ...
    },
    {
      "name": "Test-Driven Development in Go",
      ...
    },
    {
      "name": "Mastering Go",
      ...
    },
    {
      "name": "Network Automation with Go",
      ...
    },
    {
      "name": "Microservices with Go",
      ...
    },
    {
      "name": "Effective Concurrency in Go",
      ...
    }
  ],
  "nextPageToken": "01H8EH4VYYCS6M4BFVZ90RP7FS"
}

First, you can notice that we had 11 datum and that because we asked for 10 we didn't get "gRPC Go for Professionals". And secondly, we can see that we got the nextPageToken field.

Let's now use the nextPageToken as page_token:

$ grpcurl -plaintext \
          -proto proto/store.proto \
          -d '{"page_token": "01H8EH4VYYCS6M4BFVZ90RP7FS"}' \
          0.0.0.0:50051 BookStoreService.ListBooks

{
  "books": [
    {
      "name": "gRPC Go for Professionals",
      ...
    }
  ]
}

And here we get our 11th datum!

Finally, we can try mixing the page_token and page_size fields. Let's say that we are going to have a page_size of 2. We will do the first request without page_token to get the 2 first elements:

$ grpcurl -plaintext \
          -proto proto/store.proto \
          -d '{"page_size": 2}' \
          0.0.0.0:50051 BookStoreService.ListBooks

{
  "books": [
    {
      "name": "Full-Stack Web Development with Go",
      ...
    },
    {
      "name": "Domain-Driven Design with Golang",
      ...
    }
  ],
  "nextPageToken": "01H8EH2RM7HVFJG4HYA4XTV0R5"
}

and then we can use the nextPageToken to get the 3rd and 4th elements:

$ grpcurl -plaintext \
          -proto proto/store.proto \
          -d '{"page_size": 2, "page_token": "01H8EH2RM7HVFJG4HYA4XTV0R5"}' \
          0.0.0.0:50051 BookStoreService.ListBooks

{
  "books": [
    {
      "name": "Building Modern CLI Applications in Go",
      ...
    },
    {
      "name": "Functional Programming in Go",
      ...
    }
  ],
  "nextPageToken": "01H8EH3CKPT5BX263G0NGGKQCQ"
}

Here we go! Everything workks as expected!

Conclusion

We saw that we can implement pagination quite easily in gRPC with the combination of page_token and page_size fields in the request. We also saw that the API endpoint will return a next_page_token that we can later use as an index for the next page we want to get.

If you like this kind of content let me know in the comment section and feel free to ask for help on similar projects, recommend the next post subject or simply send me your feedback.

Written on August 24, 2023