
Handling Paginated Collection

By this point in the course, you should have an appreciation for how amazingly versatile REST APIs can be as a source of data. You may also have started wondering how you might deal with a veritable firehose of data that could potentially return millions of rows of data with a single request.

Believe me, here at Cribl, we think about that a lot too. We didn't get to the point where we can handle petabytes of incoming data just by sitting back and thinking happy thoughts.

Managing the flow is more than just asking for a certain number of records. It's having the ability to ask for a block, then the NEXT block, then the one after that, and so on. This technique is called, appropriately enough, "paging" (or pagination), and it's essential for ensuring your data request doesn't overwhelm the receiving system. Pagination also helps mitigate denial-of-service (DoS) attacks by forcing a large request into payloads that are manageable for both the sender and the receiver.

In this module, you'll learn how to collect data from REST API endpoints that use pagination. Cribl Stream does not support all pagination methods, so this module will highlight the currently supported ones.

A quick review of REST API pagination methods

Pagination techniques fall into five broad categories:

  • Offset pagination: This is the easiest to understand. It uses two parameters: the limit (the number of records to return) and the offset (the records to skip before returning data). So your first page might be limit=10&offset=0, the second page would be limit=10&offset=10, the third would be limit=10&offset=20, and so on.
  • Page-based pagination: Some APIs have the concept of "paging" built into the interface, so you can specify page=3&size=10, which would bring up the 3rd set of 10 records (i.e., records 21-30).
  • Keyset pagination: For data sets that are especially large or update frequently, keysets work well. Instead of specifying an offset or page, you provide a specific record (using a reference to some unique element like an ID), the number of records you want, and the "direction" (before or after) of the returned results relative to that record. For example: since_id=12345&limit=10
  • Time-based pagination: This is a variation of keyset pagination that uses a timestamp instead of an ID.
  • Cursor-based pagination: In this technique you provide the cursor (which the API internally understands as a record reference) and the limit (number of records). Along with the records returned, the API will provide the next cursor reference so you can pick up where you left off. So if you start by requesting cursor=abc123&limit=10, the returned data would include "nextCursor": "def456", and your next request would be cursor=def456&limit=10.
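To make these request shapes concrete, here's a minimal Python sketch of how a client might build each successive request. The function names are hypothetical, and the parameter names (limit, offset, since_id, cursor, nextCursor) are just the example names used in the list above; real APIs pick their own. Time-based pagination is omitted because it works like keyset with a timestamp in place of the ID.

```python
from typing import Iterator, Optional, Tuple


def offset_params(limit: int, pages: int) -> Iterator[dict]:
    """Offset pagination: each request skips the records already fetched."""
    for n in range(pages):
        yield {"limit": limit, "offset": n * limit}


def page_record_range(page: int, size: int) -> Tuple[int, int]:
    """Page-based pagination with 1-based pages: which records a page holds."""
    first = (page - 1) * size + 1
    return first, first + size - 1


def keyset_params(last_seen_id: int, limit: int) -> dict:
    """Keyset pagination: ask for records after the last ID already seen."""
    return {"since_id": last_seen_id, "limit": limit}


def cursor_params(response: dict, limit: int) -> Optional[dict]:
    """Cursor pagination: reuse the opaque cursor the API returned, if any."""
    next_cursor = response.get("nextCursor")
    if next_cursor is None:
        return None  # no cursor means there are no more pages
    return {"cursor": next_cursor, "limit": limit}


print(list(offset_params(limit=10, pages=3)))
# [{'limit': 10, 'offset': 0}, {'limit': 10, 'offset': 10}, {'limit': 10, 'offset': 20}]
print(page_record_range(page=3, size=10))           # (21, 30)
print(keyset_params(last_seen_id=12345, limit=10))  # {'since_id': 12345, 'limit': 10}
print(cursor_params({"nextCursor": "def456"}, 10))  # {'cursor': 'def456', 'limit': 10}
```

Note how only cursor pagination depends on the previous response; the other styles can compute every request up front.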

So, which options does Cribl support?

  • None — no pagination. Ok, this isn't really an option, but it's in the pagination dropdown so I'm mentioning it here.
  • Offset pagination: as described above.
  • Page-based pagination: as described above.

But Cribl's pagination also provides a few options for WHERE the pagination information is located:

  • Response Body Attribute — next-page information is read from an attribute in the response body.
  • Response Header Attribute — next-page information is read from an attribute in the response header.
  • RFC 5988 Web Linking — follows standard HTTP Link header pagination.

It's important to understand that HOW the pagination is done (offset, page-based, keyset, etc.) and WHERE the paging information is found (response body, response header, web linking) are complementary. A given API might use page-based pagination, with the next-page URL in the Link header (web linking). Another might use keyset pagination, with the information in the response body.

With all of that out of the way, let's get down to some hands-on experience.

Response Body Attributes

When using Response Body Attribute pagination with nested attributes to determine the next page, you reference the extracted attributes with a dot-separated path (for example, response.returnedRecords or response.startOffset).
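As an illustration only (not Cribl's code), here's a minimal Python sketch of resolving a dot-separated attribute path against a parsed JSON body. The body is hypothetical, shaped to match the response.returnedRecords and response.startOffset examples above, and get_dotted is a made-up helper name.

```python
from typing import Any

# Hypothetical response body, shaped to match the examples above.
body = {"response": {"returnedRecords": 30, "startOffset": 60}}


def get_dotted(obj: Any, path: str) -> Any:
    """Resolve a dot-separated path such as "response.startOffset".

    Returns None if any segment along the path is missing.
    """
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(get_dotted(body, "response.startOffset"))      # 60
print(get_dotted(body, "response.returnedRecords"))  # 30
print(get_dotted(body, "response.nextPage"))         # None
```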

For this example, we're going to shift gears to something more fun. If you're a Rick & Morty fan, this example is for you! Yes, there is an actual Rick & Morty API (https://rickandmortyapi.com/). Let's start at the command line by seeing what response-body paging information looks like. Open up your terminal/command prompt and type:

curl https://rickandmortyapi.com/api/character | jq | more

Yeah, there's a lot of info there, but I want you to focus on the very top:

{
  "info": {
    "count": 826,
    "pages": 42,
    "next": "https://rickandmortyapi.com/api/character?page=2",
    "prev": null
  }

Right there we get solid information: how many records there are (count), how many pages they're split into (pages), the URL of the next page (next), and the URL of the previous page (prev), which is null because we're on page 1.

Let's look at the same info for page 4.

curl 'https://rickandmortyapi.com/api/character?page=4' | jq | more

That top section should look like this:

{
  "info": {
    "count": 826,
    "pages": 42,
    "next": "https://rickandmortyapi.com/api/character?page=5",
    "prev": "https://rickandmortyapi.com/api/character?page=3"
  }

Now let's set this up as a REST Collector in Cribl Stream.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.

  1. In the Collector ID field, enter collect_response_body.

  2. Copy the following value to the Collect URL field.

    'https://rickandmortyapi.com/api/character'
note

Those are single quotes, not backticks, and you definitely need them.

  3. In the Pagination drop-down, select Response Body Attribute.

  4. Set the Response Attribute to info.next.

    In case it's not obvious, we got that by taking the parent object name ("info") and appending the name of the attribute inside it ("next"), separated by a dot.

  5. Set the Page limit to 5.

    Note: The default Max Pages value of 50 is fine for many use cases. Just be aware that depending on the circumstances you might need to increase this value.

  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview should now display five events.
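Conceptually, the collector keeps following info.next until it's null or the Page limit is hit. The loop below is a hypothetical Python model of that behavior, not Cribl's implementation. To keep it runnable offline, fake_fetch stands in for the HTTP call and returns only an info block modeled on the responses shown earlier (42 pages).

```python
from typing import Callable, List, Optional

BASE = "https://rickandmortyapi.com/api/character"
TOTAL_PAGES = 42  # the "pages" value from the responses shown earlier


def fake_fetch(url: str) -> dict:
    """Stand-in for an HTTP GET; returns only the "info" block of a page."""
    page = int(url.split("page=")[1]) if "page=" in url else 1
    next_url = f"{BASE}?page={page + 1}" if page < TOTAL_PAGES else None
    return {"info": {"pages": TOTAL_PAGES, "next": next_url}}


def collect(start_url: str, fetch: Callable[[str], dict], page_limit: int) -> List[str]:
    """Follow info.next until it is null or page_limit pages have been fetched."""
    fetched: List[str] = []
    url: Optional[str] = start_url
    while url is not None and len(fetched) < page_limit:
        body = fetch(url)
        fetched.append(url)
        url = body["info"]["next"]  # the Response Attribute: info.next
    return fetched


pages = collect(BASE, fake_fetch, page_limit=5)
print(len(pages))  # 5
print(pages[-1])   # https://rickandmortyapi.com/api/character?page=5
```

With a larger page limit, the same loop stops on its own after page 42, because that page's next value is null.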

note

To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.

Response Body Attributes with Has-more Expression

Sometimes REST API endpoints don't provide you a deterministic way of ending collection based on the URL. For example, collecting from the same URL but with a different offset might be problematic with Cribl Stream if there isn't a way to determine the last page to stop collection. In version 4.0.3, we introduced a new way to evaluate whether to continue collection. This section highlights the feature.

note

First, you'll need to visualize the data you're working with. Run the following command in your terminal:

curl -s 'https://dummyjson.com/products' | jq

In the JSON Response body, you'll see the total, skip, and limit attributes.

  ],
  "total": 194,
  "skip": 0,
  "limit": 30
}

We're going to use these to construct a formula that lets Cribl know when it has reached the end of the data set.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.

  1. In the Collector ID field, enter collect_response_has_more.

  2. Copy the following value to the Collect URL field.

    'https://dummyjson.com/products'
  3. In the Collect parameters section, click Add parameter and fill out the following:

    • In the Name box, enter total.
    • In the Value box, enter `${total}`.
    • Click Add parameter again.
    • In the Name box, enter skip.
    • In the Value box, enter `${skip}`.
    • Once again, click Add parameter.
    • In the Name box, enter limit.
    • In the Value box, enter `${limit}`.
  4. In the Pagination drop-down, select Response Body Attribute.

  5. Add the following values to the Response attribute field: skip, limit.

  6. Set the Page limit field to 5.

  7. Enter the following Last-page expression:

    skip + limit >= total

    This formula says that the last page of data has been reached when skip (the number of records to pass over before returning data) plus limit (the number of records to return) is greater than or equal to the total number of records in the data set.
  8. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview should now display five events.
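Here's a hypothetical Python model of how that Last-page expression ends collection; it is not Cribl's implementation. It assumes each request advances skip by limit (dummyjson's default limit is 30, as the output above shows) and that the Page limit of 5 caps the number of requests.

```python
from typing import List


def is_last_page(skip: int, limit: int, total: int) -> bool:
    # Same condition as the Last-page expression: skip + limit >= total
    return skip + limit >= total


def plan_requests(total: int, limit: int, page_limit: int) -> List[int]:
    """Return the skip values requested, stopping after the last page
    or once page_limit requests have been made."""
    skips: List[int] = []
    skip = 0
    while len(skips) < page_limit:
        skips.append(skip)
        if is_last_page(skip, limit, total):
            break
        skip += limit  # assumption: each page advances skip by limit
    return skips


print(plan_requests(total=194, limit=30, page_limit=5))    # [0, 30, 60, 90, 120]
print(plan_requests(total=194, limit=30, page_limit=100))  # [0, 30, 60, 90, 120, 150, 180]
```

Without the page limit, the request at skip=180 is the last one: 180 + 30 = 210, which is at least 194.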

note

To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.

RFC 5988 - Web Linking

RFC 5988, known as Web Linking, is a special type of pagination in which the server advertises next-page URLs in the HTTP Link header. With this option, Cribl Stream follows rel="next" links until there is no data left to collect.

note

To view a sample of response data from this type of endpoint, run the following command in your terminal window:

curl -sI "https://api.github.com/repos/torvalds/linux/commits?per_page=5"

You'll see a result that looks like the following. Note: This output is truncated for brevity.

HTTP/2 200 
date: Mon, 01 Jun 2026 20:39:49 GMT
content-type: application/json; charset=utf-8
cache-control: public, max-age=60, s-maxage=60
vary: Accept,Accept-Encoding, Accept, X-Requested-With
etag: W/"b337fe830ff3dfc72601a9f9d12e29f784a1f26470daa3fe2d6890e93354c501"
last-modified: Sun, 31 May 2026 22:14:24 GMT
x-github-media-type: github.v3; format=json
link: <https://api.github.com/repositories/2325298/commits?per_page=5&page=2>; rel="next", <https://api.github.com/repositories/2325298/commits?per_page=5&page=289345>; rel="last"
x-github-api-version-selected: 2022-11-28
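To see how a client can pull the rel="next" URL out of that link: header, here's a minimal, deliberately simplified regex-based Python sketch (parse_link_header is a hypothetical helper, and it ignores edge cases the RFC allows, such as commas inside quoted parameters). It is not how Cribl parses the header internally.

```python
import re
from typing import Dict

# The link: header value from the GitHub output above.
LINK_HEADER = (
    '<https://api.github.com/repositories/2325298/commits?per_page=5&page=2>; rel="next", '
    '<https://api.github.com/repositories/2325298/commits?per_page=5&page=289345>; rel="last"'
)

# Each link-value is <URI> followed by ;-separated parameters, up to the next "<".
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
REL_PARAM = re.compile(r'rel="?([^";,]+)"?')


def parse_link_header(header: str) -> Dict[str, str]:
    """Map each rel name (next, last, prev, first, ...) to its URL."""
    links: Dict[str, str] = {}
    for url, params in LINK_VALUE.findall(header):
        rel = REL_PARAM.search(params)
        if rel:
            # A single rel value may list several space-separated relation types.
            for name in rel.group(1).split():
                links[name] = url
    return links


links = parse_link_header(LINK_HEADER)
print(links["next"])  # https://api.github.com/repositories/2325298/commits?per_page=5&page=2
print(links["last"])  # https://api.github.com/repositories/2325298/commits?per_page=5&page=289345
# With no rel="next" in the header, a collector following next links stops.
print("next" in parse_link_header('<https://api.github.com/x?page=1>; rel="first"'))  # False
```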
important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.

  1. In the Collector ID field, enter collect_web_linking.
  2. Copy the following URL to the Collect URL field.
    'https://api.github.com/repos/torvalds/linux/commits'
  3. In the Pagination dropdown, select RFC 5988 - Web Linking.
  4. Set the Next page relation name to next.
    We got this from the rel="next" parameter in the link: header shown above.
  5. Set the Page Limit option to 5.
  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until no further next link is returned or the Page Limit is reached.

So why did we get 15 records back? This is an example of why you need "event breakers", which we'll get to in a future lesson. For now, just trust that you didn't do anything wrong.

The Preview modal should now display five events containing five items in the items array.

Offset/Limit Pagination Options

As mentioned earlier, offset pagination works by calculating the "offset" of the first item to be collected from the page. Cribl Stream introduced this feature in version 3.4. And we already looked at an API that had offset and limit information available. Run the following command in your sandbox terminal:

curl -s 'https://dummyjson.com/products' | jq

You'll see terminal output that looks like the following:

*** a lot of other stuff before the end ***
  ],
  "total": 194,
  "skip": 0,
  "limit": 30
}

Assuming we keep the page size (limit) at the default of 30, collecting the second page means setting the skip value to 30. Why not 31? Because skip is the number of records to pass over before returning data, and this API (like most, to be honest) is zero-based.

note

Run the following command in your sandbox terminal to see the second page of data:

curl -s 'https://dummyjson.com/products?skip=30' | jq

You'll have to scroll back to the top, but if you do you'll see our records begin at ID 31:

{
  "products": [
    {
      "id": 31,
      "title": "Lemon",
      "description": "Zesty and tangy lemons, versatile for cooking, baking, or making refreshing beverages.",
      "category": "groceries",

Now we'll configure this as a REST Collector in Cribl Stream.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.

  1. In the Collector ID field, enter collect_limit_offset.
  2. Copy the following URL to the Collect URL field.
    'http://rest-server/limit'
  3. In the Pagination drop-down, select Offset/Limit.
  4. In the Offset field name field, enter skip.
  5. You can leave the Starting offset field blank, but note that it's there in case you always want to skip a certain number of records for each collection.
  6. In the Limit field name field, enter limit.
  7. Change the Record limit field from the default to 5, just to keep this example manageable.
  8. In the Total record count field name field, enter total.
  9. Change the Page limit field from the default to 5. Again, this just keeps things easy to view.
  10. Enable the zero-based index toggle.
  11. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until the offset reaches the total record count.
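As a rough model of that stop rule (a hypothetical sketch, not Cribl's code), the function below lists the query strings an Offset/Limit collector would send with the settings above: skip advances by the record limit of 5, starting from a zero-based offset of 0, until it reaches the total of 194 or the Page limit is hit. The start parameter plays the role of the Starting offset field.

```python
from typing import List


def offset_requests(total: int, limit: int, page_limit: int, start: int = 0) -> List[str]:
    """Query strings for Offset/Limit paging: skip advances by limit until it
    reaches total, or until page_limit requests have been made."""
    queries: List[str] = []
    offset = start  # zero-based: the first request skips nothing
    while offset < total and len(queries) < page_limit:
        queries.append(f"skip={offset}&limit={limit}")
        offset += limit
    return queries


print(offset_requests(total=194, limit=5, page_limit=5))
# ['skip=0&limit=5', 'skip=5&limit=5', 'skip=10&limit=5', 'skip=15&limit=5', 'skip=20&limit=5']
print(len(offset_requests(total=194, limit=5, page_limit=1000)))  # 39
```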

The Preview modal should now display five records.

Page/Size

Page/Size pagination works much like Offset/Limit pagination, but instead of counting individual records, you reference pages, where each page is a fixed-size set of records.

note

To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:

curl 'https://hn.algolia.com/api/v1/search?page=3' | jq

You'll see terminal output from HackerNews Algolia that looks like the following:

*** Lots of stuff ***
  "hitsPerPage": 20,
  "nbHits": 44882494,
  "nbPages": 50,
  "page": 3,
  "params": "page=3&advancedSyntax=true&analyticsTags=backend",
  "processingTimeMS": 1,
  "processingTimingsMS": {
    "_request": {
      "roundTrip": 19
    },
    "total": 1
  },
  "query": "",
  "serverTimeMS": 1
}
note

Notice that "page": 3, item? Let's change our URL slightly and see if it tracks:

curl 'https://hn.algolia.com/api/v1/search?page=5' | jq

You'll see terminal output that looks like the following:

  "hitsPerPage": 20,
  "nbHits": 44882514,
  "nbPages": 50,
  "page": 5,
  "params": "page=5&advancedSyntax=true&analyticsTags=backend",
  "processingTimeMS": 1,
  "processingTimingsMS": {
    "_request": {
      "roundTrip": 24
    },
    "total": 1
  },
  "query": "",
  "serverTimeMS": 2
}

That's the point. The Page/Size paging option lets us directly manipulate that "page" element in Cribl Stream's REST Collector.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.

  1. In the Collector ID field, enter collect_page_size.
  2. Copy the following URL to the Collect URL field.
    'https://hn.algolia.com/api/v1/search'
  3. In the Pagination drop-down, select Page/Size.
  4. In the Page number field name field, enter page. (To be honest, it already is. But you should know that you can change it if your use case demands it.)
  5. Leave the Starting page number field empty, but again, you COULD change this if you needed to.
  6. In the Page size field name field, enter hitsPerPage.
  7. Change the Page size field from 50 to 5.
  8. For the Total page count field name, enter nbPages.
  9. In the Total record count field name field, enter nbHits.
  10. Change the Page limit field from the default to 5.
  11. Enable the zero-based index check box.
  12. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will collect the next page until the page number is out of range of the total number of pages based on the item count.
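Here's a hypothetical Python model of that stop rule (not Cribl's implementation): page numbers advance by one, starting at 0 for a zero-based index, until the page falls outside the total page count (nbPages) or the Page limit is reached. The hitsPerPage name matches the field configured above; the function name is made up for illustration.

```python
from typing import List


def page_requests(total_pages: int, size: int, page_limit: int, zero_based: bool = True) -> List[str]:
    """Query strings for Page/Size paging, stopping once the page number falls
    outside the total page count or page_limit requests have been made."""
    first = 0 if zero_based else 1
    last = total_pages - 1 if zero_based else total_pages
    queries: List[str] = []
    page = first
    while page <= last and len(queries) < page_limit:
        queries.append(f"page={page}&hitsPerPage={size}")
        page += 1
    return queries


print(page_requests(total_pages=50, size=5, page_limit=5))
# ['page=0&hitsPerPage=5', 'page=1&hitsPerPage=5', 'page=2&hitsPerPage=5', 'page=3&hitsPerPage=5', 'page=4&hitsPerPage=5']
print(page_requests(total_pages=3, size=5, page_limit=10, zero_based=False))
# ['page=1&hitsPerPage=5', 'page=2&hitsPerPage=5', 'page=3&hitsPerPage=5']
```

The zero_based flag mirrors why the toggle matters: with a zero-based API, requesting page=nbPages would run one page past the end.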

The Preview modal should now display five events containing five records.

Conclusion

Now would be a great time to treat yourself if you have ice cream, cookies, pie, beer, bourbon, or some other reward nearby. To be honest, sometimes just a nice hot cup of coffee will do the trick. Because this module was kind of a beast.

BUT YOU GOT THROUGH IT! And for that, you are to be lauded and applauded. You explored the main types of API pagination and learned how to set up the ones Cribl Stream supports in its REST Collector.

Once you've taken a well-deserved break, don't forget to come back. Because in the next module, you'll learn how to authenticate on protected REST API endpoints using Bearer tokens.