Handling Paginated Collection

In this module, you'll learn how to collect data from REST API endpoints that use pagination. Cribl Stream does not support all pagination methods, so this module will highlight the currently supported ones.

Response Body Attributes

In this section, we'll explore how to use a field in an API request's response body to collect data from the next page of results.

note

First, you'll need to visualize the data you're working with. Run the following command in your terminal:

curl -s http://rest-server/response/body | jq

In the JSON response body, you'll see the next attribute under the pagination parent attribute. Its value changes on each page the API returns. We'll configure the Collector to use this field to retrieve all pages from this REST API endpoint.

{
  "items": [
    ...
  ],
  "pagination": {
    "self": "/response/body",
    "next": "/response/body?size=25&limit=5&offset=5"
  }
}

note

Let's see what the next page looks like. Run the following command in your terminal window:

curl -s 'http://rest-server/response/body?size=25&limit=5&offset=5' | jq

Notice that the next field value is now /response/body?size=25&limit=5&offset=10, and a previous attribute has been added.

When this collection job runs in Cribl Stream, the pages after the first will return these next values in turn:

  • /response/body?size=25&limit=5&offset=10
  • /response/body?size=25&limit=5&offset=15
  • /response/body?size=25&limit=5&offset=20
note

Let's look at the last page. Run the following command in your terminal window:

curl -s 'http://rest-server/response/body?size=25&limit=5&offset=20' | jq

When you collect from the last page, the next field is no longer returned. This means the collection of all items in this result set is complete.
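
To make the stop condition concrete, here is a minimal Python sketch of following a response body attribute. It illustrates the idea and is not Cribl Stream's implementation. The names are assumptions:

- fake_get is a hypothetical in-memory stand-in for http://rest-server/response/body.
- The layout of 25 items, 5 per page, is assumed from the curl output above.
- get_attr and collect are illustrative names.

The loop requests a page and reads pagination.next. It stops when that attribute is missing or the page limit is reached.

```python
from urllib.parse import parse_qs, urlparse

TOTAL, LIMIT = 25, 5  # assumed from the sandbox output: 25 items, 5 per page


def fake_get(path: str) -> dict:
    """Hypothetical in-memory stand-in for GET http://rest-server<path>."""
    query = parse_qs(urlparse(path).query)
    offset = int(query.get("offset", ["0"])[0])
    items = [{"item": i + 1} for i in range(offset, min(offset + LIMIT, TOTAL))]
    pagination = {"self": path}
    if offset + LIMIT < TOTAL:  # the last page omits "next"
        pagination["next"] = f"/response/body?size={TOTAL}&limit={LIMIT}&offset={offset + LIMIT}"
    return {"items": items, "pagination": pagination}


def get_attr(body: dict, dotted: str):
    """Resolve a dotted path such as 'pagination.next'; return None if any key is missing."""
    value = body
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def collect(start: str, attr: str = "pagination.next", max_pages: int = 50) -> list:
    """Request pages, following `attr`, until it is absent or max_pages is reached."""
    pages, url = [], start
    while url is not None and len(pages) < max_pages:
        body = fake_get(url)
        pages.append(body)
        url = get_attr(body, attr)
    return pages


pages = collect("/response/body")
print(len(pages))  # 5
print(get_attr(pages[-1], "pagination.next"))  # None
```

The max_pages argument plays the role of the collector's Max Pages setting. It bounds the loop even if an API never stops returning a next link.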

Now you'll configure this as a REST Collector in Cribl Stream.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.

  1. In the Collector ID field, enter collect_response_body.

  2. Copy the following value to the Collect URL field.

    'http://rest-server/response/body'
  3. In the Pagination drop-down, select Response Body Attribute.

  4. Set the Response Attribute to pagination.next.

    Note: The default Max Pages value of 50 is fine for this sandbox. But for large collection jobs in your own environment, you might need to increase this value.

  5. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview should now display five events, one per page. The events aren't yet broken up into individual items. Later in this course, we'll configure an Event Breaker to unroll the arrays of data.

note

To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.

Response Body Attributes with Has-more Expression

Some REST API endpoints don't provide a deterministic way to end collection based on the URL. For example, an endpoint might keep returning the same URL with a different offset. In that case, Cribl Stream can't determine which page is the last one, so it doesn't know when to stop. Cribl Stream 4.0.3 introduced a new way to evaluate whether to continue collecting, and this section highlights that feature.

note

First, you'll need to visualize the data you're working with. Run the following command in your terminal:

curl -s http://rest-server/response/body/more | jq

In the JSON response body, you'll see the next and more attributes under the pagination parent attribute. The more attribute is true when there are more pages to collect and false when you should stop collecting from the API.

{
  "items": [
    ...
  ],
  "pagination": {
    "next": 5,
    "more": true
  }
}

note

Now let's check the last page. Run the following command in your terminal window:

curl -s 'http://rest-server/response/body/more?offset=20' | jq '.pagination'

Notice that the next field value is still incremented, but the more field now shows a value of false. If we tried to collect from the offset provided, we wouldn't get any results. The more field lets us tell Cribl Stream to evaluate whether it should continue collecting.
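
Here the stop decision comes from evaluating an expression over the collected attributes, rather than from a missing URL. The Python sketch below makes these assumptions:

- It uses the same 25-item layout as the sandbox.
- fake_get, is_last_page, and collect_has_more are hypothetical names.
- The first request sends no offset. This is an assumption about how the collector behaves before next has a value.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "/response/body/more"
TOTAL, LIMIT = 25, 5  # assumed from the sandbox output


def fake_get(path: str) -> dict:
    """Hypothetical stand-in for GET http://rest-server<path> on the /more endpoint."""
    query = parse_qs(urlparse(path).query)
    offset = int(query.get("offset", ["0"])[0])
    items = [{"item": i + 1} for i in range(offset, min(offset + LIMIT, TOTAL))]
    nxt = offset + LIMIT  # "next" is always incremented, even on the last page
    return {"items": items, "pagination": {"next": nxt, "more": nxt < TOTAL}}


def is_last_page(attrs: dict) -> bool:
    """Python analogue of the Last-page expression `more === false`."""
    return attrs.get("more") is False


def collect_has_more(max_pages: int = 50) -> list:
    pages, params = [], {}  # assumption: the first request carries no offset
    while len(pages) < max_pages:
        # The collect parameter offset=${next} becomes a query string such as ?offset=5
        url = f"{BASE}?{urlencode(params)}" if params else BASE
        body = fake_get(url)
        pages.append(body)
        attrs = {name: body["pagination"].get(name) for name in ("next", "more")}
        if is_last_page(attrs):
            break
        params = {"offset": attrs["next"]}
    return pages


pages = collect_has_more()
print(len(pages), pages[-1]["pagination"])  # 5 {'next': 25, 'more': False}
```

Without is_last_page, this loop would keep requesting offset=25, offset=30, and so on. It would receive empty pages until max_pages stopped it, which is the problem the Last-page expression avoids.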

Now you'll configure this as a REST Collector in Cribl Stream.

important

Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.

  1. In the Collector ID field, enter collect_response_has_more.

  2. Copy the following value to the Collect URL field.

    'http://rest-server/response/body/more'
  3. In the Collect parameters section, click Add parameter and fill out the following:

    • In the Name box, enter offset.
    • In the Value box, enter `${next}`.

    This will automatically add a query string parameter ?offset=<number> to the Collect URL.

  4. In the Pagination drop-down, select Response Body Attribute.

  5. Add the following values to the Response attribute field: next, more.

  6. Enter the following Last-page expression:

    more === false
  7. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview should now display five events, one per page. The events aren't yet broken up into individual items. Later in this course, we'll configure an Event Breaker to unroll the arrays of data.

note

To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.

For events 2 through 5, you'll see three entries added to the __collectible field. Note the next and more values that appear in each event.

Response Headers

Unlike the response-body pagination attributes that you worked with in the preceding section, some endpoints return information about the next page in the HTTP response headers.

note

To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:

curl -v http://rest-server/response/headers

You'll see a result that looks similar to the following. Note: This output is truncated for brevity.

< HTTP/1.1 200 OK
...
< nextLink: /response/headers?size=25&limit=5&offset=5
...
{"items":[{"item":1},{"item":2},{"item":3},{"item":4},{"item":5}]}

Now you'll configure this as a REST Collector in Cribl Stream.

important
  1. Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
  2. In the Collector ID field, enter collect_response_header.
  3. Copy the following URL to the Collect URL field.
    'http://rest-server/response/headers'
  4. In the Pagination drop-down, select Response Header Attribute.
  5. Set the Response Attribute field to nextLink.
  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until a response no longer contains the nextLink header.

The Preview modal should display five events, each with five items in its items array.
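
Header-based pagination follows the same loop as the response-body case; only the source of the next URL changes. This Python sketch is illustrative only. fake_get is a hypothetical stand-in for http://rest-server/response/headers that returns a (headers, body) pair. The page layout is assumed from the sandbox output.

```python
from urllib.parse import parse_qs, urlparse

TOTAL, LIMIT = 25, 5  # assumed from the sandbox output


def fake_get(path: str):
    """Hypothetical stand-in for /response/headers: returns (headers, body)."""
    query = parse_qs(urlparse(path).query)
    offset = int(query.get("offset", ["0"])[0])
    headers = {"Content-Type": "application/json"}
    if offset + LIMIT < TOTAL:  # the last page sends no nextLink header
        headers["nextLink"] = f"/response/headers?size={TOTAL}&limit={LIMIT}&offset={offset + LIMIT}"
    body = {"items": [{"item": i + 1} for i in range(offset, min(offset + LIMIT, TOTAL))]}
    return headers, body


def header_value(headers: dict, name: str):
    """HTTP header names are case-insensitive, so compare them in lowercase."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(name.lower())


def collect_by_header(start: str, header: str = "nextLink", max_pages: int = 50) -> list:
    pages, url = [], start
    while url is not None and len(pages) < max_pages:
        headers, body = fake_get(url)
        pages.append(body)
        url = header_value(headers, header)  # None once the header disappears
    return pages


pages = collect_by_header("/response/headers")
print(len(pages))  # 5
print(pages[0])  # {'items': [{'item': 1}, {'item': 2}, {'item': 3}, {'item': 4}, {'item': 5}]}
```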

RFC 5988 - Web Linking

Cribl Stream also supports a special type of pagination, RFC 5988 Web Linking, for collecting data from endpoints. With this option, Cribl Stream follows rel="next" links until there is no data left to collect.

note

To view a sample of response data from this type of endpoint, run the following command in your terminal window:

curl -v http://rest-server/linking

You'll see a result that looks like the following. Note: This output is truncated for brevity.

< HTTP/1.1 200 OK
...
< Link: </linking>; rel="self"
< Link: </linking?size=25&limit=5&offset=5>; rel="next"
...
{"items":[{"item":1},{"item":2},{"item":3},{"item":4},{"item":5}],"pagination":{"size":5,"limit":5,"offset":0,"total":25}}

important
  1. Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.
  2. In the Collector ID field, enter collect_web_linking.
  3. Copy the following URL to the Collect URL field.
    'http://rest-server/linking'
  4. In the Pagination drop-down, select RFC 5988 - Web Linking. Optionally, in the Current page relation name field, enter self.
  5. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until either no further next link is returned or the next link is the same as the self link.

The Preview modal should now display five events, each with five items in its items array.
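
A minimal Python sketch of the Web Linking loop follows, including both stop conditions: no next link, or a next link equal to the self link. Note the limits of the names used:

- parse_links is a deliberately simplified, hypothetical parser. It only recognizes <uri>; rel="..." pairs and ignores other link parameters, so it is not a complete RFC 5988 implementation.
- fake_get is a hypothetical stand-in for http://rest-server/linking.

```python
import re
from urllib.parse import parse_qs, urlparse

TOTAL, LIMIT = 25, 5  # assumed from the sandbox output

# Simplified pattern: matches <uri>; rel="..." and ignores any other link parameters.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(values: list) -> dict:
    """Map each relation name to its URI across one or more Link header values."""
    links = {}
    for value in values:
        for uri, rels in LINK_RE.findall(value):
            for rel in rels.split():  # rel may list several space-separated names
                links[rel] = uri
    return links


def fake_get(path: str):
    """Hypothetical stand-in for /linking: returns (Link header values, body)."""
    offset = int(parse_qs(urlparse(path).query).get("offset", ["0"])[0])
    link_values = [f'<{path}>; rel="self"']
    if offset + LIMIT < TOTAL:
        link_values.append(f'</linking?size={TOTAL}&limit={LIMIT}&offset={offset + LIMIT}>; rel="next"')
    items = [{"item": i + 1} for i in range(offset, min(offset + LIMIT, TOTAL))]
    return link_values, {"items": items}


def collect_web_linking(start: str, max_pages: int = 50) -> list:
    pages, url = [], start
    while len(pages) < max_pages:
        link_values, body = fake_get(url)
        pages.append(body)
        links = parse_links(link_values)
        nxt = links.get("next")
        if nxt is None or nxt == links.get("self"):  # stop: no next link, or next equals self
            break
        url = nxt
    return pages


print(parse_links(['</linking>; rel="self"', '</linking?size=25&limit=5&offset=5>; rel="next"']))
print(len(collect_web_linking("/linking")))  # 5
```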

Offset/Limit

Offset/Limit pagination works by calculating the offset of the first item to collect on each page. Each request skips offset items and returns up to limit items. Cribl Stream introduced this feature in version 3.4.

note

To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:

curl http://rest-server/limit

You'll see terminal output that looks like the following:

{
  "items": [
    {
      "item": 1
    },
    ...
  ],
  "pagination": {
    "size": 5,
    "limit": 5,
    "offset": 0,
    "total": 25
  }
}

The pagination field provides the data needed to calculate the offset. Our server uses zero-based offsets, so the first page starts at offset 0. To collect the second page, we set the offset to 5 (0 plus the limit of 5).

note

Run the following command in your sandbox terminal to see the second page of data:

curl 'http://rest-server/limit?offset=5'

You'll see terminal output that looks like the following:

{
  "items": [
    {
      "item": 6
    },
    ...
  ],
  "pagination": {
    "size": 5,
    "limit": 5,
    "offset": 5,
    "total": 25
  }
}

Now we'll configure this as a REST Collector in Cribl Stream.

important
  1. Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
  2. In the Collector ID field, enter collect_limit_offset.
  3. Copy the following URL to the Collect URL field.
    'http://rest-server/limit'
  4. In the Pagination drop-down, select Offset/Limit.
  5. In the Limit field, change 50 to 5.
  6. In the Total record count field name field, enter total.
  7. Enable the zero‑based index check box.
  8. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until the offset reaches the total record count.

The Preview modal should now display five events, each with five items in its items array.
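
The offset arithmetic can be sketched in Python as follows. The sketch assumes the sandbox layout of 25 records and a zero-based server. Note these further assumptions:

- fake_get and collect_offset_limit are hypothetical names.
- Treating a cleared zero-based index check box as "start at offset 1" is an assumption, not documented Cribl Stream behavior.

The loop reads the total record count from each response. It advances the offset by the limit until the offset reaches that total.

```python
TOTAL = 25  # assumed from the sandbox output


def fake_get(offset: int, limit: int) -> dict:
    """Hypothetical stand-in for GET /limit?offset=<offset>&limit=<limit> (zero-based)."""
    items = [{"item": i + 1} for i in range(offset, min(offset + limit, TOTAL))]
    return {"items": items,
            "pagination": {"size": len(items), "limit": limit, "offset": offset, "total": TOTAL}}


def collect_offset_limit(limit: int = 5, total_field: str = "total",
                         zero_based: bool = True, max_pages: int = 50) -> list:
    start = 0 if zero_based else 1  # assumed effect of the zero-based index check box
    offset, total, pages = start, None, []
    # Keep requesting until the offset has advanced past the total record count.
    while len(pages) < max_pages and (total is None or offset - start < total):
        body = fake_get(offset, limit)
        pages.append(body)
        total = body["pagination"][total_field]
        offset += limit
    return pages


pages = collect_offset_limit()
print([p["pagination"]["offset"] for p in pages])  # [0, 5, 10, 15, 20]
```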

Page/Size

Page/Size pagination works similarly to Offset/Limit pagination. Instead of referencing an item's position, each request references a page number, where a page is a set of items of the configured size.

note

To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:

curl http://rest-server/page

You'll see terminal output that looks like the following:

{
  "items": [
    {
      "item": 1
    },
    ...
  ],
  "pagination": {
    "size": 5,
    "limit": 5,
    "page": 0,
    "total": 25
  }
}

note

Run the following command in your sandbox terminal to see the second page of data:

curl 'http://rest-server/page?page=1'

You'll see terminal output that looks like the following:

{
  "items": [
    {
      "item": 6
    },
    ...
  ],
  "pagination": {
    "size": 5,
    "limit": 5,
    "page": 1,
    "total": 25
  }
}

Now we'll configure this as a REST Collector in Cribl Stream.

important
  1. Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
  2. In the Collector ID field, enter collect_page_size.
  3. Copy the following URL to the Collect URL field.
    'http://rest-server/page'
  4. In the Pagination drop-down, select Page/Size.
  5. In the Page size field name field, enter limit.
  6. In the Page size field, change 50 to 5.
  7. In the Total record count field name field, enter total.
  8. Enable the zero‑based index check box.
  9. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Cribl Stream will keep collecting the next page until the page number falls outside the total number of pages. It derives that total from the total item count and the page size.

The Preview modal should now display five events, each with five items in its items array.
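
The same idea with page numbers instead of offsets can be sketched in Python. The number of pages is the total record count divided by the page size, rounded up: 25 / 5 gives 5 pages, numbered 0 through 4 with a zero-based index. fake_get and collect_page_size are hypothetical names. The one-based start is an assumption about the check box, not documented behavior.

```python
import math

TOTAL = 25  # assumed from the sandbox output


def fake_get(page: int, limit: int) -> dict:
    """Hypothetical stand-in for GET /page?page=<page>&limit=<limit> (zero-based pages)."""
    first = page * limit
    items = [{"item": i + 1} for i in range(first, min(first + limit, TOTAL))]
    return {"items": items,
            "pagination": {"size": len(items), "limit": limit, "page": page, "total": TOTAL}}


def collect_page_size(page_size: int = 5, total_field: str = "total",
                      zero_based: bool = True, max_pages: int = 50) -> list:
    first = 0 if zero_based else 1  # assumed effect of the zero-based index check box
    page, last, pages = first, None, []
    while len(pages) < max_pages and (last is None or page <= last):
        body = fake_get(page, page_size)
        pages.append(body)
        # Number of pages = ceil(total items / page size), e.g. ceil(25 / 5) = 5.
        last = first + math.ceil(body["pagination"][total_field] / page_size) - 1
        page += 1
    return pages


pages = collect_page_size()
print([p["pagination"]["page"] for p in pages])  # [0, 1, 2, 3, 4]
```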

Conclusion

In this module, you explored how to configure different types of result pagination in Cribl Stream's REST Collector.

In the next module, you'll learn how to authenticate on protected REST API endpoints using Bearer tokens.