Handling Paginated Collection
In this module, you'll learn how to collect data from REST API endpoints that use pagination. Cribl Stream does not support all pagination methods, so this module will highlight the currently supported ones.
Response Body Attributes
In this section, we'll explore how to use a field in an API request's response body to collect data from the next page of results.
First, you'll need to visualize the data you're working with. Run the following command in your terminal:
curl -s http://rest-server/response/body | jq
In the JSON response body, you'll see the next attribute under the pagination parent attribute. The value of this field changes with each page that the collection job requests. We'll configure the Collector to use this field to obtain all pages from this REST API endpoint.
{
"items": [
...
],
"pagination": {
"self": "/response/body",
"next": "/response/body?size=25&limit=5&offset=5"
}
}
Let's see what the next page looks like. Run the following command in your terminal window:
curl -s 'http://rest-server/response/body?size=25&limit=5&offset=5' | jq
Notice that the next field value is now /response/body?size=25&limit=5&offset=10, and a previous attribute has been added.
When this collection job runs in Cribl Stream, the next field values will be:
/response/body?size=25&limit=5&offset=10
/response/body?size=25&limit=5&offset=15
/response/body?size=25&limit=5&offset=20
Let's look at the last page. Run the following command in your terminal window:
curl -s 'http://rest-server/response/body?size=25&limit=5&offset=20' | jq
When you collect from the last page, the next field is no longer returned, meaning that we have collected all items from this result set.
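The follow-the-next-attribute loop described above can be sketched in Python. This is only an illustration under stated assumptions, not Cribl Stream's implementation: `build_pages` is a hypothetical in-memory stand-in for the sandbox server, the item values are made up, and `get_path`, `collect_all`, and `max_pages` are names invented for this example.

```python
def build_pages(total=25, limit=5):
    """Hypothetical in-memory stand-in for the sandbox endpoint (URL -> JSON body)."""
    pages = {}
    for offset in range(0, total, limit):
        query = f"?size={total}&limit={limit}&offset={offset}"
        url = "/response/body" + (query if offset else "")
        pagination = {}
        if offset + limit < total:
            # Only non-final pages advertise a next link.
            pagination["next"] = (
                f"/response/body?size={total}&limit={limit}&offset={offset + limit}"
            )
        # Item values are illustrative; the tutorial elides them.
        items = [{"item": n} for n in range(offset + 1, offset + limit + 1)]
        pages[url] = {"items": items, "pagination": pagination}
    return pages


def get_path(obj, dotted):
    """Resolve a dotted path such as 'pagination.next'; return None if absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def collect_all(fetch, start_url, attribute="pagination.next", max_pages=50):
    """Fetch pages, following the given response attribute until it disappears."""
    url, bodies = start_url, []
    while url is not None and len(bodies) < max_pages:
        body = fetch(url)
        bodies.append(body)
        url = get_path(body, attribute)  # None on the last page ends the loop
    return bodies


PAGES = build_pages()
bodies = collect_all(lambda url: PAGES[url], "/response/body")
print(len(bodies), "pages collected")
```

The `max_pages` cap is a safety net in this sketch: if an endpoint kept returning a next value forever, the loop would still terminate.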
Now you'll configure this as a REST Collector in Cribl Stream.
- Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.
- In the Collector ID field, enter collect_response_body.
- Copy the following value to the Collect URL field.
'http://rest-server/response/body'
- In the Pagination drop-down, select Response Body Attribute.
- Set the Response Attribute to pagination.next.
Note: The default Max Pages value of 50 is fine for this sandbox. But for large collection jobs in your own environment, you might need to increase this value.
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
The Preview should now display five events. The events are not correctly broken up. Later in this course, we'll configure an Event Breaker to unroll the arrays of data.
To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.
Response Body Attributes with Has-more Expression
Sometimes REST API endpoints don't give you a deterministic way to end collection based on the URL. For example, repeatedly collecting from the same URL with a different offset can be problematic if Cribl Stream has no way to determine which page is the last one. In version 4.0.3, we introduced a new way to evaluate whether to continue collection. This section highlights the feature.
First, you'll need to visualize the data you're working with. Run the following command in your terminal:
curl -s http://rest-server/response/body/more | jq
In the JSON response body, you'll see the next and more attributes under the pagination parent attribute. The more value will be true when there are more pages to collect, and false when you should stop collecting from the API.
{
"items": [
...
],
"pagination": {
"next": 5,
"more": true
}
}
Cool! Now let's try it on the last page. Run the following command in your terminal window to find out:
curl -s 'http://rest-server/response/body/more?offset=20' | jq '.pagination'
Notice that the next field value is incremented, but the more field now shows a value of false. If we were to try to collect from the offset provided, we wouldn't get any results. With this second field, more, we can tell Cribl Stream to evaluate whether it should continue collecting data.
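A rough Python sketch of this stop condition follows. It assumes a hypothetical `fake_more_endpoint` function standing in for the sandbox server (the item values are invented), and `collect_until_last_page` and `is_last_page` are names made up for this example, not Cribl Stream internals.

```python
def fake_more_endpoint(offset=0, total=25, limit=5):
    """Hypothetical stand-in: same URL every time, only the offset changes."""
    last = min(offset + limit, total)
    items = [{"item": n} for n in range(offset + 1, last + 1)]  # illustrative values
    next_offset = offset + limit
    return {
        "items": items,
        "pagination": {"next": next_offset, "more": next_offset < total},
    }


def collect_until_last_page(fetch, is_last_page, max_pages=50):
    """Request pages by offset until the predicate marks a page as the last one."""
    pages, offset = [], 0
    while len(pages) < max_pages:
        body = fetch(offset)
        pages.append(body)  # the last page still carries items, so keep it
        if is_last_page(body["pagination"]):
            break
        offset = body["pagination"]["next"]
    return pages


# The predicate plays the role of the expression: more === false
pages = collect_until_last_page(fake_more_endpoint, lambda p: p["more"] is False)
print(len(pages), "pages collected")
```

Without the predicate, the loop would have no signal to stop: the next value keeps incrementing even on the final page, and a request at that offset returns an empty items list.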
Now you'll configure this as a REST Collector in Cribl Stream.
- Navigate to the Manage > Data > Sources > Collectors > REST > New Collector configuration modal, as you did in preceding modules.
- In the Collector ID field, enter collect_response_has_more.
- Copy the following value to the Collect URL field.
'http://rest-server/response/body/more'
- In the Collect parameters section, click Add parameter and fill out the following:
  - In the Name box, enter offset.
  - In the Value box, enter `${next}`.
  This will automatically add a query string parameter ?offset=<number> to the Collect URL.
- In the Pagination drop-down, select Response Body Attribute.
- Add the following values to the Response attribute field: next, more.
- Enter the following Last-page expression:
more === false
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
The Preview should now display five events. The events are not correctly broken up. Later in this course, we'll configure an Event Breaker to unroll the arrays of data.
To view more details about the paginated collection, click the Preview modal's Options (•••) menu at the upper right, and enable Show Internal Fields. Then expand the __collectible field to inspect each event's page number, link, etc.
For events 2 through 5, you'll see three entries added to the __collectible field. Note the next and more values that appear in each event.
Response Headers
Unlike the response-body pagination attributes that you worked with in the preceding section, some endpoints return information about the next page in the HTTP response headers.
To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:
curl -v http://rest-server/response/headers
You'll see a result that looks similar to the following. Note: This output is truncated for brevity.
< HTTP/1.1 200 OK
...
< nextLink: /response/headers?size=25&limit=5&offset=5
...
{"items":[{"item":1},{"item":2},{"item":3},{"item":4},{"item":5}]}
Now you'll configure this as a REST Collector in Cribl Stream.
- Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
- In the Collector ID field, enter collect_response_header.
- Copy the following URL to the Collect URL field.
'http://rest-server/response/headers'
- In the Pagination drop-down, select Response Header Attribute.
- Set the Response Attribute field to nextLink.
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
Cribl Stream will keep collecting the next page until a response no longer contains the nextLink header.
The Preview modal should display five events, each containing five items in the items array.
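As a minimal sketch, header-driven pagination can be expressed in Python like this. The `fake_headers_endpoint` function is a hypothetical stand-in for the sandbox server that returns a (headers, body) pair, and `collect_by_header` is a name invented for this example rather than anything Cribl Stream exposes.

```python
from urllib.parse import parse_qs, urlsplit


def fake_headers_endpoint(url, total=25, limit=5):
    """Hypothetical stand-in for /response/headers: returns (headers, body)."""
    query = parse_qs(urlsplit(url).query)
    offset = int(query.get("offset", ["0"])[0])
    body = {"items": [{"item": n} for n in range(offset + 1, offset + limit + 1)]}
    headers = {}
    if offset + limit < total:
        # Only non-final responses carry the nextLink header.
        headers["nextLink"] = (
            f"/response/headers?size={total}&limit={limit}&offset={offset + limit}"
        )
    return headers, body


def collect_by_header(fetch, start_url, header="nextLink", max_pages=50):
    """Follow the URL found in a response header until the header is missing."""
    url, bodies = start_url, []
    while url and len(bodies) < max_pages:
        headers, body = fetch(url)
        bodies.append(body)
        # HTTP header names are case-insensitive, so compare in lower case.
        lowered = {name.lower(): value for name, value in headers.items()}
        url = lowered.get(header.lower())
    return bodies


bodies = collect_by_header(fake_headers_endpoint, "/response/headers")
print(len(bodies), "pages collected")
```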
RFC 5988 - Web Linking
A special type of pagination, RFC 5988 (known as Web Linking), can be used to collect data from endpoints. With this option, Cribl Stream follows rel="next" links until there is no data left to collect.
To view a sample of response data from this type of endpoint, run the following command in your terminal window:
curl -v http://rest-server/linking
You'll see a result that looks like the following. Note: This output is truncated for brevity.
< HTTP/1.1 200 OK
...
< Link: </linking>; rel="self"
< Link: </linking?size=25&limit=5&offset=5>; rel="next"
...
{"items":[{"item":1},{"item":2},{"item":3},{"item":4},{"item":5}],"pagination":{"size":5,"limit":5,"offset":0,"total":25}}
- Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in preceding modules.
- In the Collector ID field, enter collect_web_linking.
- Copy the following URL to the Collect URL field.
'http://rest-server/linking'
- In the Pagination drop-down, select RFC 5988 - Web Linking. Optionally, in the Current page relation name field, enter self.
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
Cribl Stream will collect the next page until either no further next link is returned, or the next link is the same as the self link.
The Preview modal should now display five events, each containing five items in the items array.
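The stop rule for Link headers can be illustrated with a short Python sketch. The parser below is deliberately simplified (it only handles the `<target>; rel="name"` shape shown in the sample output, not every form the RFC allows), and `parse_links` and `next_url` are hypothetical helper names for this example.

```python
import re

# Simplified pattern: matches <target>; rel="name" pairs only.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(link_headers):
    """Map relation name -> target URL from a list of Link header values."""
    links = {}
    for value in link_headers:
        for target, rel in LINK_PATTERN.findall(value):
            links[rel] = target
    return links


def next_url(link_headers, next_rel="next", current_rel="self"):
    """Return the next page URL, or None when collection should stop."""
    links = parse_links(link_headers)
    target = links.get(next_rel)
    # Stop if there is no next link, or if it points back at the current page.
    if target is None or target == links.get(current_rel):
        return None
    return target


first_page = [
    '</linking>; rel="self"',
    '</linking?size=25&limit=5&offset=5>; rel="next"',
]
print(next_url(first_page))
```

Comparing the next target against the current-page relation guards against an endpoint that keeps returning a next link equal to the page just collected, which would otherwise loop forever.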
Offset/Limit
Offset pagination works by calculating the "offset" of the first item to be collected from the page. Cribl Stream introduced this feature in version 3.4.
To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:
curl http://rest-server/limit
You'll see terminal output that looks like the following:
{
"items": [
{
"item": 1
},
...
],
"pagination": {
"size": 5,
"limit": 5,
"offset": 0,
"total": 25
}
}
The pagination field provides the data for calculating the offset. To collect the second page, we need to offset the collection by five, because each page holds five items and our server's offsets are zero-based.
Run the following command in your sandbox terminal to see the second page of data:
curl 'http://rest-server/limit?offset=5'
You'll see terminal output that looks like the following:
{
"items": [
{
"item": 6
},
...
],
"pagination": {
"size": 5,
"limit": 5,
"offset": 5,
"total": 25
}
}
Now we'll configure this as a REST Collector in Cribl Stream.
- Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
- In the Collector ID field, enter collect_limit_offset.
- Copy the following URL to the Collect URL field.
'http://rest-server/limit'
- In the Pagination drop-down, select Offset/Limit.
- In the Limit field, change 50 to 5.
- In the Total record count field name field, enter total.
- Enable the zero-based index check box.
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
Cribl Stream will collect the next page until the offset reaches the total record count.
The Preview modal should now display five events, each containing five items in the items array.
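The offset arithmetic can be sketched in Python as follows. This assumes a hypothetical `fake_limit_endpoint` function in place of the sandbox server, with a trimmed pagination object (only limit, offset, and total) and invented item values; `collect_by_offset` is a name made up for this example.

```python
def fake_limit_endpoint(offset=0, limit=5, total=25):
    """Hypothetical stand-in for /limit using zero-based offsets."""
    last = min(offset + limit, total)
    items = [{"item": n} for n in range(offset + 1, last + 1)]  # illustrative values
    return {
        "items": items,
        "pagination": {"limit": limit, "offset": offset, "total": total},
    }


def collect_by_offset(fetch, limit, max_pages=50):
    """Advance the offset by `limit` until it reaches the reported total."""
    offset, pages = 0, []
    while len(pages) < max_pages:
        body = fetch(offset=offset, limit=limit)
        pages.append(body)
        offset += limit
        # The total comes from the response itself, as in the sandbox output.
        if offset >= body["pagination"]["total"]:
            break
    return pages


pages = collect_by_offset(fake_limit_endpoint, limit=5)
print([page["pagination"]["offset"] for page in pages])
```

With 25 records and a limit of 5, the offsets requested are 0, 5, 10, 15, and 20, which accounts for the five events in the Preview.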
Page/Size
Page/Size pagination works much like Offset/Limit pagination, but instead of referencing item offsets, each request references a page number, where a page represents a fixed-size set of items.
To view a sample of response data from this type of endpoint, run the following command in your sandbox terminal:
curl http://rest-server/page
You'll see terminal output that looks like the following:
{
"items": [
{
"item": 1
},
...
],
"pagination": {
"size": 5,
"limit": 5,
"page": 0,
"total": 25
}
}
Run the following command in your sandbox terminal to see the second page of data:
curl 'http://rest-server/page?page=1'
You'll see terminal output that looks like the following:
{
"items": [
{
"item": 6
},
...
],
"pagination": {
"size": 5,
"limit": 5,
"page": 1,
"total": 25
}
}
Now we'll configure this as a REST Collector in Cribl Stream.
- Navigate to the Manage > Data > Sources > Collectors > REST > Add Collector configuration modal, as you did in the preceding section.
- In the Collector ID field, enter collect_page_size.
- Copy the following URL to the Collect URL field.
'http://rest-server/page'
- In the Pagination drop-down, select Page/Size.
- In the Page size field name field, enter limit.
- In the Page size field, change 50 to 5.
- In the Total record count field name field, enter total.
- Enable the zero-based index check box.
- At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
Cribl Stream will collect the next page until the page number exceeds the total number of pages, which is derived from the total item count.
The Preview modal should now display five events, each containing five items in the items array.
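The relationship between total item count, page size, and page numbers is simple arithmetic, sketched here in Python. The helper names `page_numbers` and `first_item_on_page` are hypothetical, and the one-based branch is an assumption about how a non-zero-based server would number its pages.

```python
import math


def page_numbers(total, page_size, zero_based=True):
    """List the page numbers needed to cover `total` items."""
    count = math.ceil(total / page_size)  # a partial final page still counts
    first = 0 if zero_based else 1        # assumption: one-based servers start at 1
    return list(range(first, first + count))


def first_item_on_page(page, page_size):
    """First item number on a zero-based page, with items numbered from 1."""
    return page * page_size + 1


print(page_numbers(25, 5))
print(first_item_on_page(1, 5))
```

For the sandbox values (25 items, page size 5, zero-based pages), this gives pages 0 through 4, and page 1 starts at item 6, matching the second-page output shown earlier in this section.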
Conclusion
In this module, you explored how to configure different types of results pagination in Cribl Stream's REST Collector.
In the next module, you'll learn how to authenticate on protected REST API endpoints using Bearer tokens.