Discovering Data

In this module, we'll work with the REST Collector's Discover section to dynamically collect events.

Cribl Stream runs collection jobs in five phases:

  1. Authentication (optional)
  2. Discovery (optional)
  3. Collection
  4. Event Breaking (optional)
  5. Filtering (optional)

In the previous module, you performed only phase 3 (Collection). However, there's a technicality here: if you don't define Discovery, every collection job still gets an implicit discovered object to seed the collection phase. Without it, Cribl Stream would not run the collection job.
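One way to picture this is as a small planning step that always hands the collection phase at least one item. The sketch below is a conceptual model only, not Cribl Stream's implementation; `planTasks` and `IMPLICIT_ITEM` are hypothetical names invented for illustration.

```javascript
// Conceptual model only: this is NOT how Cribl Stream is implemented.
// IMPLICIT_ITEM and planTasks are hypothetical names used for illustration.
const IMPLICIT_ITEM = {}; // stands in for the implicit discovered object

function planTasks(discoveredItems) {
  // If Discovery was not configured at all, seed collection with one implicit item.
  const items = discoveredItems === undefined ? [IMPLICIT_ITEM] : discoveredItems;
  // One collection task per discovered item.
  return items.map((item, index) => ({ taskNumber: index + 1, item }));
}

// No Discovery defined: still one task to run.
console.log(planTasks(undefined).length);
// Three discovered items: three tasks.
console.log(planTasks([{ id: 1 }, { id: 2 }, { id: 3 }]).length);
```

In this model, the previous module's job corresponds to the first call, and the jobs in this module correspond to the second.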

Now, we'll add Discovery to REST Collectors with different configurations.

Item List

The Item List discovery mode is the simplest way to configure a Collector to run multiple collection tasks in a single job.

important

If necessary, navigate back to the REST Collector Source page. From the top nav of your Cribl Stream Sandbox, with Manage active, select Data > Sources, then select Collectors > REST from the Data Sources page's tiles. Click + Add New to open the REST > Add Collector modal, which provides the following options and fields.

  1. In the Collector ID field, enter discovery_list.
  2. Expand the Discover accordion header, then from the Discover Type drop-down, select Item List.
  3. In the Discover items field, enter 1,2,3 and press your space bar. You will see the comma list convert into individual tags.
    Note: The Discover items entries generate 3 individual collection tasks. You can now use the id variable to reference each item's value anywhere in the Collect URL, parameters, or headers inputs.

  4. Configure the Collect URL to reference the id value in the URL path.

    `https://dummyjson.com/todos/${id}`
    Note: The backticks (`) are not the same as single quotes ('). They mark the URL as a JavaScript template literal, which lets you reference the variable id by wrapping it in curly braces preceded by a dollar sign, i.e., ${id}.

  5. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview modal should display three events.

Note: As with the previous example, those 3 events are split up among 6 records and I promise you we're going to deal with that. But if you look at the "id" field, you'll see the data returned is for ID numbers 1, 2, and 3.
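To see why the backticks matter, here is a minimal sketch in plain JavaScript of what the Collect URL expression evaluates to for each Discover item. `buildCollectUrl` is a hypothetical helper written for this example; only the URL and the `id` variable name come from the steps above.

```javascript
// The three values entered in the Discover items field.
const discoverItems = ["1", "2", "3"];

// Hypothetical helper: `id` mirrors the variable the Collect URL refers to.
function buildCollectUrl(id) {
  // Backticks make this a template literal, so ${id} is replaced by the value.
  return `https://dummyjson.com/todos/${id}`;
}

// One URL per Discover item, i.e. one per collection task.
const urls = discoverItems.map(buildCollectUrl);
console.log(urls);

// With single quotes there is no substitution: the text stays exactly as typed.
const notExpanded = 'https://dummyjson.com/todos/${id}';
console.log(notExpanded);
```

The first log shows three distinct URLs, while the second still contains the literal characters `${id}`, which is the mistake the backticks prevent.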

JSON Response

important

Close the Preview modal and open a new REST Collector configuration modal. (i.e. Create ANOTHER new collector.)

  1. In the Collector ID field, enter discover_json_array.

  2. Expand the Discover accordion header, then from the Discover Type drop-down, select JSON Response.

  3. Copy and paste the following JSON into the Discover result box:

    [{"id":1},{"id":2},{"id":3}]
  4. Leave the Discover data field empty. We'll explore when to configure this in the next section.

    Note: The Discover result entries generate 3 individual collection tasks. You can use the id variable to reference each item's value anywhere in the Collect URL, parameters, or headers inputs.

  5. Configure the Collect URL to reference the id value in the URL path, just the same as we did in the last example.

    `https://dummyjson.com/todos/${id}`
    Note: The backticks above allow you to reference the variable with ${id}.

  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview modal should display three events. They're still split between 6 records, but there you are.
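As a rough illustration of what happened here, the sketch below parses the same Discover result and turns each array element into one Collect URL. `tasksFromArray` is a hypothetical helper, not a Cribl API, and it assumes the result is a top-level JSON array.

```javascript
// The Discover result exactly as pasted in the steps above.
const discoverResult = '[{"id":1},{"id":2},{"id":3}]';

// Hypothetical helper: models "one array element = one collection task".
function tasksFromArray(jsonText) {
  const parsed = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    // Assumption of this sketch: only a top-level array is accepted here.
    throw new TypeError("expected a top-level JSON array");
  }
  // Each element's `id` field fills the ${id} placeholder for one task.
  return parsed.map(({ id }) => `https://dummyjson.com/todos/${id}`);
}

console.log(tasksFromArray(discoverResult));
```

Unlike Item List, each discovered item here is an object, so the variable name (`id`) comes from the field name in the JSON rather than being fixed.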

JSON Response using Attribute

important

Close the Preview modal and open a new REST Collector configuration modal. (yep, add yet ANOTHER new collector)

  1. In the Collector ID field, enter discover_json_array_attribute.

  2. Expand the Discover accordion header, then from the Discover Type drop-down, select JSON Response.

  3. Copy and paste the following JSON into the Discover result box:

    {"items":[{"id":1},{"id":2},{"id":3}]}
  4. In the Discover data field, enter items.

    Note: Within the response, this is the name of the field that contains the discovery results. It must match the top-level key in the JSON you pasted above.

  5. Now, in the Collect section, configure the Collect URL to reference the id value in the URL path. Here again, the backticks are essential:

    `https://dummyjson.com/todos/${id}`
  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

The Preview modal should display three events, once again spread across 6 records.
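The role of the Discover data field can be sketched as a simple lookup: it names the key whose value is the array of discovered items. `selectDiscoverData` below is a hypothetical helper; how Cribl Stream itself reacts to a name that matches nothing is not covered in this module, so the sketch simply returns an empty list in that case.

```javascript
// The Discover result from step 3: the array sits under the "items" key.
const discoverResult = '{"items":[{"id":1},{"id":2},{"id":3}]}';

// Hypothetical helper: `dataField` plays the role of the Discover data setting.
function selectDiscoverData(jsonText, dataField) {
  const parsed = JSON.parse(jsonText);
  // With no field name, use the parsed value itself; otherwise look up the key.
  const candidate = dataField ? parsed[dataField] : parsed;
  // Assumption of this sketch: anything that is not an array yields no items.
  return Array.isArray(candidate) ? candidate : [];
}

console.log(selectDiscoverData(discoverResult, "items").length);   // 3
console.log(selectDiscoverData(discoverResult, "missing").length); // 0
```

The takeaway is that the field name has to match the key in the response exactly; a mismatch leaves nothing to build collection tasks from.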

What's The Difference?

You might reasonably be asking how these 3 techniques are different, given that the data we've collected is identical. The choice depends largely on your data. To make the distinction clearer, we'll have to switch APIs for a moment.

Item List

To explain this clearly, let's switch to the HaveIBeenPwned API. It returns breach information, but you need to name the breach explicitly as part of the URL path.

`curl https://haveibeenpwned.com/api/v3/breach/Adobe`
`curl https://haveibeenpwned.com/api/v3/breach/Gawker`
`curl https://haveibeenpwned.com/api/v3/breach/Stratfor`

Data sets that can only be referenced by a specific (often non-numeric) list of values are what the "Item List" discovery is for. To see it in action, set up a REST Collector in Cribl Stream like this:

  1. Start a new REST Collector.

  2. In the Collector ID field, enter pwnd_list.

  3. Expand the Discover accordion header, then from the Discover Type drop-down, select Item List.

  4. In the Discover items field, enter Adobe,Gawker,Stratfor and press your space bar. You will see the comma list convert into individual tags.

  5. Configure the Collect URL to reference the id value in the URL path.

    `https://haveibeenpwned.com/api/v3/breach/${id}`
  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.
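The same substitution works for names as it did for numbers. The sketch below builds the three request URLs from the item list; `breachUrl` is a hypothetical helper, and the call to `encodeURIComponent` is a defensive assumption of this example (for names containing spaces or slashes), not something the steps above require.

```javascript
// The values entered in the Discover items field.
const breaches = ["Adobe", "Gawker", "Stratfor"];

// Hypothetical helper: `id` is the variable the Collect URL refers to.
function breachUrl(id) {
  // encodeURIComponent is an added precaution for names that are not URL-safe.
  return `https://haveibeenpwned.com/api/v3/breach/${encodeURIComponent(id)}`;
}

for (const name of breaches) {
  console.log(breachUrl(name));
}
```

Each printed URL matches one of the curl commands shown earlier in this section.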

JSON Response

When your data set is a collection of records in a simple JSON structure, this is what you want. If you look at the output of the command:

`curl https://dummyjson.com/todos`

You'll see that each record is flat: no sub-elements, no arrays-inside-arrays.

JSON Response using Attribute

To explain how this is different from JSON Response, we have to do an example wrong first. Let's start with a different dataset from dummyjson.com:

`https://dummyjson.com/products/`

The results can be a bit jumbled, so let me show you a cleaner version:

{
  "products": [
    {
      "id": 1,
      "title": "Essence Mascara Lash Princess",
      "description": "The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.",
      "category": "beauty",
      "price": 9.99,
      "discountPercentage": 10.48,
      "rating": 2.56,
      "stock": 99,
      "tags": [
        "beauty",
        "mascara"
      ],

      ***LOTS MORE DATA HERE***

      "returnPolicy": "No return policy",
      "minimumOrderQuantity": 48,
      "meta": {
        "createdAt": "2025-04-30T09:41:02.053Z",
        "updatedAt": "2025-04-30T09:41:02.053Z",
        "barcode": "5784719087687",
        "qrCode": "https://cdn.dummyjson.com/public/qr-code.png"
      },
      "images": [
        "https://cdn.dummyjson.com/product-images/beauty/essence-mascara-lash-princess/1.webp"
      ],
      "thumbnail": "https://cdn.dummyjson.com/product-images/beauty/essence-mascara-lash-princess/thumbnail.webp"
    },
    {
      "id": 2,
      ***AND SO ON***

As you can see, it's more complex than the other data sets we've seen, with nested arrays, multi-value fields, and more. We'll start by setting it up the same way as our original JSON Response example:

  1. In the Collector ID field, enter complex_json.

  2. Expand the Discover accordion header, then from the Discover Type drop-down, select JSON Response.

  3. Copy and paste the following JSON into the Discover result box:

    [{"id":1},{"id":2},{"id":3}]
  4. Leave the Discover data field empty for now.

  5. Configure the Collect URL to reference the id value in the URL path, just the same as we did in the last example.

    `https://dummyjson.com/products/${id}`
  6. At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Aaaand.... it doesn't work. Even though each record has "id: 1", "id: 2" and so on, it's just no bueno.

Cancel out of the Preview modal, click the complex_json collector, and make the following edits:

  7. Change the Discover result to read:

    `{ "products": [ { "id": 1 }, { "id": 2 }, { "id": 3 } ] }`
  8. In the Discover data field, type products.
  9. Click ► Save & Run, then Run again.

VOILA!! You have some sweet, sweet data.

The difference is that the source data wraps its records in a named array, "products":

"products": [
  {
    "id": 1,
    ***ETC ETC ETC***

...and therefore we need to tell Cribl's REST collector that each data record is inside that initial array.
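When you meet a new API, a quick way to decide between the two JSON Response setups is to look at the shape of the parsed response. The sketch below is one hedged way to do that in plain JavaScript; `describeShape` is a hypothetical helper, and `productsSample` is a trimmed stand-in for the response shown above, not real API output.

```javascript
// Hypothetical helper: reports whether the records are at the top level
// or tucked inside one or more named arrays.
function describeShape(parsed) {
  if (Array.isArray(parsed)) {
    // Records are at the top level: no Discover data value is needed.
    return { topLevelArray: true, arrayFields: [] };
  }
  if (parsed === null || typeof parsed !== "object") {
    // Not an object or array, so there is nothing to discover from.
    return { topLevelArray: false, arrayFields: [] };
  }
  // Candidate Discover data values: keys whose value is an array.
  const arrayFields = Object.keys(parsed).filter((key) => Array.isArray(parsed[key]));
  return { topLevelArray: false, arrayFields };
}

// Trimmed stand-in for the products response (illustrative values only).
const productsSample = { products: [{ id: 1 }, { id: 2 }] };

console.log(describeShape(productsSample).arrayFields); // candidate field names
console.log(describeShape([{ id: 1 }]).topLevelArray);  // true for a bare array
```

For the products data, the only candidate is `products`, which is the value entered in the Discover data field in step 8.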

Conclusion

Wooooo. That was a lot of nuanced knowledge we just dropped on you. Maybe take a minute to breathe, stretch, hydrate, and process before moving ahead. Self-care is important, y'all.

When you're ready to come back, we're going to keep this pace up as we explore how to use responses from HTTP requests to dynamically discover what data you can collect from REST APIs!