Configuring Scheduled Collection Jobs

In this section, you'll explore how to schedule jobs that collect data from a REST API endpoint. You can schedule jobs to recur on an interval that you define.

The `earliest` and `latest` Parameters

note

Cribl Stream uses two "magic" variables, earliest and latest, to specify a time range when collecting data. For additional details on configuring time ranges, take a moment to review this related documentation before you continue: Collector Sources > Scheduling and Running > Time Range.

To see how the earliest and latest variables work, let's configure a Collector that uses these parameters in a collection job.

important

If necessary, navigate to the REST Collector Source page. From the top nav of your Cribl Stream Sandbox, select Manage > Data > Sources, then select Collectors > REST from the Data Sources page's tiles or left nav.

Click Add Collector to open the REST > Add Collector modal.
In the Collector ID field, enter echo.
Copy/paste the following URL into the Collect URL field.
```
'http://rest-server/echo'
```
Configure two Collect parameters by clicking the + Add Parameter button twice. Copy/paste the parameters' settings from the table below.

Name	Value
earliest	`${earliest}`
latest	`${latest}`
At the bottom left, click ► Save & Run. In the Run configuration modal, click Run again.

Observe the output from the REST API server. You'll see information related to the headers, body, and query string parameters returned to you.

{"headers":{"host":"rest-server","connection":"close"},"body":{},"query":{}}

Why are there no references to earliest or latest in the query section? Because we didn't specify an absolute or relative time range when running the Collector. These variables' values are undefined, meaning they are ignored during the collection job.
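
To make the empty `query` object concrete, here is a minimal sketch of how a client might build the request URL while skipping undefined parameters. The `buildCollectUrl` helper is hypothetical and is not Cribl Stream's implementation; it only illustrates why undefined values never reach the server.

```javascript
// Hypothetical helper (not part of Cribl Stream): build a request URL from
// Collect parameters, skipping any parameter whose value is undefined.
function buildCollectUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(params)) {
    if (value !== undefined) {
      url.searchParams.set(name, String(value));
    }
  }
  return url.toString();
}

// No time range given: both variables are undefined, so no query string is sent.
const withoutRange = buildCollectUrl("http://rest-server/echo", {
  earliest: undefined,
  latest: undefined,
});

// Time range given: both values are sent as query string parameters.
const withRange = buildCollectUrl("http://rest-server/echo", {
  earliest: 1654529220,
  latest: 1654529520,
});

console.log(withoutRange);
console.log(withRange);
```

The first URL carries no query string, which matches the empty `query` object in the echo output above.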

Now we'll configure Cribl Stream to send the earliest and latest parameters.

important

If open, close the Preview modal from the previous step.
Click the ► Run button on the echo Collector row.
In the Earliest field, enter -5m@m. This means Cribl Stream will snap to :00 seconds, 5 minutes ago.
In the Latest field, enter @m. This means Cribl Stream will snap to the last minute at :00 seconds.
Click the Run button.

Observe that the output from the REST API now includes the earliest and latest parameters, in UNIX epoch time format (seconds granularity). There should be 5 minutes' difference between the earliest and latest timestamps, and both should be snapped to :00 seconds.
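
The snapping behavior can be sketched in a few lines of JavaScript. The `resolveRelativeTime` function below is a hypothetical, simplified resolver that handles only the two expression shapes used in this exercise; it is not Cribl Stream's parser, and the example value of `now` is an assumption chosen for illustration.

```javascript
// Hypothetical resolver, for illustration only. It supports expressions of
// the form "@m" or "-N<unit>@m", where unit is s, m, h, or d.
const UNIT_SECONDS = { s: 1, m: 60, h: 3600, d: 86400 };

function resolveRelativeTime(expr, nowSeconds) {
  const match = /^(?:-(\d+)([smhd]))?@m$/.exec(expr);
  if (!match) {
    throw new Error(`Unsupported expression: ${expr}`);
  }
  // Apply the offset first, then snap down to :00 seconds of that minute.
  const offset = match[1] ? Number(match[1]) * UNIT_SECONDS[match[2]] : 0;
  const shifted = nowSeconds - offset;
  return shifted - (shifted % 60);
}

const now = 1654529547; // assumed "now", 27 seconds past the minute
const earliest = resolveRelativeTime("-5m@m", now);
const latest = resolveRelativeTime("@m", now);

console.log({ earliest, latest, difference: latest - earliest });
```

With this assumed `now`, the two timestamps are exactly 300 seconds apart and both are multiples of 60, which is the pattern to look for in your own output.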

You can translate the timestamps to a human-readable date and time by running the following command in your terminal (replace the placeholder timestamp with your result). The `-d @` form is GNU `date` syntax; on macOS or BSD, use `date -r 1654529520` instead:

date -d @1654529520
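
If you would rather do the conversion in JavaScript, a minimal sketch follows. The `epochSecondsToIso` helper name is hypothetical; it uses only the standard `Date` object and prints the time in UTC.

```javascript
// Date expects milliseconds, so epoch seconds are multiplied by 1000.
function epochSecondsToIso(epochSeconds) {
  return new Date(epochSeconds * 1000).toISOString();
}

console.log(epochSecondsToIso(1654529520)); // 2022-06-06T15:32:00.000Z
```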

note

If you need to set a default time range for when the collection job runs, you can use the JavaScript logical OR (||) operator to set a default value.

For example, if this schedule always collects in 5-minute windows, you can default earliest to the most recent 5-minute boundary (in epoch seconds) with this syntax:

earliest || new Date().setTime(new Date().getTime() - (new Date().getTime() % (5 * 60 * 1000))) / 1000

If you need to format the time into a string, you can use the C.Time.strftime function.
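
To see what the fallback expression above evaluates to, here is a small sketch that wraps the same arithmetic in a function. The `earliestOrDefault` name and the `nowMs` parameter are hypothetical and exist only so the current time can be fixed for the example.

```javascript
// Same fallback logic as the expression above, with the current time passed
// in as nowMs (a hypothetical parameter) so the result is repeatable.
function earliestOrDefault(earliest, nowMs) {
  const windowMs = 5 * 60 * 1000;
  // Round nowMs down to the latest 5-minute boundary, then convert to seconds.
  return earliest || (nowMs - (nowMs % windowMs)) / 1000;
}

const assumedNowMs = 1654529547000; // assumed current time in milliseconds

console.log(earliestOrDefault(undefined, assumedNowMs)); // fallback is used
console.log(earliestOrDefault(1654529220, assumedNowMs)); // supplied value wins
```

Because `||` treats every falsy value as missing, an `earliest` of 0 would also trigger the fallback.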

Scheduling

Now, let's configure the echo job to run on a schedule. The goal is to collect a 60-second snapshot of data every 60 seconds.

important

If open, close the Preview modal from the previous step.
Click the ⏱ Scheduled button on the echo Collector row.
Set Enabled to Yes.
Change the Cron schedule to * * * * * (meaning every minute).
Set Skippable to No.
Set Resume missed runs to Yes. (This setting appears after you disable Skippable.)
In the Earliest field, enter -1m@m. This means Cribl Stream will snap to :00 seconds, 1 minute ago.
In the Latest field, enter @m. This means Cribl Stream will snap to the last minute at :00 seconds.
Click the Save button.

Your Schedule Collector window should look like the following:

[Image: Scheduling]

In this Sandbox instance, we automatically apply all configuration changes when you save them. But if you are running a Cribl.Cloud or distributed deployment of Cribl Stream, you must next Commit and Deploy for your changes to take effect.
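
The reason an every-minute schedule pairs well with -1m@m and @m is that consecutive runs cover adjacent, non-overlapping windows. The sketch below models this with a hypothetical `windowForRun` function and assumed fire times; it is an illustration of the arithmetic, not Cribl Stream's scheduler.

```javascript
// Hypothetical model: for a run that fires at fireSeconds, compute the time
// window produced by Earliest = -1m@m and Latest = @m.
function windowForRun(fireSeconds) {
  const latest = fireSeconds - (fireSeconds % 60); // @m
  const earliest = latest - 60; // -1m@m
  return { earliest, latest };
}

// Three consecutive every-minute runs, starting at an assumed fire time.
const firstRun = 1654529520;
const windows = [0, 1, 2].map((i) => windowForRun(firstRun + i * 60));

console.log(windows);
```

Each window's `latest` equals the next window's `earliest`, so no interval is collected twice and none is left out.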

note

Why Disable Skippable, and Enable Resume Missed Runs?

These settings are important for reliable data collection with any Collector!

Cribl Stream places concurrency limits on its number of running jobs and tasks. This is to ensure that system resources are not depleted during runtime. When Sources like Office 365 and Collectors like S3 run concurrent jobs, they can exceed concurrency limits – and Cribl Stream might then skip a REST Collector job. With Skippable disabled, if Cribl Stream reaches concurrency limits, it will queue the job run until the next available start time.

The Resume missed runs setting is important when the Leader Node restarts or is unavailable. If you enable this setting, Cribl Stream tracks the last successful run time for each job. Upon restart, it will automatically schedule any skipped collection jobs.
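
As a rough mental model of Resume missed runs, the sketch below lists the every-minute fire times that fall between the last successful run and a restart. The `missedRunTimes` function and its inputs are hypothetical; Cribl Stream's actual bookkeeping may differ.

```javascript
// Illustrative model only, not Cribl Stream's scheduler. Lists every-minute
// fire times after the last successful run and at or before the restart time.
function missedRunTimes(lastSuccessfulRun, restartTime) {
  const missed = [];
  let next = lastSuccessfulRun - (lastSuccessfulRun % 60) + 60;
  while (next <= restartTime) {
    missed.push(next);
    next += 60;
  }
  return missed;
}

// Assumed outage: the Leader is unavailable for a little over three minutes.
const missed = missedRunTimes(1654529520, 1654529710);

console.log(missed);
```

In this assumed outage, three runs were missed; with Resume missed runs enabled, those are the runs that would be scheduled after the restart.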

Read more about Job Limits on the Cribl Docs site.

Conclusion

Congratulations, you now know how to schedule collection jobs! In the next module, we'll explore how to troubleshoot – using logs – when a REST Collector is not working correctly.
