How Cribl Stream Collection Works
Cribl is the Data Engine for IT and Security, and Cribl Stream enables you to collect (pull) data from a REST API endpoint without writing custom code. To configure a Collector, you simply enter an arbitrary, unique name and the URL to collect data from. Once you've configured a Collector, you can either run it immediately ("ad hoc") or schedule it to run on a recurring interval.
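To make the two required settings concrete, here is a minimal sketch of a Collector definition expressed as a plain Python dict. The field names and URL are illustrative assumptions, not Cribl's actual configuration schema:

```python
# Hypothetical Collector configuration (field names are illustrative,
# not Cribl's actual schema).
collector = {
    "id": "my-rest-collector",                          # arbitrary, unique name
    "collectUrl": "https://api.example.com/v1/events",  # URL to collect data from
    "schedule": None,                                   # None = ad hoc; or a recurring interval
}
```

Leaving `schedule` unset corresponds to an ad hoc run; supplying an interval corresponds to a scheduled run.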
When the Collector runs, Cribl Stream executes a collection job. A collection job consists of tasks, which are the units of work to authenticate, discover, or collect the data. For example, running a job that first authenticates and then collects data spawns two tasks.
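The job/task relationship described above can be sketched in a few lines of Python. This is an illustrative model, not Cribl's internals: a job that first authenticates and then collects produces exactly two task records.

```python
def auth_task():
    """Hypothetical task: obtain a token for later requests."""
    return {"task": "authenticate", "token": "abc123"}

def collect_task(token):
    """Hypothetical task: pull records using the token from the auth task."""
    return {"task": "collect", "records": [{"id": 1}, {"id": 2}]}

def run_job():
    # A collection job is a sequence of tasks, each a unit of work.
    tasks = []
    auth = auth_task()
    tasks.append(auth)
    tasks.append(collect_task(auth["token"]))
    return tasks

tasks = run_job()
```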
To process a collection job, Cribl Stream runs it in five phases:
1. Authentication (optional on the Cribl Stream side)
2. Discovery (optional)
3. Collection
4. Event Breaking (optional)
5. Filtering (optional)
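The five phases above can be sketched as a single pipeline function. This is a conceptual model under assumed names (`fetch`, `run_collection_job`, and the example URL are all hypothetical), with the optional phases expressed as optional callables:

```python
def fetch(target, token=None):
    # Stand-in for an HTTP GET; returns a fake JSON-array payload.
    return [{"source": target, "value": 1}, {"source": target, "value": 2}]

def run_collection_job(url, authenticate=None, discover=None,
                       event_breaker=None, filter_fn=None):
    token = authenticate() if authenticate else None   # Phase 1: Authentication (optional)
    targets = discover() if discover else [url]        # Phase 2: Discovery (optional)
    payloads = [fetch(t, token) for t in targets]      # Phase 3: Collection
    events = []
    for payload in payloads:                           # Phase 4: Event Breaking (optional)
        events.extend(event_breaker(payload) if event_breaker else [payload])
    if filter_fn:                                      # Phase 5: Filtering (optional)
        events = [e for e in events if filter_fn(e)]
    return events

# With all optional phases enabled: unroll each payload into its
# elements, then keep only events whose value exceeds 1.
events = run_collection_job(
    "https://api.example.com/data",
    event_breaker=lambda payload: list(payload),
    filter_fn=lambda e: e["value"] > 1,
)
```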
Although the same task processes phases 3 through 5, these are logically separate operations. For example, you can configure custom Cribl Stream Event Breakers to unroll JSON array data into individual events, extract fields, and automatically break lines using the Timestamp function.
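To illustrate the event-breaking step, here is a hedged Python sketch of unrolling a JSON array payload into individual events and deriving an epoch timestamp from a field. It is an analogy for what an Event Breaker with timestamp extraction does, not Cribl's implementation; the `ts` field name and payload are assumptions:

```python
import json
from datetime import datetime

# A single collected payload containing a JSON array of two records.
raw = ('[{"ts": "2024-05-01T12:00:00Z", "msg": "a"},'
       ' {"ts": "2024-05-01T12:00:05Z", "msg": "b"}]')

def unroll(payload):
    # Break the JSON array into one event per element.
    events = json.loads(payload)
    for event in events:
        # Extract a field and derive an epoch timestamp from it,
        # analogous to applying a Timestamp function.
        dt = datetime.fromisoformat(event["ts"].replace("Z", "+00:00"))
        event["_time"] = dt.timestamp()
    return events

events = unroll(raw)
```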
You can also filter data based on criteria such as a time range or field values, specified during the job's ad hoc or scheduled run.
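A filtering pass of this kind can be sketched as a predicate applied to each event. The field names and threshold below are illustrative assumptions, not Cribl's filter syntax:

```python
events = [
    {"_time": 1714564800, "level": "error", "msg": "disk full"},
    {"_time": 1714564805, "level": "info", "msg": "ok"},
]

earliest = 1714564800  # hypothetical earliest-time bound for the run

def keep(event):
    # Keep events at or after the time bound whose level is "error".
    return event["_time"] >= earliest and event["level"] == "error"

filtered = [e for e in events if keep(e)]
```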
Conclusion
You now understand the internals of how a collection job runs inside Cribl Stream.
In the next module, you'll start configuring a basic REST collector.