The Cribl Breaker for NDJSON and Replay
In this first hands-on lab, we will cover how to correctly break events from data stored in an S3 bucket used for a Cribl Stream archival use case. Although this data was already broken into events when Stream first processed it, running a Collection job applies an Event Breaker to the data again.
By default, all Sources have a built-in "fallback" Event Breaker. This default breaker will create events based on the regular expression: [\n\r]+(?!\s)
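Simply put, this default Breaker treats a newline as an event boundary, unless the line that follows it starts with whitespace. Indented continuation lines, such as stack-trace frames, therefore stay attached to the preceding event.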
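To make the lookahead concrete, here is a small sketch that applies the same pattern with Python's re module. This is not how Stream runs its breakers internally; it assumes only that Python's regex semantics match Stream's for this particular pattern, which they should, since it uses just a character class, a quantifier, and a negative lookahead. The sample log text is made up for illustration.

```python
import re
from typing import List

# The fallback breaker pattern quoted above: one or more CR/LF characters,
# but only where the next character is NOT whitespace.
DEFAULT_BREAKER = re.compile(r"[\n\r]+(?!\s)")


def break_events(text: str) -> List[str]:
    """Split text into events the way the fallback pattern would, dropping empty chunks."""
    return [chunk for chunk in DEFAULT_BREAKER.split(text) if chunk]


# Hypothetical sample: the indented lines should stay with the ERROR event.
sample = (
    "2024-01-01 ERROR something failed\n"
    "    at frame one\n"
    "    at frame two\n"
    "2024-01-01 INFO all good\n"
)

for event in break_events(sample):
    print(repr(event))
```

Running this yields two events: the ERROR line together with its two indented frames, and the INFO line on its own. The trailing newline splits off an empty chunk, which the list comprehension discards.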
When archiving data from Cribl Stream to an S3 bucket, it is a best practice to use the JSON format. This writes all event metadata into the data stored inside S3, and formats the output as newline-delimited JSON (NDJSON), with one JSON object per line.
This creates a small issue when collecting data from the bucket. Because the default Breaker uses the newline character as the separator, each NDJSON line comes back from the Collection job as a single event, with the original metadata still embedded in the _raw field. This is not ideal. Let's inspect the problem, and then look at how to solve it.
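The following sketch shows why the metadata ends up inside _raw. The two archived records, their field names, and their values are hypothetical; the point is only that the default newline breaker hands back each whole JSON line as an opaque string, so fields such as host and sourcetype are not top-level event fields.

```python
import re

# Same fallback pattern as the default breaker: break on newlines not followed by whitespace.
DEFAULT_BREAKER = re.compile(r"[\n\r]+(?!\s)")

# Hypothetical archived object: two events written as NDJSON (one JSON object per line).
archived_object = (
    '{"_time": 1700000000, "host": "fw01", "sourcetype": "firewall", '
    '"_raw": "allow src=10.0.0.1 dst=10.0.0.2"}\n'
    '{"_time": 1700000001, "host": "fw01", "sourcetype": "firewall", '
    '"_raw": "deny src=10.0.0.3 dst=10.0.0.4"}\n'
)

# One event per line, but the entire JSON line is stored verbatim in _raw,
# so the archived metadata (host, sourcetype, _time) stays buried inside it.
events = [{"_raw": chunk} for chunk in DEFAULT_BREAKER.split(archived_object) if chunk]

for event in events:
    print(event["_raw"])
```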
Run the Collection Job
- From Stream's top nav, select Data > Sources.
- Click the S3 Collector tile.
- Click the ► Run button beside the configured replay S3 collector.
- Click the blue Run button to start the Collection job.
Observe how each event contains the metadata inside the _raw field. You can see this more clearly by clicking the + button to the left of the _raw field identifier.
The fields at each event's top level come from the configured path extractor. Notice how the source field contains the same information as the other fields, such as src_ip, dest_ip, etc.
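A path extractor derives field values from the object's key in the bucket rather than from its contents. Stream configures this through the Collector's own path settings; the regex below is only a stand-in to illustrate the idea, and the key layout (archive/year/month/day/host/sourcetype/file) and field names are hypothetical, not the layout used by this lab's bucket.

```python
import re
from typing import Dict

# Hypothetical partitioning layout: archive/<year>/<month>/<day>/<host>/<sourcetype>/<file>
# The real layout depends on how the archiving Destination partitions its output.
KEY_PATTERN = re.compile(
    r"^archive/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<host>[^/]+)/(?P<sourcetype>[^/]+)/[^/]+$"
)


def extract_path_fields(key: str) -> Dict[str, str]:
    """Return top-level fields parsed from an object key, or {} if the key doesn't match."""
    match = KEY_PATTERN.match(key)
    return match.groupdict() if match else {}


print(extract_path_fields("archive/2024/01/15/fw01/firewall/events-0001.json.gz"))
```

Because these values come only from the key, they say nothing about the event body; the archived fields inside each JSON line remain inside _raw until a suitable Event Breaker unpacks them.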
The Cribl Breaker
An easy solution exists to this problem: the preconfigured Cribl Breaker Ruleset! Let's configure the S3 Collector to use this Event Breaker to properly format events.
- If the Sample Data preview modal is open from the previous section, close it. (Click the X at the top right corner, or click the Cancel button at the bottom.)
- Click the configured replay Collector ID to open this Collector's configuration modal.
- From the modal's left tabs, select Event Breakers.
- Observe that only the System Default Rule is currently configured for this collector.
- Click the + Add Ruleset button.
- From the drop-down list, select the ruleset labeled Cribl (Event breaking rules for Newline Delimited JSON data).
- Click the ► Save & Run button at the modal's bottom left.
- Click the blue Run button.
Now observe that the events are properly formatted, matching the original events that were archived to the S3 bucket: the archived metadata appears as top-level fields instead of being embedded in _raw. When you run the Collection job, the events will be forwarded to the Destination with the correct metadata.
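Conceptually, the Cribl ruleset parses each NDJSON line and turns the archived JSON keys back into event fields. The function below is a hypothetical stand-in written for illustration, not Stream's implementation, and the sample records are made up; it shows only the general effect of parsing line by line and promoting keys to the top level.

```python
import json
from typing import Any, Dict, List


def cribl_style_break(ndjson_text: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: parse each NDJSON line and promote its keys to top-level fields."""
    events = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, such as one left by a trailing newline
        events.append(json.loads(line))  # archived fields become event fields again
    return events


# Hypothetical archived object: two events written as NDJSON.
archived_object = (
    '{"_time": 1700000000, "host": "fw01", "sourcetype": "firewall", '
    '"_raw": "allow src=10.0.0.1 dst=10.0.0.2"}\n'
    '{"_time": 1700000001, "host": "fw01", "sourcetype": "firewall", '
    '"_raw": "deny src=10.0.0.3 dst=10.0.0.4"}\n'
)

events = cribl_style_break(archived_object)
for event in events:
    print(event["host"], event["sourcetype"], event["_raw"])
```

Here _raw holds only the original log line again, while _time, host, and sourcetype are ordinary fields, which mirrors what the preview shows after the Cribl ruleset is added.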
Conclusion
In this module, you've learned about the preconfigured Cribl Breaker Ruleset for processing events written to an S3 bucket. In the next module, you'll learn about working with JSON Arrays.