Grok Extraction
Regular expressions are incredibly powerful. They're also an acquired taste, with a steep learning curve. Because of that, the Grok pattern matching system was created.
Grok is a templated regular expression matching engine, popularized by Logstash™. It provides a simple interface to powerful regex pattern matching and extraction. Cribl Stream supports using Grok patterns to extract data via the Grok Function. We're going to use it to extract the filenames involved in the cron entries in our sample data.
Available Grok Patterns
First, let's take a look at the Grok patterns Stream supports.
- If you're not already there, select Manage from Stream's top nav, then select Processing > Knowledge from the submenu.
- Click Grok Patterns in the Knowledge page's left sidebar.
- Click the core-patterns row to open its editor modal.
This will bring up the list of Grok patterns in this core-patterns category. For each entry, the first word is the name of the pattern, and the rest of the line is the actual regex it evaluates to.
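To make the "name first, regex after" layout concrete, here is a minimal Python sketch that parses lines in that shape into a lookup table. The function name `parse_pattern_lines` and the two sample entries are illustrative assumptions, not copied from Stream's core-patterns file, and the comment-line handling is a guess at a common convention.

```python
def parse_pattern_lines(text):
    """Map each pattern name to its regex: first word is the name, the rest is the regex."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no pattern.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) != 2:
            continue  # a name with no regex body is skipped
        name, regex = parts
        patterns[name] = regex
    return patterns


# Illustrative entries written in the same "NAME regex" shape.
sample = """
# example entries
USERNAME [a-zA-Z0-9._-]+
INT (?:[+-]?(?:[0-9]+))
"""

patterns = parse_pattern_lines(sample)
for name, regex in patterns.items():
    print(f"{name:10} -> {regex}")
```

Splitting only on the first run of whitespace matters because the regex body may itself contain spaces.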
When we use the Grok Function, we'll specify patterns in the following format: %{<pattern>:<field>}. The Function will search for the <pattern> specified in the source field, and if found, create a new field named <field> with the results.
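As a rough mental model, a `%{<pattern>:<field>}` reference behaves like a named capture group whose body is the pattern's regex. The Python sketch below illustrates that idea under stated assumptions: the `PATTERNS` table is a tiny hypothetical library, and `grok_to_regex` and `grok_extract` are made-up helper names. It is not how Stream implements the Grok Function.

```python
import re

# Hypothetical, simplified pattern library (not Stream's actual definitions).
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expr, patterns=PATTERNS):
    """Expand each %{NAME:field} reference into a regex named group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        # Pattern bodies may reference other patterns, so expand recursively.
        body = grok_to_regex(patterns[name], patterns)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, expr)


def grok_extract(expr, source):
    """Return a dict of extracted fields, or an empty dict when nothing matches."""
    match = re.search(grok_to_regex(expr), source)
    return match.groupdict() if match else {}


fields = grok_extract(
    "%{WORD:user} logged in %{INT:count} times",
    "alice logged in 3 times",
)
print(fields)
```

Note that every extracted value is a string; converting `count` to a number would be a separate step.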
Take a minute to browse the available patterns, then click X or press Esc to close the editor modal. You can also click other pattern files' rows to see the other Grok patterns that ship with Stream. (You can add your own patterns as needed.)
Let's see it in action. We'll use the sample data file named syslog_sample.log, and we'll use the Grok Function to extract filenames out of it.
- Cancel out of any Edit Grok Patterns modal you still have open.
- Select Processing in Stream's top nav, then click Pipelines.
- Click to select the extract_starter Pipeline. This opens an empty Pipeline.
- In the right Sample Data pane, next to syslog_sample.log, click the Simple preview link.
- At left, click Add Function, and search for and select the Grok function (or select it from the Standard Functions section).
- In the Pattern field, enter the following: %{UNIXPATH:cron_command}
- Click Save.
This should extract the command being executed by cron, or at least the first thing that looks like a UNIX file path. This is great for the entries that call a script directly. But it's a little ugly in the case of the run-parts events, because these use a compound shell command. So let's change it to extract the whole command, not just a UNIX path.
- Change the value of the Pattern field to: CMD \(%{GREEDYDATA:cron_command}\)
- Click Save.
Now, the cron_command field in the right pane should show the entire command that CRON executed. In the Pattern field at left, note how the parentheses before and after the pattern are escaped with \. Escaping makes them match the literal parentheses in the event instead of acting as regex grouping characters, and because they sit outside the %{...} reference, they are not included in the results.
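The difference between the two Pattern values can be sketched with plain Python regexes. Everything here is an assumption made for illustration: the two log lines are invented rather than taken from syslog_sample.log, `PATH_ONLY` is a simplified stand-in for UNIXPATH (the real definition is more permissive), and `WHOLE_CMD` assumes GREEDYDATA expands to `.*`, as it does in commonly distributed Grok pattern sets.

```python
import re

# Hypothetical cron log lines, invented for illustration.
lines = [
    "Jan  1 00:01:01 host CROND[101]: (root) CMD (/usr/local/bin/backup.sh)",
    "Jan  1 00:01:01 host CROND[102]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)",
]

# Simplified stand-in for a path pattern: one or more "/segment" pieces.
PATH_ONLY = re.compile(r"(?P<cron_command>(?:/[\w.-]+)+)")

# Escaped parentheses match the literal "(" and ")" around the command;
# only the named group between them is captured.
WHOLE_CMD = re.compile(r"CMD \((?P<cron_command>.*)\)")


def extract(regex, line):
    """Return the cron_command capture from the first match, or None."""
    match = regex.search(line)
    return match.group("cron_command") if match else None


path_results = [extract(PATH_ONLY, line) for line in lines]
cmd_results = [extract(WHOLE_CMD, line) for line in lines]

for path, cmd in zip(path_results, cmd_results):
    print(f"path only: {path!r}\nwhole cmd: {cmd!r}\n")
```

For the direct script call both approaches agree, while for the compound command the path-only regex keeps just one path and the anchored version keeps the full command. Because `.*` is greedy, the capture runs up to the last closing parenthesis on the line.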