Grok Extraction
Regular expressions are incredibly powerful. They're also an acquired taste, with a steep learning curve. Because of that, the Grok pattern matching system was created.
Grok is a templated regular expression matching engine, made popular by Logstash™. It provides a simple interface to powerful regex pattern matching and extraction. Cribl Stream supports using Grok patterns to extract data via the Grok Function. We're going to use it to extract the filenames involved in the cron entries in our sample data.
Available Grok Patterns
First, let's take a look at the Grok patterns Stream supports.
- If you're not already there, select Manage from Stream's top nav, then select Processing > Knowledge from the submenu.
- Click Grok Patterns in the Knowledge page's left sidebar.
- Click the core-patterns row to open its editor modal.
This will bring up the list of Grok patterns in the core-patterns category. For each entry, the first word is the name of the pattern, and the rest of the line is the actual regex it evaluates to.
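To make the entry layout concrete, here is a minimal Python sketch that splits lines of that shape into a name and a regex. The sample entries and the helper name parse_pattern_file are illustrative assumptions, as is the handling of comment lines; this is not how Stream itself loads its pattern files.

```python
# Hypothetical excerpt in the "NAME regex" layout described above.
# The entries are illustrative, not copied from Stream's pattern files.
SAMPLE_PATTERN_FILE = r"""
# assumption: blank lines and lines starting with '#' are ignored
INT (?:[+-]?(?:[0-9]+))
WORD \b\w+\b
GREEDYDATA .*
"""


def parse_pattern_file(text: str) -> dict:
    """Map each pattern name (first word) to its regex (rest of the line)."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split only on the first run of whitespace so the regex stays intact.
        name, regex = line.split(maxsplit=1)
        patterns[name] = regex
    return patterns


patterns = parse_pattern_file(SAMPLE_PATTERN_FILE)
for name, regex in patterns.items():
    print(f"{name:12} -> {regex}")
```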
When we use the Grok Function, we'll specify patterns in the following format: %{<pattern>:<field>}. The Function will search the source field for the specified <pattern> and, if it finds a match, create a new field named <field> containing the matched text.
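One way to picture the %{<pattern>:<field>} format is as shorthand for a named capture group. The Python sketch below works under that assumption: the three-entry PATTERNS table and the helpers grok_to_regex and grok_extract are hypothetical, nested pattern references are not handled, and Stream's own implementation may differ.

```python
import re

# Illustrative pattern library; real Grok libraries are much larger.
PATTERNS = {
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?[0-9]+",
    "GREEDYDATA": r".*",
}

# Matches %{PATTERN} or %{PATTERN:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expression: str) -> str:
    """Replace each %{PATTERN:field} reference with a named capture group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Without a field name, the pattern matches but captures nothing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, expression)


def grok_extract(expression: str, source: str) -> dict:
    """Return the extracted fields, or an empty dict when nothing matches."""
    found = re.search(grok_to_regex(expression), source)
    return found.groupdict() if found else {}


fields = grok_extract(
    "%{WORD:user} logged in from port %{INT:port}",
    "alice logged in from port 22",
)
print(fields)
```

In this sketch, text outside the %{...} references must match literally, and only the named fields come back as new values.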
Take a minute to browse the available patterns; then click X or press Esc to close the editor modal. You can also click other pattern files' rows to see the other Grok patterns that we ship with Stream. (You can add your own patterns as needed, too.)
Let's see it in action. We'll use the sample data file named syslog_sample.log, and we'll use the Grok Function to extract filenames out of it.
- Cancel out of any Edit Grok Patterns modal you still have open.
- Select Processing in Stream's top nav, then click Pipelines.
- Click to select the extract_starter Pipeline. This opens an empty Pipeline.
- In the right Sample Data pane, next to syslog_sample.log, click the Simple preview link.
- At left, click Add Function, and search for and select the Grok Function (or select it from the Standard Functions section).
- In the Pattern field, enter the following: %{UNIXPATH:cron_command}
- Click Save.
This should extract the command being executed by cron, or at least the first thing that looks like a UNIX file path. This is great for the entries that call a script directly. But it's a little ugly in the case of the run-parts events, because these use a compound shell command. So let's change it to extract the whole command, not just a UNIX path.
- Change the value of the Pattern field to: CMD \(%{GREEDYDATA:cron_command}\)
- Click Save.
Now, the cron_command field in the right pane should show the entire command that cron executed. In the Pattern field at left, note how the parentheses before and after the pattern are escaped with \ so that they match the literal parentheses around the command in each event. Because they sit outside the %{...} expression, the Function does not include them in the results.
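The difference between the two patterns can be reproduced outside Stream with plain regular expressions. The sketch below rests on several assumptions: the two events are hypothetical lines in a common syslog cron layout (not taken from syslog_sample.log), PATH_ONLY is a simplified stand-in for UNIXPATH rather than its real definition, and GREEDYDATA is expanded to .*.

```python
import re

# Hypothetical events; the real sample file may look different.
EVENTS = [
    "Jan 12 06:25:01 web01 CRON[1234]: (root) CMD (/usr/local/bin/backup.sh)",
    "Jan 12 06:25:01 web01 CRON[1235]: (root) CMD (test -x /usr/sbin/anacron"
    " || ( cd / && run-parts --report /etc/cron.daily ))",
]

# Simplified stand-in for %{UNIXPATH:cron_command}.
PATH_ONLY = re.compile(r"(?P<cron_command>(?:/[\w.%!$@:,+~-]*)+)")

# CMD \(%{GREEDYDATA:cron_command}\): escaped parentheses match literally.
WHOLE_COMMAND = re.compile(r"CMD \((?P<cron_command>.*)\)")

# Same pattern with the parentheses left unescaped: they become a regex
# group, so the event's own parentheses end up inside the captured value.
UNESCAPED = re.compile(r"CMD ((?P<cron_command>.*))")


def extract(regex, event):
    """Return the cron_command capture, or None when the regex finds no match."""
    found = regex.search(event)
    return found.group("cron_command") if found else None


for event in EVENTS:
    print("path only:    ", extract(PATH_ONLY, event))
    print("whole command:", extract(WHOLE_COMMAND, event))
    print("unescaped:    ", extract(UNESCAPED, event))
```

For the compound event, the path-only regex stops at the first path it finds, while the escaped version returns everything between the outermost parentheses.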