Grok Extraction

Regular expressions are incredibly powerful. They're also an acquired taste, with a steep learning curve. Because of that, the Grok pattern matching system was created.

Grok is a templated regular expression matching engine, made popular in Logstash™. It provides a simple interface to powerful regex pattern matching and extraction. Cribl Stream supports using Grok patterns to extract data via the Grok Function. We're going to use it to extract the filenames involved in the cron entries in our sample data.

Available Grok Patterns

First, let's take a look at the Grok patterns Stream supports.

important
  1. If you're not already there, select Manage from Stream's top nav, then select Processing > Knowledge from the submenu.
  2. Click Grok Patterns in the Knowledge page's left sidebar.
  3. Click the core-patterns row to open its editor modal.

This will bring up the list of Grok patterns in this core-patterns category. For each entry, the first word is the name of the pattern, and the rest of the line is the actual regex it evaluates to.

When we use the Grok Function, we'll specify patterns in the following format: %{<pattern>:<field>}. The Function will search the source field for a match to the named <pattern> and, if it finds one, create a new field named <field> containing the matched text.
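
To make the %{<pattern>:<field>} idea concrete, here is a minimal Python sketch of how such a reference can be expanded into an ordinary regex with a named capture group. It is an illustration only, not how Stream implements the Grok Function: the PATTERNS dictionary is a tiny, simplified stand-in for a real pattern file, and the helper names grok_to_regex and grok_extract are hypothetical.

```python
import re

# Hypothetical, heavily trimmed pattern library. Real Grok pattern files
# contain many more entries, and their regexes may differ from these.
PATTERNS = {
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?\d+",
    "GREEDYDATA": r".*",
}

def grok_to_regex(expression: str) -> str:
    """Expand each %{PATTERN:field} reference into a named capture group."""
    def expand(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", expand, expression)

def grok_extract(expression: str, text: str) -> dict:
    """Search text for the expanded expression; return the new fields, if any."""
    match = re.search(grok_to_regex(expression), text)
    return match.groupdict() if match else {}

print(grok_to_regex("pid=%{INT:pid}"))
print(grok_extract("pid=%{INT:pid}", "worker pid=4242 started"))
```

Anything outside the %{...} reference (here, the literal text pid=) is treated as plain regex, which is why it is possible to mix Grok references with hand-written regex in the same pattern.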

Take a minute to browse the available patterns, then click X or press Esc to close the editor modal. You can also click other pattern files' rows to see the other Grok patterns that ship with Stream, and you can add your own patterns as needed.

Let's see it in action. We'll use the sample data file named syslog_sample.log, and we'll use the Grok Function to extract filenames out of it.

important
  1. Cancel out of any Edit Grok Patterns modal you still have open.
  2. Select Processing in Stream's top nav, then click Pipelines.
  3. Click to select the extract_starter Pipeline. This opens an empty Pipeline.
  4. In the right Sample Data pane, next to syslog_sample.log, click the Simple preview link.
  5. At left, click Add Function, then search for and select the Grok Function (or select it from the Standard Functions section).
  6. In the Pattern field, enter the following:
    %{UNIXPATH:cron_command}
  7. Click Save.

This should extract the command being executed by cron, or at least the first thing that looks like a UNIX file path. This is great for the entries that call a script directly. But it's a little ugly in the case of the run-parts events, because these use a compound shell command. So let's change it to extract the whole command, not just a UNIX path.
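
The following Python sketch shows why a path-only pattern gives uneven results. Everything in it is an assumption made for illustration: the two events are hypothetical cron lines (not taken from syslog_sample.log), and UNIXPATH_LIKE is a simplified stand-in for the shipped UNIXPATH pattern, whose actual regex you can check in the core-patterns editor.

```python
import re

# Simplified stand-in for UNIXPATH (an assumption): one or more
# "/segment" pieces made of common filename characters.
UNIXPATH_LIKE = r"(?:/[\w.%!$@:,~+-]*)+"

# Hypothetical cron events, invented for this example.
EVENTS = [
    "Mar  1 12:10:01 web01 CROND[2210]: (root) CMD (/usr/lib64/sa/sa1 1 1)",
    "Mar  1 13:01:01 web01 CROND[2381]: (root) CMD (run-parts /etc/cron.hourly)",
]

def first_path(event: str):
    """Return the first substring that looks like a UNIX path, or None."""
    match = re.search(f"(?P<cron_command>{UNIXPATH_LIKE})", event)
    return match.group("cron_command") if match else None

for event in EVENTS:
    print(first_path(event))
```

In this sketch the first event yields only the script path (its arguments are dropped), and the second yields the directory passed to run-parts rather than the command itself.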

important
  1. Change the value of the Pattern field to:
    CMD \(%{GREEDYDATA:cron_command}\)
  2. Click Save.

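Now, the cron_command field in the right pane should show the entire command that cron executed. In the Pattern field at left, note that the parentheses before and after the pattern are escaped with \ so that they match the literal parentheses in the event instead of acting as regex grouping characters. Because they sit outside %{GREEDYDATA:cron_command}, they are matched but not included in the extracted value.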
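
As a rough Python equivalent, the pattern above behaves like the regex below once GREEDYDATA is expanded to .* with a named capture group. The sample event is hypothetical, and the UNESCAPED variant is included only to show what changes when the backslashes are left out.

```python
import re

# Regex equivalent of: CMD \(%{GREEDYDATA:cron_command}\)
ESCAPED = r"CMD \((?P<cron_command>.*)\)"

# Same pattern without the backslashes: the outer parentheses become a
# regex group instead of matching literal "(" and ")" characters.
UNESCAPED = r"CMD ((?P<cron_command>.*))"

# Hypothetical cron event, invented for this example.
EVENT = "Mar  1 13:01:01 web01 CROND[2381]: (root) CMD (run-parts /etc/cron.hourly)"

def extract(pattern: str, event: str):
    """Return the cron_command capture for the given pattern, or None."""
    match = re.search(pattern, event)
    return match.group("cron_command") if match else None

print(extract(ESCAPED, EVENT))    # command only
print(extract(UNESCAPED, EVENT))  # command still wrapped in parentheses
```

With the escaped version, the literal parentheses are consumed by the match but fall outside the capture group, so only the command text lands in cron_command.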