
Regular Expression Extraction

For those of you who are undaunted by the complexity of regular expressions (regex), Stream's Regex Extract Function offers another way to extract data, just as we did with the Grok Function. Now we're going to perform the same extraction we did in the Grok section, this time using a regular expression.

important
  1. If you're not in the extract_starter Pipeline view from the last section, navigate there (Manage > Processing > Pipelines > extract_starter).

  2. Slide the On Toggle next to the Grok Function to turn it Off.

  3. Click the Add Function button, and search for and select the Regex Extract Function (or select it from the Standard section).

  4. Scroll down to the new Function. In its Regex field, enter the following:

    CMD\s+\((?<cron_command>[^\)]+)\)

    NOTE: If you copy and paste the above pattern, some browsers append an extra space character, which the Regex field won't accept. If you see an error mark (a red circle with an exclamation point) in the Regex field, strip any trailing spaces from the pattern.

  5. Click Save.

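This regular expression has a single named capture group, (?<cron_command>...), which creates the cron_command field. Its value is every character after the literal CMD, the whitespace that follows it, and the opening parenthesis, up to (but not including) the terminating closing parenthesis.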
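To see what this pattern captures outside of Stream, here is a minimal TypeScript sketch that runs it through JavaScript's built-in RegExp, which supports the same (?<name>...) named-group syntax. The log line is a hypothetical cron syslog entry invented for illustration (not the tutorial's sample data), and the code only approximates what the Function does. It assumes an ES2018 or later runtime, since named capture groups arrived in ES2018.

```typescript
// Hypothetical cron syslog line, invented for illustration.
const rawLine =
  "Jan 11 14:17:01 host1 CRON[12345]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)";

// The pattern from step 4, verbatim. String.raw keeps each backslash literal,
// so the text between the backticks is exactly what goes in the Regex field.
const cmdPattern = new RegExp(String.raw`CMD\s+\((?<cron_command>[^\)]+)\)`);

// match() returns null when the pattern doesn't match; optional chaining
// then yields undefined instead of throwing.
const cronCommand: string | undefined = rawLine.match(cmdPattern)?.groups?.cron_command;

// JSON.stringify makes the leading spaces inside the parentheses visible.
console.log(JSON.stringify(cronCommand));
// prints "   cd / && run-parts --report /etc/cron.hourly"
```

Note that the captured value keeps the spaces that follow the opening parenthesis, because [^\)]+ matches any character other than a closing parenthesis, whitespace included.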

But what if we also wanted to extract the CRON process ID from the line for some reason? Simple:

important
  1. Click the Regex Extract Function's Add Regex button.

  2. In the new Additional Regex table's Regex field, enter:

    CRON\[(?<cron_pid>\d+)\]

    NOTE: Same warning about extra space characters as above.

  3. Click Save.

In each event in the right Preview Simple pane, you should now see a new field, cron_pid, containing the process ID as extracted from the _raw field.
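The combined effect of the Regex field and the Additional Regex entry can be approximated with another short TypeScript sketch: each pattern is tried against _raw, and every named capture group that matches is copied onto the event as a top-level field. The StreamEvent type and regexExtract helper are hypothetical stand-ins for illustration, not Cribl Stream's API, and the sample line is invented. As before, an ES2018 or later runtime is assumed.

```typescript
// Hypothetical stand-in for a Stream event: _raw plus any extracted string fields.
type StreamEvent = { _raw: string; [field: string]: string };

// Hypothetical helper approximating the Regex Extract Function: try each
// pattern against _raw and copy every named group that matched onto a new event.
function regexExtract(input: StreamEvent, patterns: RegExp[]): StreamEvent {
  const out: StreamEvent = { ...input }; // copy so the input event is not mutated
  for (const pattern of patterns) {
    const groups = input._raw.match(pattern)?.groups;
    if (!groups) continue; // pattern didn't match: add nothing for it
    for (const [fieldName, value] of Object.entries(groups)) {
      // Optional groups that didn't participate in the match are undefined.
      if (value !== undefined) out[fieldName] = value;
    }
  }
  return out;
}

const patterns: RegExp[] = [
  new RegExp(String.raw`CMD\s+\((?<cron_command>[^\)]+)\)`), // Regex field
  new RegExp(String.raw`CRON\[(?<cron_pid>\d+)\]`), // Additional Regex field
];

// Invented sample event; only _raw is populated before extraction.
const sampleEvent: StreamEvent = {
  _raw: "Jan 11 14:17:01 host1 CRON[12345]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)",
};

const extracted = regexExtract(sampleEvent, patterns);
console.log(extracted.cron_pid); // prints 12345
console.log(JSON.stringify(extracted.cron_command));
```

In this sketch each pattern is matched independently, so a line containing CRON[...] but no CMD (...) section would still receive a cron_pid field.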