Regular Expression Extraction
If you're undaunted by the complexity of regular expressions (regex), you can use Stream's Regex Extract Function to extract data, just as we did with the Grok Function. Now we're going to repeat the extraction from the Grok section, this time using a regular expression.
- If you're not in the extract_starter Pipeline view from the last section, navigate there (Manage > Processing > Pipelines > extract_starter).
- Slide the On toggle next to the Grok Function to turn it Off.
- Click the Add Function button, and search for and select the Regex Extract Function (or select it from the Standard section).
- Scroll down into the new Function. In its Regex field, enter the following:

    CMD\s+\((?<cron_command>[^\)]+)\)

  NOTE: If you copy/paste the above pattern, some browsers add an extra space character, which regex won't accept. If you see an error mark (red circle with an exclamation point) in the Regex field, make sure you strip any trailing spaces from the pattern.
- Click Save.
This regular expression has a single named capture group, cron_command. It creates the cron_command field from the characters between the opening parenthesis that follows CMD (and any whitespace) and the next closing parenthesis.
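If you'd like to see what that capture group matches outside of Stream, here is a minimal Python sketch. It is only an approximation, not Stream's implementation. Two assumptions to note: Python spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adjusted accordingly, and the sample log line (host name, PID, and command) is made up for illustration.

```python
import re

# Same pattern as in the Regex field, with Python's (?P<name>...) syntax
# for the named capture group.
CRON_COMMAND = re.compile(r"CMD\s+\((?P<cron_command>[^\)]+)\)")

# Hypothetical syslog-style cron line; your sample data will differ.
sample_raw = "Jan 12 06:25:01 web01 CRON[12345]: (root) CMD (run-parts /etc/cron.hourly)"

# search() scans the whole line for the first place the pattern matches.
match = CRON_COMMAND.search(sample_raw)
cron_command = match.group("cron_command") if match else None

print(cron_command)
```

Because [^\)]+ matches any run of characters other than a closing parenthesis, the capture stops at the first ) after CMD (.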
But what if we also wanted to extract the CRON process ID from the line for some reason? Simple:
- Click the Regex Extract Function's Add Regex button.
- In the new Additional Regex table's Regex field, enter:

    CRON\[(?<cron_pid>\d+)\]

  NOTE: Same warning about extra space characters as above.
- Click Save.
In each event in the right Preview Simple pane, you should now see a new field, cron_pid, containing the process ID as extracted from the _raw field.
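To picture how the two patterns work together, the following Python sketch applies both to an event's _raw field and merges the named groups into the event. Treat it as a rough model under stated assumptions rather than a description of Stream's internals: extract_fields is a hypothetical helper, the sample event is invented, and the patterns use Python's (?P<name>...) spelling for named groups.

```python
import re

# The main regex plus the additional regex, in Python's named-group syntax.
PATTERNS = [
    re.compile(r"CMD\s+\((?P<cron_command>[^\)]+)\)"),
    re.compile(r"CRON\[(?P<cron_pid>\d+)\]"),
]


def extract_fields(raw: str) -> dict:
    """Hypothetical helper: collect named groups from every pattern that matches."""
    fields = {}
    for pattern in PATTERNS:
        match = pattern.search(raw)
        if match:
            # groupdict() maps each group name to the text it captured.
            fields.update(match.groupdict())
    return fields


# Invented sample event; only the _raw field is assumed to exist.
sample_event = {
    "_raw": "Jan 12 06:25:01 web01 CRON[12345]: (root) CMD (run-parts /etc/cron.hourly)"
}
sample_event.update(extract_fields(sample_event["_raw"]))

print(sample_event["cron_pid"], sample_event["cron_command"])
```

In this sketch the extracted values are strings, and a line that matches neither pattern simply gains no new fields.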