Regular Expression Extraction
If you're undaunted by the complexity of regular expressions (regex), you can use them to extract data, as we did with the Grok Function, by using Stream's Regex Extract Function. Now we'll repeat the extraction from the Grok section, this time using a regular expression.
- If you're not in the extract_starter Pipeline view from the last section, navigate there (Manage > Processing > Pipelines > extract_starter).
- Slide the On toggle next to the Grok Function to turn it Off.
- Click the Add Function button, then search for and select the Regex Extract Function (or select it from the Standard section).
- Scroll down into the new Function. In its Regex field, enter the following: CMD\s+\((?<cron_command>[^\)]+)\)
  NOTE: If you copy/paste the above pattern, some browsers add an extra space character, which the Regex field won't accept. If you see an error mark (red circle with an exclamation point) in the Regex field, make sure you strip any trailing spaces from the pattern.
- Click Save.
This regular expression has a single named capture group, which creates the cron_command field from the string of characters between the text CMD ( and the terminating ).
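To see what the capture group picks up, here is a minimal Python sketch that applies the same pattern outside of Stream. Two things are assumptions: the sample event is a hypothetical cron line, not taken from the tutorial's data, and the group syntax is adapted to Python's `(?P<name>...)` spelling, since Python's `re` module does not accept `(?<name>...)`. The pattern in the Regex field itself stays as shown above.

```python
import re

# Python's re module spells named groups (?P<name>...), so the pattern from
# the Regex field is adapted here; the matching logic is otherwise unchanged.
CRON_COMMAND = re.compile(r"CMD\s+\((?P<cron_command>[^\)]+)\)")

# Hypothetical sample event, invented for illustration.
sample_raw = (
    "Jan 12 06:25:01 host1 CRON[12345]: (root) CMD "
    "(/usr/local/bin/backup.sh --daily)"
)

# search() scans the whole line, so the pattern does not need to match
# from the start of the event.
match = CRON_COMMAND.search(sample_raw)
cron_command = match.group("cron_command") if match else None
print(cron_command)
```

Because the group is defined as `[^\)]+`, the capture stops at the first closing parenthesis, so a command that itself contains parentheses would be cut short.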
But what if we also wanted to extract the CRON process ID from the line for some reason? Simple:
- Click the Regex Extract Function's Add Regex button.
- In the new Additional Regex table's Regex field, enter: CRON\[(?<cron_pid>\d+)\]
  NOTE: The same warning about extra space characters applies here.
- Click Save.
In each event in the right Preview Simple pane, you should now see a new field, cron_pid, containing the process ID as extracted from the _raw field.
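As a rough model of the result, the Python sketch below runs both patterns against an event's _raw field and adds each named group as a new field. This is not how Stream implements the Function. The helper name `extract_fields`, the dictionary-shaped event, and the sample line are all hypothetical, and the named groups use Python's `(?P<name>...)` spelling in place of the `(?<name>...)` form entered in the UI.

```python
import re

# Both patterns from this section, adapted to Python's named-group syntax.
PATTERNS = [
    re.compile(r"CMD\s+\((?P<cron_command>[^\)]+)\)"),
    re.compile(r"CRON\[(?P<cron_pid>\d+)\]"),
]


def extract_fields(event: dict) -> dict:
    """Return a copy of the event with every matched named group added as a field."""
    enriched = dict(event)
    raw = event.get("_raw", "")
    for pattern in PATTERNS:
        found = pattern.search(raw)
        if found:
            # groupdict() maps each group name to the text it captured.
            enriched.update(found.groupdict())
    return enriched


# Hypothetical sample event, invented for illustration.
event = {
    "_raw": "Jan 12 06:25:01 host1 CRON[12345]: (root) CMD "
            "(/usr/local/bin/backup.sh --daily)"
}
result = extract_fields(event)
print(result["cron_pid"], result["cron_command"])
```

In this sketch the extracted values are plain strings, and a pattern that does not match simply leaves the event without that field.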