Pipelines: Sprinkle Lookups with Regex Magic
We’ve seen examples of using the magical powers of regex to customize Functions, extract fields, and filter events in real time. In this section, we’ll show you how to sprinkle your Lookups with regex magic. Let's walk through a Pipeline that demonstrates four different ways to leverage regular expressions in Cribl Stream.
Step 1: Extract the data with regex
When organizations follow host-naming standards, a hostname alone can reveal its region, Availability Zone (AZ), IP address, and more. For example, consider an Amazon host called:
ec2-35-162-133-145.us-west1-a.compute.amazonaws.com
This is an EC2 host with the (dashed) IP address 35-162-133-145, in the us-west1 region, in Availability Zone a. You can also see the domain: compute.amazonaws.com.
Although these host names are easy to interpret, they don't tell us which index to route the data to, or which sourcetype to assign to the events. That information has to be looked up from another source, which is often a major challenge for organizations. To solve this challenge, let's combine the Regex Extract, Lookup, and Eval Functions with some sample events to demonstrate the power of Cribl Stream.
- In the Stream UI's top nav, make sure Manage is active.
- From the submenu, select Processing > Pipelines.
- On the Pipelines page, find and click the setting_index_by_region_availability_zone Pipeline. (To display this first column's header and contents, you might need to drag the pane and column dividers toward the right.)
- In the Pipeline's right pane, make sure Sample Data is selected.
- Click the Simple link at the lower right beside the lookupsample.log file.
- At the end of any event's _raw field, click the Show more link to view all the fields in the event.
- Click Add Function near the top of the left pane, and either find Regex Extract in the Standard submenu, or type Regex Extract into the search box to locate it. Then click it to add this Function to the Pipeline.
- Leave Filter at its default true value.
- Enter a simple Description for the Function.
- In the Regex field, paste the following:
GMT:\s+(?<host>[^.]+)\.(?<region>\w+-\w+\d+)-(?<az>[^.]+)\.(?<domain>[^:]+)
- Leave Source_Field at its default _raw value.
- Click Save.
- Click the OUT button near the top of the right Preview pane to see the transformed data. The extracted fields az, domain, host, and region now appear below the _raw event. You can use these extracted fields for searching in your preferred search solution.
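To see what the Regex Extract Function does with this pattern, here is a minimal JavaScript sketch that applies the same regex to one event and copies its named capture groups into new fields. The sample _raw line is hypothetical (the contents of lookupsample.log aren't shown here), and regexExtract is an illustrative helper, not a Cribl Stream API.

```javascript
// Hypothetical event: the timestamp and message text are invented for illustration;
// only the "GMT: <hostname>:" portion matters to the pattern.
const sampleEvent = {
  _raw: "2023-01-01 12:00:00 GMT: ec2-35-162-133-145.us-west1-a.compute.amazonaws.com: cloud-init[1234]: sample message",
};

// The same pattern pasted into the Regex Extract Function's Regex field.
const extractRegex =
  /GMT:\s+(?<host>[^.]+)\.(?<region>\w+-\w+\d+)-(?<az>[^.]+)\.(?<domain>[^:]+)/;

// Hypothetical helper that mimics the Function: run the regex against the
// source field and add each named capture group to the event as a new field.
function regexExtract(event, regex, sourceField = "_raw") {
  const match = regex.exec(event[sourceField]);
  if (!match) return event; // no match: the event passes through unchanged
  return { ...event, ...match.groups };
}

const extracted = regexExtract(sampleEvent, extractRegex);
console.log(extracted.host, extracted.region, extracted.az, extracted.domain);
// ec2-35-162-133-145 us-west1 a compute.amazonaws.com
```

Note how \w+-\w+\d+ stops at us-west1, leaving the trailing a for the az group, and how [^:]+ lets domain run up to the next colon.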
Step 2: Assign an index and sourcetype using Lookups
We still need to determine the index and sourcetype. Cribl Stream's Lookup Function enriches events with fields from an external lookup table. We'll use it with the newly extracted region field to assign an index and sourcetype to these events.
In the table below, five simple regular expressions map the extracted region field to the appropriate index and sourcetype. For example, the region us-west1 starts with us, so it matches the first regular expression: us.+. We use this Lookup table's first row to assign an index of usa_index_tier and a sourcetype of cloud-init to each matching event. The region patterns in the table's four remaining rows work the same way.

- Still in the same setting_index_by_region_availability_zone Pipeline, click Add Function near the top of the left pane, and either find Lookup in the Standard submenu, or type Lookup into the search box to locate it. Then click Lookup to add this Function to the Pipeline.
- Leave Filter at its default true value.
- Enter a simple Description for the Function.
- In the Lookup file path drop-down, select region_index_sourcetype.csv.
- For the Source_Field, leave the default _raw.
- For Match mode, select Regex.
- For Match type, select Most Specific.
- For Lookup field name in event, type Region.
- Click Save.
Because we did not specify any Output fields, the Function defaults to outputting all fields in the Lookup table. In our case, we get the fields index and sourcetype.
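Conceptually, a regex-mode lookup tests the event's field value against each pattern in the lookup table and copies the matching row's other columns onto the event. The JavaScript sketch below shows that idea under simplifying assumptions: it includes only the one row described above (the other four rows aren't reproduced), the column name region is assumed, it returns the first matching row rather than implementing Cribl Stream's Most Specific match type, and regexLookup is a hypothetical helper.

```javascript
// Only the first row of region_index_sourcetype.csv is described in the text;
// the column name "region" is an assumption for this sketch.
const lookupTable = [
  { region: "us.+", index: "usa_index_tier", sourcetype: "cloud-init" },
];

// Hypothetical helper: first-match regex lookup (a simplification; it does not
// reproduce Cribl Stream's Most Specific match type). Patterns are tested
// unanchored, the way RegExp.prototype.test behaves.
function regexLookup(event, table, eventField, tableField) {
  const value = event[eventField];
  if (value === undefined) return event;
  for (const row of table) {
    if (new RegExp(row[tableField]).test(value)) {
      // With no Output fields specified, copy every other column of the row.
      const { [tableField]: _pattern, ...outputs } = row;
      return { ...event, ...outputs };
    }
  }
  return event; // no row matched: the event is unchanged
}

const enriched = regexLookup({ region: "us-west1" }, lookupTable, "region", "region");
console.log(enriched.index, enriched.sourcetype); // usa_index_tier cloud-init
```

Because this sketch tests patterns unanchored, us.+ would also match a value such as aus-east1; writing the pattern as ^us.+ avoids that here.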
Step 3: Get the host IP address from the hostname
Since the IP address is embedded in the host field, we can create the host_ip field using an Eval Function with this replace method:
host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/,'$1.$2.$3.$4')
This regular expression uses four capture groups to pull the IP octets out of the hostname. The replacement string refers to them as $1, $2, $3, and $4, joining them with dots to build host_ip. This method is very fast, and it removes the need to perform a DNS lookup from the host field to get the host's IP address. Need we say it? Magic!
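You can try the same replace call in plain JavaScript, the language whose expression syntax the Eval Function's Value Expression uses. This sketch runs it on the host value extracted in Step 1, and on a hypothetical hostname (web01) that doesn't follow the naming standard.

```javascript
// host value produced by the Regex Extract step
const host = "ec2-35-162-133-145";

// \w+ matches "ec2" (word characters stop at the first dash); each (\d+)
// captures one octet. The whole match is replaced by "$1.$2.$3.$4".
const hostIp = host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/, "$1.$2.$3.$4");
console.log(hostIp); // 35.162.133.145

// Hypothetical host name with no dashed IP: replace() finds no match and
// returns the string unchanged, so host_ip would simply copy the host name.
const otherIp = "web01".replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/, "$1.$2.$3.$4");
console.log(otherIp); // web01
```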
- Still in the same setting_index_by_region_availability_zone Pipeline, click Add Function near the top of the left pane, and either find Eval in the Standard submenu, or type Eval into the search box to locate it. Then click Eval to add this Function to the Pipeline.
- Leave Filter at its default true value.
- Enter a simple Description for the Function.
- Click Add Field under Evaluate Fields, and then enter host_ip under Name.
- For the Value Expression, paste the following:
host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/,'$1.$2.$3.$4')
- Click Save.
With the Eval Function added to our Pipeline, the Preview pane's OUT tab shows the new host_ip field on each event, alongside the index and sourcetype fields added by the Lookup Function.
Step 4: Customize the Sourcetype
Finally, let's put some sense into the sourcetype field, using another Eval Function. By combining the values of the sourcetype, region, and az fields with the expression `${sourcetype}_${region}_${az}`, the sourcetype becomes cloud-init_us-west1_a. Now you can understand much more about the sourcetype at a glance.
Examine this Eval Function's value expression, taking careful note of the backticks (`) around the whole expression, the dollar signs and braces (${}) that surround each field name, and the underscores (_) that separate them.
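The backticks make the value expression a JavaScript template literal: each ${...} placeholder is replaced by that field's value, and everything else (here, the underscores) is kept literally. The sketch below reproduces the result in plain JavaScript; destructuring the fields into variables is an assumption standing in for the way Eval lets you reference event fields by name.

```javascript
// Event fields as they look after the Lookup step (values from the example above)
const event = { sourcetype: "cloud-init", region: "us-west1", az: "a" };

// Pull the fields into variables so the template literal can reference them by name.
const { sourcetype, region, az } = event;

// Backticks create a template literal; ${...} interpolates each value,
// and the underscores are literal separators.
event.sourcetype = `${sourcetype}_${region}_${az}`;
console.log(event.sourcetype); // cloud-init_us-west1_a

// If a field is missing, JavaScript interpolates the text "undefined".
const partial = { sourcetype: "cloud-init", region: "us-west1" };
console.log(`${partial.sourcetype}_${partial.region}_${partial.az}`); // cloud-init_us-west1_undefined
```

If some events might lack region or az, a Filter expression on this Eval Function can keep such suffixes out of the sourcetype.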
- Still in the same setting_index_by_region_availability_zone Pipeline, click Add Function near the top of the left pane, and either find Eval in the Standard submenu, or type Eval into the search box to locate it. Then click Eval to add this Function to the Pipeline.
- Leave Filter at its default true value.
- Enter a simple Description for the Function.
- Click Add Field under Evaluate Fields, and then enter sourcetype under Name.
- For the Value Expression, paste the following:
`${sourcetype}_${region}_${az}`
- Click Save.
Take a look at the updated sourcetypes
in the Preview pane's OUT tab. Congratulations, you have accomplished quite a bit of complicated magic in this section, which sadly brings us to the end of our magical odyssey.