
Pipelines: Sprinkle Lookups with Regex Magic

We’ve seen examples of using the magical powers of regex to customize Functions, extract fields, and filter events in real time. In this section, we’ll show you how to sprinkle your Lookups with regex magic. Let's walk through a Pipeline that demonstrates four different ways to leverage regular expressions in Cribl Stream.

Step 1: Extract the data with regex

When organizations use host-naming standards, it's easy to understand things like regions, Availability Zones (AZs), IP addresses, and more. For example, consider an Amazon host called:

ec2-35-162-133-145.us-west1-a.compute.amazonaws.com

This is an EC2 host with a (dashed) IP address 35-162-133-145, in the us-west1 region, in Availability Zone a. You can also see the domain: compute.amazonaws.com.

While we can read a lot from these information-rich host names, we don't know which index to route the data to, or which sourcetype to assign to each event, without looking up this information from another source. Doing so is often a huge challenge for organizations. To solve this challenge, let's combine the Regex Extract, Lookup, and Eval Functions with some sample events to demonstrate the power of Cribl Stream.

important
  1. In the Stream UI's top nav, make sure Manage is active.
  2. From the submenu, select Processing > Pipelines.
  3. On the Pipelines page, find and click the setting_index_by_region_availability_zone Pipeline. (To display this first column's header and contents, you might need to drag the pane and column dividers toward the right.)
  4. In the Pipeline's right pane, make sure Sample Data is selected.
  5. Click the Simple link at the lower right beside the lookupsample.log file.
  6. At the end of any event's _raw field, click the Show more link to view all the fields in the event.
  7. Click Add Function near the top of the left pane, and either find Regex Extract in the Standard submenu, or type Regex Extract into the search box to locate it. Then click it to add this Function to the Pipeline.
  8. Leave Filter at its default true value.
  9. Enter a simple Description for the Function.
  10. In the Regex field, paste the following:
    GMT:\s+(?<host>[^.]+)\.(?<region>\w+-\w+\d+)-(?<az>[^.]+)\.(?<domain>[^:]+)
  11. Leave Source_Field at its default _raw value.
  12. Click Save.
  13. Click the OUT button near the top of the right Preview pane to see the transformation of the data. The extracted fields az, domain, host, and region now appear below the _raw event. You can use these extracted fields for searching in your preferred search solution.
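
You can also check the Step 1 regex outside Stream. The sketch below runs the same named-group pattern in JavaScript, which supports (?<name>...) groups, against a hypothetical _raw line. The real events in lookupsample.log may be formatted differently. The regexExtract helper is illustrative only. It copies each named group onto the event object, which is roughly what the Regex Extract Function produces.

```javascript
// Same named-group pattern as step 10 above.
const HOST_RE =
  /GMT:\s+(?<host>[^.]+)\.(?<region>\w+-\w+\d+)-(?<az>[^.]+)\.(?<domain>[^:]+)/;

// Hypothetical _raw line; real lookupsample.log events may look different.
const raw =
  "Jan 01 2021 00:00:00 GMT: ec2-35-162-133-145.us-west1-a.compute.amazonaws.com: cloud-init[42]: boot finished";

// Hypothetical helper: copy each named capture group onto the event as a new field.
function regexExtract(event, re, sourceField = "_raw") {
  const m = re.exec(event[sourceField]);
  return m ? { ...event, ...m.groups } : event; // no match: event unchanged
}

const event = regexExtract({ _raw: raw }, HOST_RE);
console.log(event.host, event.region, event.az, event.domain);
// ec2-35-162-133-145 us-west1 a compute.amazonaws.com
```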

Step 2: Assign an index and sourcetype using Lookups

We still need to determine the index and sourcetype. Cribl Stream's Lookup Function enriches events with external fields. We'll use it with the newly extracted region field to assign an index and sourcetype to these events.

In the region_index_sourcetype.csv lookup table, five simple regular expressions map the extracted region field to the appropriate index and sourcetype. For example, the extracted region us-west1 starts with us, so it matches the first regular expression: us.+

We use this Lookup table's first row to assign an index of usa_index_tier, and a sourcetype of cloud-init, to each matching event. The region patterns in the table's four remaining rows work the same way.

important
  1. Still in the same setting_index_by_region_availability_zone Pipeline, click + Function near the top of the left pane, and either find Lookup in the Standard submenu, or type Lookup into the search box to locate it. Then click Lookup to add this Function to the Pipeline.
  2. Leave Filter at its default true value.
  3. Enter a simple Description for the Function.
  4. In the Lookup file path drop-down, select region_index_sourcetype.csv.
  5. For the Source_Field, leave the default _raw.
  6. For Match mode, select Regex.
  7. For Match type, select Most Specific.
  8. For Lookup field name in event, type region.
  9. Click Save.

Since we did not specify any Output fields, the Function defaults to outputting all fields in the Lookup table. In our case, that adds the index and sourcetype fields to each event.
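
Here is a simplified JavaScript sketch of what the Regex match mode does. Treat it as an approximation, not Cribl's implementation. The in-memory table holds only the row this section describes. The other four rows of region_index_sourcetype.csv aren't shown here, so they are omitted rather than guessed. The regexLookup helper is hypothetical and differs from Cribl in two ways. It uses first-match semantics rather than Cribl's Most Specific match type, and it tests each pattern unanchored with RegExp.prototype.test.

```javascript
// Hypothetical stand-in for region_index_sourcetype.csv. Only the first row is
// described in this section; the other four rows are deliberately left out.
const lookupTable = [
  { region: "us.+", index: "usa_index_tier", sourcetype: "cloud-init" },
];

// Hypothetical helper: first row whose regex matches the event field wins.
// This simplifies away Cribl's "Most Specific" match type.
function regexLookup(event, table, eventField = "region", lookupField = "region") {
  const value = event[eventField];
  if (typeof value !== "string") return event; // nothing to match against
  for (const row of table) {
    // Patterns are tested unanchored here.
    if (new RegExp(row[lookupField]).test(value)) {
      // No output fields specified: copy every column except the pattern column.
      const { [lookupField]: _pattern, ...outputs } = row;
      return { ...event, ...outputs };
    }
  }
  return event; // no row matched; event passes through unchanged
}

const enriched = regexLookup({ region: "us-west1", az: "a" }, lookupTable);
console.log(enriched.index, enriched.sourcetype); // usa_index_tier cloud-init
```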

Step 3: Get the host IP address from the hostname

Since the IP address is present in the host field, we can create the host_ip field using an Eval Function with this replace method:

host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/,'$1.$2.$3.$4')

This regular expression uses capture groups to pull the four IP octets out of the hostname and build host_ip. The replacement string refers to these four capture groups as $1, $2, $3, and $4, and joins them with dots. This method is very fast, and it removes the need to perform a DNS lookup from the host field to get the host's IP address. Need we say it? Magic!
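
Cribl Stream value expressions are JavaScript, so you can try the same replace call in plain JavaScript. In this sketch, hostIp is a hypothetical wrapper around the exact expression above.

```javascript
// Hypothetical wrapper around the host_ip Value Expression.
function hostIp(host) {
  // Four capture groups grab the dash-separated octets;
  // the replacement '$1.$2.$3.$4' rejoins them with dots.
  return host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/, "$1.$2.$3.$4");
}

console.log(hostIp("ec2-35-162-133-145")); // 35.162.133.145
```

String.prototype.replace returns its input unchanged when the pattern doesn't match. So a host that doesn't follow the dashed-IP convention would have its name copied into host_ip as-is. The pattern is also unanchored, which is why it runs against the extracted host field rather than the full hostname. On the full name, only the leading ec2-35-162-133-145 portion would be rewritten, and the rest of the domain would stay attached.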

important
  1. Still in the same setting_index_by_region_availability_zone Pipeline, click + Function near the top of the left pane, and either find Eval in the Standard submenu, or type Eval into the search box to locate it. Then click Eval to add this Function to the Pipeline.
  2. Leave Filter at its default true value.
  3. Enter a simple Description for the Function.
  4. Click Add Field under Evaluate Fields, and then enter host_ip under Name.
  5. For the Value Expression, paste the following:
    host.replace(/\w+-(\d+)-(\d+)-(\d+)-(\d+)/,'$1.$2.$3.$4')
  6. Click Save.

With this Eval Function added to our Pipeline, the Preview pane's OUT tab now shows a host_ip field on each event, alongside the index and sourcetype fields added by the Lookup Function.

Step 4: Customize the Sourcetype

Finally, let's put some sense into the sourcetype field, using another Eval Function. By combining the values of the sourcetype, region, and az fields as ${sourcetype}_${region}_${az}, the sourcetype becomes cloud-init_us-west1_a. Now you can understand much more about the sourcetype at a glance.

Examine this Eval Function's value expression. Note the backticks (`) that enclose the whole expression, the dollar signs and braces (${ }) that surround each field name, and the underscores (_) that separate them.

important
  1. Still in the same setting_index_by_region_availability_zone Pipeline, click Add Function near the top of the left pane, and either find Eval in the Standard submenu, or type Eval into the search box to locate it. Then click Eval to add this Function to the Pipeline.
  2. Leave Filter at its default true value.
  3. Enter a simple Description for the Function.
  4. Click Add Field under Evaluate Fields, and then enter sourcetype under Name.
  5. For the Value Expression, paste the following:
    `${sourcetype}_${region}_${az}`
  6. Click Save.
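
The value expression above is a JavaScript template literal. This small sketch evaluates the same template outside Stream. A hypothetical customSourcetype function stands in for the Eval Function, and destructuring stands in for Stream's direct field references.

```javascript
// Hypothetical stand-in for the sourcetype Eval: backticks delimit the
// template literal, and each ${...} interpolates one field's value.
function customSourcetype({ sourcetype, region, az }) {
  return `${sourcetype}_${region}_${az}`;
}

const event = { sourcetype: "cloud-init", region: "us-west1", az: "a" };
console.log(customSourcetype(event)); // cloud-init_us-west1_a
```

One thing to watch: if any of the three fields is missing from an event, the template literal inserts the text undefined. An event with no extracted az, for example, would get a sourcetype ending in _undefined.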

Take a look at the updated sourcetypes in the Preview pane's OUT tab. Congratulations, you have accomplished quite a bit of complicated magic in this section, which sadly brings us to the end of our magical odyssey.