Enrich Data
Now we're going to enrich events with information from a CSV file we have in Stream. This is the simplest form of enrichment, but Stream can also enrich against nearly any source by using custom Functions.
First, we're going to examine our lookup file.
Examine Lookup
- In the top nav, make sure **Manage** is active, then select the **Processing** submenu and click **Knowledge**. Note: Depending on the size of your window, the top nav will consolidate items that won't fit into a pulldown, represented by an ellipsis (**...**); if so, click the ellipsis, then select **Processing** and click **Knowledge**.
- Click **Lookups** on the left.
- Click `essential.csv` to open its editor modal.
- Beside **Edit Mode**, **Table** should be selected.
You're now looking at a simple CSV file, displayed as a table. In this CSV, we have 2 columns. We're going to look up the `group` field in `metrics.log` against this CSV, returning an `__essential` field.

`__essential` is an internal field in Stream, which won't be sent to downstream systems, but Stream expressions can access it the same as any other field. This allows us to store a bit of state in the event that's used only in Cribl. Later, we can easily filter down to only the `pipeline` and `queue` groups by looking for `__essential` to be `true`.
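
To make that shape concrete, here's a sketch of what a lookup file like this might contain. The exact rows in `essential.csv` may differ, and the rows below are purely illustrative; what matters is the two-column layout, with `group` as the key and `__essential` as the returned value:

```csv
group,__essential
pipeline,true
queue,true
thruput,false
map,false
```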
Now that we've seen the lookup table's structure, let's navigate back to Pipelines.
Navigate back to Pipelines
- Click **X** at the top right, or press **Esc**, to close the CSV editor modal.
- Select the **Processing** submenu and click **Pipelines**.
- Minimize the right Preview pane, as necessary, to expand the left pane.
- Click in the search bar (top left) and type `splunk`.
- Click the **splunk-metrics** Pipeline.
- Expand the right Preview pane. (You can click the Expand button at its bottom right.)
- In the Preview pane's **Samples** list, next to `metrics.log`, click **Simple**.
- Click the **OUT** tab near the Preview pane's top.
Now that we're back in the Pipeline editor, let's add our Lookup Function.

Add Lookup Function
- Click **Add Function**.
- Search for **Lookup**, and click it. (You want **Lookup**, not **DNS Lookup**.)
- In **Filter**, replace the default `true` value with: `splunk_metrics.group`. (This tests that the `group` field is contained in the `splunk_metrics` object.)
- Click in **Lookup file path**, and select the `essential.csv` table from the drop-down.
- Under **Lookup Field Name in Event**, enter: `splunk_metrics.group`.
- Under **Corresponding Field Name in Lookup**, enter: `group`.
- Click **Save**.
Let's explain a bit of what we're doing here. The Lookup Function's Filter will match only if the event contains an object called `splunk_metrics`, and that object has a `group` member. This should evaluate to `true` for all the events we're concerned with.
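
Stream Filters are JavaScript expressions evaluated per event. Here's a minimal sketch of the truthiness test that `splunk_metrics.group` performs; the sample events are hypothetical, and optional chaining stands in here for Stream's own handling of missing objects:

```javascript
// Hypothetical sample events, for illustration only.
const metricEvent = { splunk_metrics: { group: 'pipeline' } };
const otherEvent  = { host: 'web01' };

// `splunk_metrics.group` is truthy only when the object exists
// and the field holds a non-empty value.
console.log(Boolean(metricEvent.splunk_metrics?.group)); // true  -> Function runs
console.log(Boolean(otherEvent.splunk_metrics?.group));  // false -> event passes through unchanged
```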
Next, we selected a lookup file that was already installed in the **Knowledge** section. This Lookup Function will look inside that file, matching a given row if the value from the event matches that row's column value. **Lookup Field Name in Event** tells Stream which event field to grab the value from. **Corresponding Field Name in Lookup** tells Stream which column in the lookup file to match against. With no **Output field(s)** currently configured, this Lookup Function will return all the rest of the columns from the lookup. (If you want to select a subset of columns, or to rename fields, add rows to **Output field(s)** to do the mapping.)
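
Conceptually, the Function behaves like the minimal JavaScript sketch below. This is not Stream's actual implementation, and the rows are illustrative; it just shows how the two field-name settings drive the match, and how the remaining columns get merged into the event:

```javascript
// Illustrative rows, as if parsed from the lookup file.
const lookupRows = [
  { group: 'pipeline', __essential: 'true' },
  { group: 'queue',    __essential: 'true' },
];

function enrich(event) {
  // Lookup Field Name in Event: splunk_metrics.group
  const key = event.splunk_metrics?.group;
  // Corresponding Field Name in Lookup: group
  const row = lookupRows.find((r) => r.group === key);
  if (row) {
    // With no Output field(s) configured, all remaining columns are returned.
    const { group, ...rest } = row;
    Object.assign(event, rest); // adds __essential to the event
  }
  return event;
}

console.log(enrich({ splunk_metrics: { group: 'queue' } }));
// -> { splunk_metrics: { group: 'queue' }, __essential: 'true' }
```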
Now we've enriched our events! But since our new field, `__essential`, is an internal field, we need to set Stream to show internal fields.
Show Internal Fields
- In the right Preview pane, next to **Select Fields**, click the gear button.
- In the resulting **Advanced Settings** drop-down, enable **Show Internal Fields**.
- Click the gear again to close the drop-down.
In the right Preview pane, you should now see `__essential` set to `true` or `false` for every event. Events with a `group` of `pipeline` or `queue` will be set to `true`.
This technique is a great way to manage lists of items you want to selectively process, without having to declare a lot of individual Functions with Filter conditions. For this use case, let's drop events that are not `__essential`, and see how much our dataset is reduced.
Add Drop Function
- Click **Add Function**.
- Search for **Drop**, and click it.
- Scroll down to the new **Drop** Function, and click into its **Filter** field.
- Replace the **Filter** field's default `true` value with: `!(__essential==='true')`
- Click **Save**.
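
Note the string comparison: values returned from a CSV lookup are strings, so `__essential` holds `'true'` rather than the boolean `true`. The Drop Function removes any event whose Filter evaluates to true, so this sketch (with hypothetical events) shows which events survive:

```javascript
// Drop's Filter: drop everything that is NOT essential.
const shouldDrop = (e) => !(e.__essential === 'true');

console.log(shouldDrop({ __essential: 'true' }));  // false -> kept
console.log(shouldDrop({ __essential: 'false' })); // true  -> dropped
console.log(shouldDrop({}));                       // true  -> dropped (no lookup match)
```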
In the right Preview pane, you should notice that many events have been dropped. You can click the Pipeline Diagnostics button next to **Select Fields** (it looks like a small bar chart) to see how many events have been dropped. Depending on your capture, it may be nearly 100%.
If you want to display only events that have not been filtered out, click the gear button next to **Select Fields**, and disable the **Show Dropped Events** slider.
Now that we've filtered down to just a subset of the data stream, let's install a Route to start outputting this modified data.