
Find & Replace

In addition to simply routing data, Cribl Stream can perform many different types of transformations on the data. Use cases include:

  • Securing the data's contents, by removing or encrypting sensitive information.
  • Reducing cost, by eliminating unwanted fields even inside nested data structures.
  • Aggregating data, to convert many different events to a much smaller stream of metric data.

In this section, we will use a Mask Function to apply an md5 hash to a sensitive piece of information in our data. We will use Data Routes so that only data sent to tcpjson will be modified, while the original data sent to fs will remain unmodified.

Adding a Pipeline

First, we need to add a new Pipeline. Since our sourcetype is business_event, let's name our Pipeline the same, so we'll know what type of data it was designed to work with.

Add a Pipeline
  1. Select the Processing submenu and click Pipelines.
  2. Click Add Pipeline, then click Create Pipeline.
  3. In ID, enter business_event.
  4. Click Save.

You'll now see an empty Pipeline with no Functions. Before adding Functions, let's grab a capture to get some sample events to work with.

Save a Capture
  1. In the right pane, make sure Sample Data has focus.
  2. Then click Capture Data.
  3. In Filter Expression, paste the following to replace the default entry:
    sourcetype=='business_event'
    This filter expression will bring back only events that have a field called sourcetype whose value is business_event. (You might recognize this field name from popular SIEM schemas.)
  4. Click Capture and then Start.
  5. When the capture has completed, click Save as Sample File to bring up the sample file settings dialog.
  6. In that dialog, set File Name to be.log.
  7. At the bottom right, click Save.

Mask Function

If we look at the data in the right pane, we can see that the _raw field contains a number of Key=Value pairs. (Click the Show more links to expand the full _raw fields.) Look carefully, and you'll see pairs of the form social=123456789, which contain unredacted Social Security numbers. We're going to use our Mask Function to run an md5 hash of these numbers and then replace the original value with the hashed value.

Add a Mask Function
  1. In the left pane of the Processing > Pipelines page, click Add Function.
  2. Search for Mask.
  3. Click Mask.
  4. Replace the default Filter value with:
    sourcetype=='business_event'
  5. Click the Expand icon at the right edge of the Filter field.
  6. In the resulting Filter modal's Output field, verify that your expression evaluates to true.
  7. In the Filter field, remove the t from the end of 'business_event'.
  8. Verify that the Output field now evaluates the expression as false.
  9. Put the t back.
  10. Click OK to close the Filter modal.
  11. In the Masking Rules table's first row, click the pencil icon to the right of the Replace Expression column. This opens the Masking Rules modal.

It's a good practice to scope every Function's Filter expression to just the data you want the Function to operate on. This way, if data accidentally gets sent down the Pipeline, each Function will work only on data it's expecting to work on. In general, defense against misconfiguration is best done in depth.
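
If you want a mental model for what the Filter field is doing, here's a minimal JavaScript sketch of the same true/false check you just performed. The event object and its values are invented for illustration; in Cribl Stream's Filter field, fields are referenced by name directly, as in the expression above.

  // Hypothetical event, shaped like the data you captured earlier.
  const event = { sourcetype: 'business_event', _raw: '... social=123456789 ...' };

  // A Filter is just a JavaScript expression that must evaluate to true or false
  // for each event. In the Filter field itself you'd write: sourcetype=='business_event'
  console.log(event.sourcetype == 'business_event');  // true
  console.log(event.sourcetype == 'business_even');   // false (the missing t)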

Now, let's compose our match regular expression and replacement expression. Our Masking Rules editor makes it easy for you to do this interactively, against real data, to gain confidence that your Match Regex and Replace Expression are going to work properly with your data.

Add Your Regex
  1. Paste the following into the dialog's upper-left Match Regex field, between the two / delimiters.
    (social=)(\d+)

This is a simple regular expression that looks for digits following social=. After pasting this in, you'll see that the matches are highlighted in the main event body, and that both capture groups are highlighted in the Match Regex field.

(Explaining regular expressions is outside the scope of this tutorial, but there are numerous resources online for learning regex. Regular expressions can be a very valuable tool in the toolchest of a machine data engineer.)
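
If you'd like to see exactly what those two capture groups return, here's a small JavaScript sketch you can run anywhere. The sample string is invented for illustration.

  // The same pattern we pasted into Match Regex: group 1 captures the literal
  // key "social=", group 2 captures the run of digits that follows it.
  const re = /(social=)(\d+)/;
  const sample = 'action=purchase social=123456789 status=ok';  // hypothetical snippet of _raw

  const [fullMatch, g1, g2] = sample.match(re);
  console.log(fullMatch);  // "social=123456789"
  console.log(g1);         // "social="
  console.log(g2);         // "123456789"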

Replacement Expression

Input Replacement Expression

This section teaches you about typeahead and the expression editor. We recommend that you follow along interactively, but if you're in a hurry, you can skip the rest of this section by copying and pasting the following into the upper-middle Replace Expression box in the Mask editor.

`${g1}${C.Mask.md5(g2)}`

Next, we want to create a replacement expression. Replacement expressions are also JavaScript expressions, like those you've already seen several times in this tutorial. The replacement expression can be any type of JavaScript expression, but we also make the regex capture groups available as variables in the replacement expression.

important

Type ` (backtick), which the IDE will automatically expand to two backticks.

You're now entering a JavaScript Template Literal, which is a little templating language built into JavaScript. The syntax for referencing variables is very bash-like, and from the above expression, you can see a couple of variables.
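
If template literals are new to you, here's a minimal standalone example. With g1 and g2 hardcoded to the values our regex captures, a template literal of `${g1}${g2}` simply reassembles the original text:

  const g1 = 'social=';
  const g2 = '123456789';

  // Inside backticks, ${...} interpolates any JavaScript expression.
  console.log(`${g1}${g2}`);  // "social=123456789"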

important

Type ${g1}.

In the bottom portion of the Mask editor, notice how social= appears where social=123456789 was in the original event. We now have a JavaScript template literal that takes the value from the first capture group (which, as you can see at the upper right, matched social=) and places it back.

Next, we're going to call a helper Function to modify the value of capture group 2.

important

Add ${C.Mask.} and stop.

As you type, you'll see that the UI gives you typeahead suggestions to help you complete the expression. Here, you can see that there are a number of Masking Functions available to you. You're free to experiment with other options, but in this case, we'll use md5.

important

Add md5(g2)

The full second part of the expression is ${C.Mask.md5(g2)}. This is using an md5 hashing Function on the value of capture group 2 (g2), which obfuscates this Social Security number for us.
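
To see the end-to-end effect outside of Cribl, here's a Node.js sketch that emulates what this Masking Rule does. Node's built-in crypto module stands in for the C.Mask.md5 helper, and the sample event text is invented for illustration; the real Function operates on your captured events.

  const crypto = require('crypto');

  // Stand-in for C.Mask.md5: the hex-encoded md5 digest of a string.
  const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

  const raw = 'action=purchase social=123456789 status=ok';  // hypothetical _raw

  // Match Regex:         (social=)(\d+)
  // Replace Expression:  `${g1}${C.Mask.md5(g2)}`
  const masked = raw.replace(/(social=)(\d+)/, (match, g1, g2) => `${g1}${md5(g2)}`);

  console.log(masked);
  // "action=purchase social=<32-character hex digest> status=ok"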

Save the Pipeline

Now, validate in the bottom section that the Mask Function is properly hashing the Social Security number. Your editor should look like this:

Mask

Once you're satisfied, save the Pipeline, and we'll look at our data in Preview.

Save the Pipeline
  1. Click OK to close the Masking Rules editor modal.
  2. The regex we just validated should now be in the Function's Match Regex field. If not, paste it in:
    (social=)(\d+)
  3. Click Save to save the Mask Function to your Pipeline.

Now, in the right Preview pane, you should see the modified events. Orange highlighting marks fields that this Pipeline has modified. This preview is generated offline, using our static sample file. We are not yet modifying production events, as your terminal window should still show.

Installing Our Pipeline

Now, it's time to take our configured Pipeline and start sending events through it.

Add Route
  1. Select the Routing submenu and click on Data Routes.
  2. At the default Route's right side, click the ... menu.
  3. From the menu, click Insert Route Above.
  4. In the new Route's (#2) Route Name field, enter business_event.
  5. In Filter, replace true with:
    sourcetype=='business_event' 
  6. In the Pipeline drop-down, select business_event.
  7. In the Output drop-down, select tcpjson:tcpjson.
  8. Click Save.

Your new business_event Route should look like this:

Route

In the terminal, you should now see events coming through tcpjson, with the social field hashed. With our current setup, unmodified events are being sent to the fs Destination, perhaps as an archive of the original data. You can validate this with the following two commands. The first shows new data being appended to the filesystem; the second shows modified data coming out through tcpjson.

Look at Data in Terminal

Optionally, validate that data is correctly shaped in the terminal.

  1. Type ^C (Ctrl-C) to stop any running commands.
  2. In the terminal, run tail on the fs output to verify social is unmodified here:
    tail -qf /tmp/staging/*/*/*/*/*/CriblOut*.json.tmp | grep -E 'social=[^, ]+' --color=always
  3. Type ^C (Ctrl-C) to restore a command prompt.
  4. Validate that social is modified on tcpjson:
    tail -f /tmp/nc.log | grep -E 'social=[^, ]+' --color=always
  5. Note: Keep this command running, as you'll continue viewing its output in the next section.

Next, we're going to use Stream to parse our input, lifting the nested Key=Value structure to the top level of the event. This will allow the data to be more easily consumed in other systems.