The Concealment Charm
In the wizarding world, we all know that some things must remain hidden. Even muggles appreciate the fact that some data logged into events should not be visible. Personally Identifiable Information (PII) is particularly sensitive, and must be masked. Failure to mask PII data could trigger security audit failures in the muggle world, or create chaos in the magical realms. In this section, we’ll introduce Cribl’s Mask Function by creating a Pipeline to mask potential Social Security numbers and credit card numbers.
- In the Stream UI's top nav, make sure Manage is active.
- From the submenu, select Processing > Pipelines.
- On the Pipelines page, find and click the `Beginner_Mask_PII` Pipeline. (To display this first column's header and contents, you might need to drag the pane and column dividers toward the right.)
- In the Pipeline's right pane, make sure Sample Data is selected.
- Click the Simple link at the lower right beside the `mask_pii.log.json` sample file. You're now previewing this sample log's events in the right Preview pane.
- At the end of any event's `_raw` field, click the Show more link to view all the event's fields. Notice that the nested `social` and `cardNumber` fields are unmasked.
- Click Add Function near the top of the left pane, and either find Mask in the Standard submenu, or type Mask into the search box to locate it. Then click Mask to add this Function to the Pipeline.
- Leave the new Mask Function's Filter field at its default `true` value.
- Enter a simple Description for the Function (something that will identify its purpose, like `Mask PII`).
- In the Masking Rules table, click the pencil (Edit) icon to the right of the Replace Expression column to open the Masking Rules modal.
Now, let's compose our Match Regex expression and corresponding Replace Expression. This modal's editor makes it easy for you to do this interactively, against real data, to gain confidence that your Match Regex and Replace Expression will work properly with your data.
- In the upper-left Match Regex pane, between the two `/` delimiters, paste: `(social=)(\d+)`
Regex allows you to extract information for further processing. You can define a group of characters and capture them, using parentheses. Any sub-pattern inside a pair of parentheses will be captured as a group and can be recalled later, hence the name capture group. In practice, you can use the parentheses to extract information like Social Security numbers, emails, or phone numbers from all sorts of data.
In our example, the regular expression looks for digits `(\d+)` following `(social=)`.
`\d` matches a digit. This is equivalent to the regex `[0-9]`.
`+` matches the previous token between one and unlimited times. In our case, it will match as many digits as appear in the `social` field, until it hits a non-digit character.
After pasting this in, you'll notice that the modal highlights where the regular expression matches in the main event body, as well as both capture groups in the Match Regex field.
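To see the capture groups outside the Stream UI, here is a minimal JavaScript sketch that runs the same pattern against a made-up event string. The sample text and variable names are assumptions for illustration only; they are not taken from the `mask_pii.log.json` file.

```javascript
// Hypothetical event text; the field values are invented for this example.
const raw = "user=hpotter social=123456789 house=gryffindor";

// The same pattern we pasted into the Match Regex pane.
const match = /(social=)(\d+)/.exec(raw);

// Index 0 holds the entire match; indexes 1 and 2 hold the capture groups.
const [g0, g1, g2] = match;

console.log(g0); // social=123456789
console.log(g1); // social=
console.log(g2); // 123456789
```

The names `g0`, `g1`, and `g2` are chosen here to mirror how the Mask Function refers to the entire match and its groups.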
- Still in the Masking Rules dialog, paste this into the Replace Expression pane:
`${g1}${C.Mask.md5(g2)}`
This expression keeps the first capture group as-is, and applies an MD5 hash to the digits matched by the second capture group.
- In the lower Output pane, notice what our Replace Expression has accomplished: The purple highlight shows that the `social` field's value is now hashed and anonymized.
- Click OK to close the Regex validation modal. This inserts your validated regex expressions into the Masking Rules table.
- Click Save to save the Mask Function to your Pipeline.
Let's unpack the above Replace Expression:
As stated earlier, when we enclosed `social=` in parentheses, we created a capture group to extract the matching pattern. In the Cribl Mask Function, we can reference the matching groups in the Replace Expression as `g1`, `g2` ... `gN`, and the entire match as `g0`.
In our example, `social=` is assigned to capture group `g1`, and the digits matched by `(\d+)` are assigned to `g2` for later reference.
The value after `social=` will be hashed by the `md5` function. In our example, we reference that value as `g2`. This is a clean way to specify which portion of the match to mask.
If we didn't make `social=` its own capture group, we couldn't reference it using `g1` in the Replace Expression. The digits would instead be assigned to `g1`, and the entire `social=#########` string would be replaced with a hash of the Social Security number. This probably isn't desired, because no one would know what value was being hashed without a field name preceding it.
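The following Node.js sketch approximates what this Replace Expression does, and what happens when the field name is left out of the replacement. It is not Cribl's implementation: the local `md5` function is a stand-in that we assume behaves like `C.Mask.md5` (a hex-encoded MD5 digest), and the event text is invented.

```javascript
const crypto = require("crypto");

// Stand-in for C.Mask.md5 (assumption): hex-encoded MD5 digest of a string.
function md5(value) {
  return crypto.createHash("md5").update(value).digest("hex");
}

// Hypothetical event text; the field values are invented for this example.
const raw = "user=hpotter social=123456789 house=gryffindor";
const pattern = /(social=)(\d+)/g;

// Keeps the field name (g1) and hashes only the digits (g2).
const masked = raw.replace(pattern, (g0, g1, g2) => `${g1}${md5(g2)}`);

// Omitting g1 replaces the entire match, so the field name disappears.
const nameless = raw.replace(pattern, (g0, g1, g2) => md5(g2));

console.log(masked);
console.log(nameless);
```

In the first result, the hash still follows `social=`, so a reader can tell which field was anonymized. In the second, only a bare hash remains.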
In addition to `C.Mask.md5`, Cribl Stream offers several other helper functions for masking. E.g., there's `C.Mask.REDACTED`, which throws an invisibility cloak over sensitive values; and `C.Mask.CC`, which we'll use now to mask credit card numbers.
- Still in the `Beginner_Mask_PII` Pipeline's Mask Function, click + Add Rule to add a second row to the Masking Rules table.
- In the new row's Match Regex column, between the two `/` delimiters, paste: `(cardNumber=)(\d+)`
- Copy and paste the following into the right Replace Expression column:
`${C.Mask.CC(g0)}`
- Click Save to save the Mask Function to your Pipeline.
- In the right Preview pane, at the end of any event's `_raw` field, click the Show more link to display all nested fields.
- Find a nested `cardNumber` field to see the transformation that our regex has achieved: All except the last four digits should now be masked with `X` characters.
Let's unpack the above Replace Expression:
To mask the credit card number, we are using the helper function `C.Mask.CC`.
This helper function checks whether a value could be a valid credit card number by verifying that the string's Luhn checksum modulo 10 equals 0. It then masks a subset of that value. By default, it replaces all digits except the last four with `X`.
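For readers unfamiliar with the Luhn check, here is a minimal JavaScript sketch of the standard algorithm. The function name `passesLuhn` is hypothetical, and ignoring non-digit characters is an assumption; Cribl's internal implementation might differ.

```javascript
// Returns true when the digits in `value` pass the Luhn check (sum % 10 === 0).
// Assumption: any non-digit characters are ignored.
function passesLuhn(value) {
  const digits = value.replace(/\D/g, "");
  if (digits.length === 0) return false;

  let sum = 0;
  // Walk the digits from right to left, doubling every second one.
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111111111111111")); // true
console.log(passesLuhn("4111111111111112")); // false
```

The first value is a widely used test card number. Changing its final digit breaks the checksum, so a helper that relies on this check would leave such a value unmasked.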
`C.Mask.CC` accepts the following parameters: `Mask.CC(value: string, unmasked?: number, maskChar?: string): string`. These parameters are:
- `value` – A string whose digits to mask, if (and only if) it could be a valid credit card number.
- `unmasked` – How many digits to leave unmasked. Specify positive values for left digits, negative values for right digits, or `0` to leave none unmasked.
- `maskChar` – A character or string to replace digits with.
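As a rough illustration of how the `unmasked` and `maskChar` parameters interact, consider the sketch below. The `maskDigits` function is hypothetical and only mirrors the parameter descriptions above. It skips the credit card validity check entirely, and its defaults (`-4` and `X`) are inferred from the default behavior described earlier, not confirmed from Cribl's code.

```javascript
// Hypothetical helper, not Cribl's code. Masks digits only, leaving other
// characters (such as a field name) untouched.
function maskDigits(value, unmasked = -4, maskChar = "X") {
  const total = (value.match(/\d/g) || []).length;
  const keep = Math.min(Math.abs(unmasked), total);
  let seen = 0;

  return value.replace(/\d/g, (digit) => {
    const index = seen++;
    // Positive `unmasked` keeps leading digits; negative keeps trailing ones.
    const keepLeft = unmasked > 0 && index < keep;
    const keepRight = unmasked < 0 && index >= total - keep;
    return keepLeft || keepRight ? digit : maskChar;
  });
}

console.log(maskDigits("cardNumber=4111111111111111")); // cardNumber=XXXXXXXXXXXX1111
console.log(maskDigits("4111111111111111", 6, "*"));    // 411111**********
console.log(maskDigits("4111111111111111", 0));         // XXXXXXXXXXXXXXXX
```

Because only digits are replaced, passing the entire match (`g0`) still leaves the `cardNumber=` prefix readable in this sketch.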
For details, see our excellent docs on Data Masking Functions.