Hide Yo Data, Hide Yo PII, 'Cause They Hacking Everyone Up in Here

Cribl Guard introduces an intelligent, scalable solution for sensitive data detection. It enhances security by protecting critical information from unauthorized access and significantly reduces the risk of data breaches. At the same time, it increases operational efficiency by automating detection and streamlining document and data handling workflows. You can read more about Cribl Guard in the official press release. I learn by doing. Let's do.

Setup

Prior to seeing Guard in action, let's set up an environment where it can really shine.

We need data. Lots of data.
  1. On the right-hand side, click Manage in the Stream section
  2. Click into the default Worker Group
  3. Up top, click Data > Sources
  4. Locate and click into the Datagen Source
  5. At the top right, click Add Source
  6. Fill out the fields as follows:
    • Input ID: business_event
    • Data Generator File Name: businessevent.log
  7. Commit & Deploy
First time?

If you are new to Cribl, welcome! Due to the nature of our environment (utilizing a Cribl.Cloud-hosted Leader), our changes are not put into effect until we Commit & Deploy them to the Workers in the Worker Groups. Like this:

  1. In the top right, click Commit & Deploy
  2. In the resulting modal, write some meaningful message: The journey was the friends we made along the way
  3. Click Commit & Deploy at the bottom right of the modal

From now on, we'll just refer to this as Commit & Deploy. Thanks for coming to my Ted Talk™.

Pushing the Datagen out might take a minute or two. When it's ready, we'll collect a sample to use in the rest of the sandbox.

Collect a sample
  1. To the right of your business_event Source, click Live
Why are we doing this?

It can be pretty easy to just follow directions and think nothing of it. But this is actually kind of a cool feature and I want you to appreciate it. So let's take a quick look at the sample you just obtained. It can be tough to see because all the data is in the _raw field at the moment, but if you expand it by clicking Show More you'll see some Key=Value Pairs. Of note, literally, are the social and accountNumber fields. These are potentially PII and we most likely don't want them floating around in our logs / Destinations. Cribl Guard will help us with that.

  2. Once the sample finishes collecting (which should be rather quick since we only wait for 10 events and our Workers are sending 10 events per second), click Save as Sample File at the bottom right
  3. Change the filename to be.log
  4. Click Save
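
If you'd rather see those Key=Value Pairs in code than squint at _raw, here is a minimal Python sketch. The event text is made up for illustration (real businessevent.log events will look different), and parse_kv is a hypothetical helper, not anything Cribl ships. It just pulls the pairs out of a string and checks whether the social and accountNumber fields are present.

```python
import re

# Hypothetical _raw value; real businessevent.log events will differ.
RAW = (
    "userName=jdoe social=123-45-6789 accountNumber=987654 "
    "cardNumber=4111111111111111 amount=19.99"
)

# Matches key=value where the value is double-quoted or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_kv(raw: str) -> dict[str, str]:
    """Return the key=value pairs found in raw as a dict (quotes stripped)."""
    return {key: value.strip('"') for key, value in PAIR.findall(raw)}


fields = parse_kv(RAW)

# The two fields the walkthrough calls out as potential PII.
sensitive = [name for name in ("social", "accountNumber") if name in fields]
print(sensitive)  # ['social', 'accountNumber']
```

Nothing here talks to Cribl; it only shows what "potentially PII hiding in _raw" looks like once the pairs are split apart.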

On Guard

We can sit and read all day, but as we learned in high school English: "show, don't tell". Let's go look at the fancy intro pages for Guard and then actually do some shit.

Navigate to Cribl Guard
  1. On the left nav, click Guard
  2. [Optional] (we said no more reading) Click through (and read) the three tabs illustrating Cribl Guard's capabilities
    • Guard your Destinations
    • Detect sensitive data with AI
    • Monitor your Pipelines
  3. Once finished, click Get Started

Now I wouldn't call this too "real world", but we can see Guard in action by enabling it on our devnull Destination, since our Datagen will be sending through to devnull. So let's do that.

Do that
  1. Click the radio button left of devnull to enable Guard.
  2. Commit & Deploy

We need to go deeper. We need to see it in action.

Go deeper
  1. To the right of devnull, click guard_devnull_pipe to go into the automatically generated Cribl Guard pipeline
  2. On the right, click Simple to the right of be.log
  3. In the top left of the right-hand side, click OUT to see the results of the pipeline
  4. On the first event at the right, click Show More at the end of _raw

What the heck are we looking at? Good question. Let's start from the beginning. In the beginning God created the heavens and the earth. That Datagen we created runs off a sample business event (go figure) which has some sensitive information in it. If you look through the events you'll see fields titled social or accountNumber. Cribl Guard automatically scans for Personally Identifiable Information (PII) and masks that data. Looking at those fields now, we can see that they have been replaced with REDACTED. Neat!
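
To make "masks that data" concrete, here is a rough Python stand-in for the end result. It keys off field names with a regex, which is an assumption made purely for illustration; how Guard actually detects PII isn't covered here, and mask_fields and the sample event are both hypothetical.

```python
import re

# Field names come from the walkthrough; the event text below is made up.
SENSITIVE_FIELDS = ("social", "accountNumber")


def mask_fields(raw: str, names=SENSITIVE_FIELDS, token: str = "REDACTED") -> str:
    """Replace the value of each named key=value pair in raw with token."""
    keys = "|".join(re.escape(name) for name in names)
    # Value is either double-quoted or a run of non-space characters.
    pattern = re.compile(rf'\b({keys})=("[^"]*"|\S+)')
    return pattern.sub(lambda m: f"{m.group(1)}={token}", raw)


raw = "userName=jdoe social=123-45-6789 accountNumber=987654 cardNumber=4111111111111111"
print(mask_fields(raw))
# userName=jdoe social=REDACTED accountNumber=REDACTED cardNumber=4111111111111111
```

Note that cardNumber sails through untouched in this sketch, which mirrors what the OUT tab shows at this point in the sandbox.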

This is just the default behavior for Cribl Guard. We can go deeper! You'll notice that there is another field in this _raw called cardNumber which probably refers to credit card information. Definitely PII. Let's mask that.

Go deeper-er
  1. On the left, click the 1 in the circle to expand the Guard function
  2. Click Add Ruleset
  3. For Ruleset ID, select Finance_Global
  4. Click Save.

Well that was quick! We can already see that cardNumber is REDACTED. That's hot. But we can go deeper!
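
Curious how card numbers can be spotted without knowing the field name? One common, generic technique is to look for 13 to 19 digit runs and keep only those that pass the Luhn checksum. The Python sketch below shows that idea; it is not a claim about what the Finance_Global ruleset does internally, and the function names and sample text are hypothetical.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Candidate card numbers: standalone runs of 13 to 19 digits.
CANDIDATE = re.compile(r"\b\d{13,19}\b")


def mask_card_numbers(raw: str, token: str = "REDACTED") -> str:
    """Mask digit runs that look like card numbers and pass the Luhn check."""
    return CANDIDATE.sub(
        lambda m: token if luhn_valid(m.group()) else m.group(), raw
    )


# 4111111111111111 is a well-known test card number; orderId is an invented field.
print(mask_card_numbers("cardNumber=4111111111111111 orderId=1234567890123"))
# cardNumber=REDACTED orderId=1234567890123
```

The checksum is what keeps the long orderId from being masked just because it happens to be a big number.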

Go deeper-er-er
  1. Up top, click Processing > Knowledge
  2. On the left list click Guard Rules
  3. At the top right, click Add Rule > Add Rule with Copilot
  4. In the resulting chat window click Collect Sample Data
  5. Click Select an existing sample file
  6. Select be.log from the dropdown and click Confirm
  7. Click Describe the data you would like to mask
  8. At the bottom, type userName
  9. Check the automatic work and be amazed

I could walk you through putting that rule into your Guard Pipeline for your devnull Destination, but I think you can handle that 😘. Let's wrap up.