Cribl Sandbox

    Suppression

    In this section, we'll cover how to use the Suppress Function to deduplicate data streams.

    Suppression works well when a stream contains a lot of repetitive data. Examples are error storms, or status messages logged over and over (repeated "still running!"–type messages). In our data stream, we're going to pick an arbitrary low-cardinality field to show how much reduction suppression can achieve.

    note

    Open business_event Pipeline
    If you're already in the business_event Pipeline, you can skip this.

    1. Select the Processing submenu, then click Pipelines and click the business_event pipeline.

    For this example, we want a few more events in our capture to show off these features. So we're going to run a 100-second capture.

    important

    Run 100-Second Capture

    1. In the right pane, make sure Sample Data has focus.
    2. Click Capture New.
    3. In the Capture Sample Data dialog, click Capture.
    4. For Capture Time (sec), enter 100.
    5. For Capture Up to N Events, enter 100.
    6. Click Start.
    7. Go grab coffee and come back in 100 seconds. (A blue status bar shows the capture's progress until it's complete.)
    8. When the capture has completed, click on Save as Sample File, bringing up the sample file settings dialog.
    9. In that dialog, set File Name to be_big.log.
    10. At the bottom right, click Save.

    Now, let's add our Suppress Function. Suppress will emit Number to Allow events every Suppression Period seconds for each value returned by Key Expression.

    Key Expression warrants some explanation. Like many areas in the product, it gives you the full power of JavaScript. Suppress tracks each unique value that Key Expression returns separately, and applies the Number to Allow limit per Suppression Period (sec) to each of those values. Because it's an expression, we can combine multiple fields, or manipulate fields, to define what counts as unique. For this example, we're going to pick a single field (accountType), which has only two values in our dataset, to show how suppression works.
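
    To make the behavior concrete, here is a minimal sketch of per-key suppression in plain JavaScript. It is a simplified model for illustration, not Cribl's implementation. The `suppress` helper, the sample events, the `'A'`/`'B'` account types, and the `region` field are all hypothetical; the defaults of 1 event per 30 seconds are assumptions chosen to match the results described later in this section.

```javascript
// Simplified model of suppression (an assumption-laden sketch, not Cribl's code).
// Allows `numberToAllow` events per `periodSec` window for each key value.
function suppress(events, keyFn, numberToAllow = 1, periodSec = 30) {
  const state = new Map(); // key value -> { windowStart, seen }
  const emitted = [];
  for (const event of events) {
    const key = keyFn(event);
    let entry = state.get(key);
    // Open a window the first time a key appears, or once the previous one expires.
    if (!entry || event._time - entry.windowStart >= periodSec) {
      entry = { windowStart: event._time, seen: 0 };
      state.set(key, entry);
    }
    entry.seen += 1;
    if (entry.seen <= numberToAllow) emitted.push(event);
  }
  return emitted;
}

// Hypothetical sample: one event per second for 100 seconds.
const sampleEvents = Array.from({ length: 100 }, (_, i) => ({
  _time: i,
  accountType: i % 2 === 0 ? 'A' : 'B', // two values, like the tutorial's field
  region: i % 4 < 2 ? 'east' : 'west',  // hypothetical second field
}));

// Key on a single field, as in this tutorial.
const byAccountType = suppress(sampleEvents, (e) => e.accountType);

// Key on a combination of fields: more unique values, so less reduction.
const byTypeAndRegion = suppress(sampleEvents, (e) => `${e.accountType}:${e.region}`);

console.log(byAccountType.length, byTypeAndRegion.length); // 8 16
```

    Keying on two fields instead of one doubles the number of unique values in this sample, and therefore doubles the number of events that survive. The lower the cardinality of the key, the larger the reduction.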

    important

    Add Suppress Function

    1. Make sure Manage > Processing > Pipelines is selected in the top nav, with the business_event Pipeline displayed.
    2. Click + Function at the top, search for Suppress, and click it.
    3. Scroll down and click into the new Suppress Function.
    4. For Filter, use the expression sourcetype=='business_event'.
    5. For Key Expression, enter accountType.
    6. Click Save.

    Scroll through the Preview pane on the right. You should see that most of the events have been dropped. Let's disable Show Dropped Events to clean up the list.

    important

    Disable Show Dropped Events

    1. At the top of the Preview pane, click the gear icon next to Select Fields.
    2. Toggle Show Dropped Events to Off.

    As you scroll through this cleaned list, you should see two events every 30 seconds being emitted, one per accountType. If you click the Chart icon next to Select Fields, and look at the chart's rightmost column, you should see a ~92% reduction in event count.
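
    The ~92% figure can be sanity-checked with some quick arithmetic. The sketch below assumes the capture filled all 100 events across the full 100 seconds, that both accountType values appear in every window, and that Suppress allows 1 event per 30-second period (inferred from the "two events every 30 seconds" observation). The `expectedReduction` helper is hypothetical.

```javascript
// Back-of-the-envelope estimate of suppression's effect on a capture.
function expectedReduction({ totalEvents, captureSec, periodSec, numberToAllow, distinctKeys }) {
  // A partial trailing window still lets events through, so round up.
  const windows = Math.ceil(captureSec / periodSec);
  const emitted = Math.min(totalEvents, windows * numberToAllow * distinctKeys);
  return { windows, emitted, reduction: 1 - emitted / totalEvents };
}

const estimate = expectedReduction({
  totalEvents: 100,  // Capture Up to N Events
  captureSec: 100,   // Capture Time (sec)
  periodSec: 30,     // assumed Suppression Period (sec)
  numberToAllow: 1,  // assumed Number to Allow
  distinctKeys: 2,   // accountType has two values
});

console.log(`${estimate.emitted} of 100 events kept, ${(estimate.reduction * 100).toFixed(0)}% reduction`);
// 8 of 100 events kept, 92% reduction
```

    Four windows times two keys gives 8 surviving events out of 100. Your own capture may differ slightly depending on how the events are spread over time.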

    If you leave Suppress running, you can also see changes in the real-time stats that Stream collects. Scroll down to later events, and you can see suppressCount set to the number of events we dropped for that accountType in that interval. With this information, you can estimate the amount of original data that would have been emitted.
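
    As a rough sketch of that estimate: each surviving event stands for itself plus the events counted in its suppressCount. The `estimateOriginalCount` helper and the sample values below are hypothetical, and the code assumes suppressCount is simply absent on events that had nothing dropped before them.

```javascript
// Estimate the original event count from the events that survived suppression.
function estimateOriginalCount(emittedEvents) {
  return emittedEvents.reduce(
    // Each survivor counts once, plus however many events it reports as dropped.
    (total, event) => total + 1 + (event.suppressCount ?? 0),
    0
  );
}

// Hypothetical survivors: the first event per key carries no suppressCount.
const survivors = [
  { _time: 0, accountType: 'A' },
  { _time: 1, accountType: 'B' },
  { _time: 30, accountType: 'A', suppressCount: 14 },
  { _time: 31, accountType: 'B', suppressCount: 14 },
];

console.log(estimateOriginalCount(survivors)); // 32
```

    Events dropped in the interval that is still open have not been reported on any emitted event yet, so treat the result as a lower bound.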

    Now click Quick Stats at the upper right, then click Outputs, and you should see output that looks like this:

    [Screenshot: Quick Stats, Outputs tab]

    By comparing the output byte counts to the Events IN above, you can see that suppression is drastically reducing the output volume for this dataset.

    Before moving on, disable the Suppress Function.

    important

    Disable Suppress

    1. In the Suppress Function's header row, toggle On to Off.
    2. Click Save.

    Next, we're going to look at sampling.

    Copyright © 2023 Cribl, Inc.