
Suppression

In this section, we'll cover how to use the Suppress Function to deduplicate data streams.

Suppression works best when a stream contains a lot of duplicate data: for example, error storms, or status reported as repeated log lines ("still running!"-type messages). In our data stream, we're going to pick an arbitrary low-cardinality field to show how much reduction can be achieved.

Open business_event Pipeline

If you're already in the business_event Pipeline, you can skip this.

  1. Select the Processing submenu, click Pipelines, then click the business_event Pipeline.

For this example, we want a few more events in our capture to show off these features, so we're going to run a 100-second capture.

Run 100-Second Capture
  1. In the right pane, make sure Sample Data has focus.
  2. Click Capture Data.
  3. In the Capture Sample Data dialog, click Capture.
  4. For Capture Time (sec), enter 100.
  5. For Capture Up to N Events, enter 100.
  6. Ensure Where to capture is set to 2. Before the Routes.
  7. Click Start.
  8. Go grab a coffee ☕ (or whatever sounds good) and come back in 100 seconds. (A blue status bar shows the capture's progress until it's complete.)
  9. When the capture has completed, click Save as Sample File to open the sample file settings dialog.
  10. In that dialog, set File Name to be_big.log.
  11. At the bottom right, click Save.

Now, let's add our Suppress Function. Suppress will emit only Number to Allow events per Suppression Period (sec) for each unique value returned by Key Expression.

Key Expression warrants some explanation. As in many areas of the product, we're giving you the full power of JavaScript here. Because it's an expression, we can combine multiple fields together, or manipulate fields, to determine uniqueness. For this example, we're going to pick a field (accountType) that has only two values in our dataset, to show how suppression works.
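To make the interplay of Key Expression, Number to Allow, and Suppression Period concrete, here is a minimal JavaScript sketch of per-key suppression. It is an illustration under stated assumptions, not Stream's actual implementation: the function `createSuppressor`, its option names, the use of an event's `_time` field (in seconds) to drive each window, and the way the sketch attaches a `suppressCount` to the first event of each new window are all hypothetical choices made for this example.

```javascript
// Hypothetical sketch of per-key suppression; NOT Cribl Stream's implementation.
//   keyExpr   - function deriving the suppression key from an event (like Key Expression)
//   allow     - events let through per key per window (like Number to Allow)
//   periodSec - window length in seconds (like Suppression Period)
function createSuppressor({ keyExpr, allow, periodSec }) {
  // key -> { windowStart, passed, suppressed, carried }
  const state = new Map();

  // Returns the event to keep it (possibly annotated), or null to drop it.
  return function suppressEvent(event) {
    const key = String(keyExpr(event));
    const now = event._time; // assumed: event time in seconds
    let s = state.get(key);

    if (s === undefined || now - s.windowStart >= periodSec) {
      // First event for this key, or its window expired: open a new window and
      // carry the previous window's drop count forward.
      s = { windowStart: now, passed: 0, suppressed: 0, carried: s ? s.suppressed : 0 };
      state.set(key, s);
    }

    if (s.passed < allow) {
      s.passed += 1;
      if (s.carried > 0) {
        // Report the previous window's drops on the first event let through.
        const annotated = { ...event, suppressCount: s.carried };
        s.carried = 0;
        return annotated;
      }
      return event;
    }

    s.suppressed += 1;
    return null;
  };
}

// Usage: 100 events, one per second, alternating between two placeholder accountType values.
const sample = Array.from({ length: 100 }, (_, i) => ({
  _time: i,
  accountType: i % 2 === 0 ? "A" : "B",
}));
const suppress = createSuppressor({ keyExpr: (e) => e.accountType, allow: 1, periodSec: 30 });
const kept = sample.map(suppress).filter((e) => e !== null);
console.log(kept.length); // 8
```

Each key gets its own window, so the number of surviving events scales with the number of distinct keys. That is why a low-cardinality field such as accountType yields such a large reduction.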

Add Suppress Function
  1. Make sure Manage > Processing > Pipelines is selected in the top nav, with the business_event Pipeline displayed.
  2. Click Add Function at the top, search for Suppress, and click it.
  3. Scroll down and click into the new Suppress Function.
  4. For Filter, use the expression:
    sourcetype=='business_event'
  5. For Key Expression, enter accountType.
  6. Click Save.
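Both the Filter and the Key Expression are JavaScript expressions evaluated against each event. As a rough illustration outside Stream, the sketch below models them as plain functions over event objects and counts the distinct keys a Key Expression produces among filtered events. With Number to Allow at 1, that count is roughly how many events each Suppression Period lets through (assuming every key appears in the period). The sample events, their field values, the `host` field in the composite key, and the helper `distinctKeys` are all hypothetical.

```javascript
// Hypothetical events standing in for the captured sample; field values are made up.
const events = [
  { sourcetype: "business_event", accountType: "A", host: "web-1" },
  { sourcetype: "business_event", accountType: "B", host: "web-1" },
  { sourcetype: "business_event", accountType: "A", host: "web-2" },
  { sourcetype: "business_event", accountType: "A", host: "web-1" },
  { sourcetype: "other_event", accountType: "B", host: "web-3" },
];

// The Filter from the steps above, written as a JS predicate.
const filter = (e) => e.sourcetype === "business_event";

// The Key Expression from the steps above: a single low-cardinality field.
const byAccountType = (e) => e.accountType;

// Hypothetical composite key: combining fields raises cardinality.
const byAccountAndHost = (e) => `${e.accountType}:${e.host}`;

// Count the distinct keys among events that pass the filter.
function distinctKeys(evts, pred, keyExpr) {
  return new Set(evts.filter(pred).map(keyExpr)).size;
}

console.log(distinctKeys(events, filter, byAccountType)); // 2
console.log(distinctKeys(events, filter, byAccountAndHost)); // 3
```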

Scroll through the Preview pane on the right. You should see that most of the events have been dropped. Let's disable Show Dropped Events to clean up the list.

Disable Show Dropped Events
  1. At the top of the Preview pane, click the gear icon next to Select Fields.
  2. Toggle Show Dropped Events to Off.

As you scroll through this cleaned list, you should see two events every 30 seconds being emitted, one per accountType. If you click the Chart icon next to the Select Fields dropdown and look at the chart's rightmost column, you should see a ~92% reduction in event count.
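The ~92% figure is roughly what back-of-the-envelope arithmetic predicts. This sketch assumes the capture spans the full 100 seconds with 100 events, that both accountType values appear in every 30-second window, and that Number to Allow is 1 (consistent with one event per accountType every 30 seconds). The function `estimateReduction` is hypothetical.

```javascript
// Estimate how many events Suppress emits and the resulting reduction.
// Assumes every key appears in every window, so each window emits keys * allow events.
function estimateReduction({ totalEvents, durationSec, periodSec, keys, allow }) {
  const windows = Math.ceil(durationSec / periodSec); // 100 / 30 -> 4 windows
  const emitted = Math.min(totalEvents, keys * allow * windows);
  return { emitted, reduction: 1 - emitted / totalEvents };
}

const est = estimateReduction({ totalEvents: 100, durationSec: 100, periodSec: 30, keys: 2, allow: 1 });
console.log(est.emitted, est.reduction.toFixed(2)); // 8 0.92
```

With 4 windows and 2 keys, about 8 of the 100 events survive, which matches the roughly 92% reduction shown in the chart.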

If you leave Suppress running, you can also see its effects in the real-time stats that Stream collects. Scroll down to later events and you'll see a suppressCount field set to the number of events dropped for that accountType in that interval. With this information, you can estimate how many events would originally have been emitted.
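Since suppressCount records how many events were dropped for a key in an interval, adding those counts to the number of emitted events gives an estimate of the original event count. A minimal sketch, assuming the emitted events are available as plain objects with an optional numeric suppressCount field; the function name and sample events are hypothetical. If a count only appears on a later emitted event, drops in each key's most recent interval are not yet included, so treat the result as a lower bound.

```javascript
// Estimate the pre-suppression event count from emitted events.
// Assumes each emitted event may carry a numeric suppressCount (drops for its key).
function estimateOriginalCount(emitted) {
  const dropped = emitted.reduce((sum, e) => sum + (Number(e.suppressCount) || 0), 0);
  return emitted.length + dropped;
}

// Hypothetical emitted events for two accountType values.
const emittedSample = [
  { accountType: "A" },
  { accountType: "B" },
  { accountType: "A", suppressCount: 14 },
  { accountType: "B", suppressCount: 14 },
];
console.log(estimateOriginalCount(emittedSample)); // 32
```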

Checking our work

If you click through to Monitoring > Data > Destinations, you'll notice that fewer events are now flowing out.

Outputs

By comparing the output byte counts with those on Monitoring > Overview, you can see that suppression is drastically reducing the output volume for this dataset.

Before moving on, disable the Suppress Function.

Disable Suppress
  1. In the Suppress Function's header row, toggle On to Off.
  2. Click Save.

Next, we're going to look at sampling.