Suppression
In this section, we'll cover how to use the Suppress Function to deduplicate data streams.
Suppression aims to deduplicate a stream, and works well when a stream contains a lot of duplicative data. Examples include error storms, or status reported as a log (repeated "still running!"–type messages). In our data stream, we're going to pick an arbitrary low-cardinality field to show how much reduction can be achieved.
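To make the mechanics concrete, here's a minimal sketch of the suppression idea in JavaScript. This is an illustration of the technique only, not Stream's actual implementation; the function and variable names are invented.

```javascript
// Minimal sketch of suppression (illustrative only, not Stream's code):
// allow up to numToAllow events per periodSec seconds for each key,
// and drop (while counting) the rest.
function makeSuppressor(numToAllow, periodSec, keyFn) {
  const windows = new Map(); // key -> { start, allowed, suppressed }
  return function process(event, nowSec) {
    const key = keyFn(event);
    let w = windows.get(key);
    if (!w || nowSec - w.start >= periodSec) {
      // First event for this key, or the window expired: start a new one.
      w = { start: nowSec, allowed: 0, suppressed: 0 };
      windows.set(key, w);
    }
    if (w.allowed < numToAllow) {
      w.allowed++;
      return { keep: true };
    }
    w.suppressed++;
    return { keep: false, suppressCount: w.suppressed };
  };
}

// With numToAllow = 1, a 30-second period, and the key taken from accountType:
const suppress = makeSuppressor(1, 30, e => e.accountType);
suppress({ accountType: 'free' }, 0);  // kept (first event in this key's window)
suppress({ accountType: 'free' }, 5);  // dropped (window still open)
suppress({ accountType: 'paid' }, 5);  // kept (different key)
suppress({ accountType: 'free' }, 31); // kept (previous window expired)
```

Each unique key gets its own window, which is why low-cardinality keys give the most dramatic reduction.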
note
Open business_event Pipeline
If you're already in the business_event Pipeline, you can skip this.
- Select the **Processing** submenu, then click **Pipelines**, and click the `business_event` Pipeline.
For this example, we want a few more events in our capture to show off these features. So we're going to run a 100-second capture.
important
Run 100-Second Capture
- In the right pane, make sure **Sample Data** has focus.
- Click **Capture New**.
- In the **Capture Sample Data** dialog, click **Capture**.
- For **Capture Time (sec)**, enter `100`.
- For **Capture Up to N Events**, enter `100`.
- Click **Start**. Go grab coffee and come back in 100 seconds. (A blue status bar shows the capture's progress until it's complete.)
- When the capture has completed, click **Save as Sample File**, bringing up the sample file settings dialog.
- In that dialog, set **File Name** to `be_big.log`.
- At the bottom right, click **Save**.
Now, let's add our Suppress Function. Suppress will emit up to **Number to Allow** events every **Suppression Period (sec)** seconds for each unique value returned by **Key Expression**.

**Key Expression** warrants some explanation. Like many areas in the product, it gives you the full power of JavaScript. Because it's an expression, we can combine multiple fields together, or manipulate fields, to determine uniqueness. For this example, we're going to pick a field (`accountType`) which has only two values in our dataset, to show how suppression works.
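A few hypothetical Key Expressions to illustrate that flexibility. In the product you'd reference event fields directly (e.g. just `accountType`); here we read them from a sample object so the snippet runs on its own, and the `host`, `sourcetype`, and field values shown are invented for illustration, not taken from this dataset.

```javascript
// A sample event; host and the 'Free' value are invented for illustration.
const event = { accountType: 'Free', host: 'web01', sourcetype: 'business_event' };

// Suppress per account type — the expression we'll use below:
const key1 = event.accountType;                    // 'Free'

// Combine fields: a separate suppression window per host+sourcetype pair:
const key2 = `${event.host}:${event.sourcetype}`;  // 'web01:business_event'

// Manipulate a field: normalize case so 'Free' and 'free' share one key:
const key3 = event.accountType.toLowerCase();      // 'free'
```

Whatever the expression evaluates to becomes the uniqueness key, so anything from a single field to a computed composite works.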
important
Add Suppress Function
- Make sure **Manage > Processing > Pipelines** is selected in the top nav, with the `business_event` Pipeline displayed.
- Click **+ Function** at the top, search for **Suppress**, and click it.
- Scroll down and click into the new **Suppress** Function.
- For **Filter**, use the expression `sourcetype=='business_event'`.
- For **Key Expression**, enter `accountType`.
- Click **Save**.
Scroll through the Preview pane on the right. You should see that most of the events have been dropped. Let's disable **Show Dropped Events** to clean up the list.
important
Disable Show Dropped Events
- At the top of the Preview pane, click the gear icon next to **Select Fields**.
- Toggle **Show Dropped Events** to **Off**.
As you scroll through this cleaned list, you should see two events being emitted every 30 seconds, one per `accountType`. If you click the Chart icon next to **Select Fields**, and look at the chart's rightmost column, you should see a ~92% reduction in event count.

If you leave Suppress running, you can also see changes in the real-time stats that Stream collects. Scroll down to later events, and you can see `suppressCount` set to the number of events we dropped for that `accountType` in that interval. With this information, you can estimate the amount of original data that would have been emitted.
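The estimate is simple arithmetic, sketched below. The `suppressCount` value used here is made up for illustration, not taken from this capture.

```javascript
// Events that reached the output in one interval, plus that interval's
// suppressCount, approximates the original event count for that key.
function estimateOriginal(emitted, suppressCount) {
  return emitted + suppressCount;
}

// Percentage of the original volume that suppression removed.
function reductionPct(original, emitted) {
  return 100 * (1 - emitted / original);
}

// Hypothetical numbers: 1 event emitted (Number to Allow), suppressCount of 11.
estimateOriginal(1, 11);         // 12 original events in that interval
Math.round(reductionPct(12, 1)); // ~92% reduction
```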
Now click **Quick Stats** at the upper right, then click **Outputs**, and you should see output that looks like this:

By comparing the output byte counts to the Events IN above, you can see that suppression is drastically reducing the output volume for this dataset.
Before moving on, disable the Suppress Function.
important
Disable Suppress
- In the **Suppress** Function's header row, toggle **On** to **Off**.
- Click **Save**.
Next, we're going to look at sampling.