The Summarize Operator
One of the most useful operators that Cribl Search provides is the summarize operator. Let's take another peek at our handy dandy notebook to see what it does (because there's no way you could just guess, right?).
- Click the help icon to the left of the query box.
- In the pane on the right, click Aggregation Operators > summarize. Alternatively, you can simply type summarize in the provided search bar.
You guessed it. The summarize operator produces a table that aggregates the input data. Through the use of Cribl Functions and Statistical Functions, this operator allows you to perform calculations on numerical data and manipulate categorical data with ease.
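Conceptually, summarize count() by dataSource groups events by the value of one field and counts each group. The Python sketch below mimics that idea over a hypothetical in-memory list of events. The field values are made up, and this is an illustration of the aggregation, not how Cribl Search actually executes a query.

```python
from collections import Counter


def summarize_count_by(events, field):
    """Count events per value of `field`, similar in spirit to:
    summarize count() by <field>

    Events that lack the field are counted under None (a null group).
    """
    return Counter(event.get(field) for event in events)


# Hypothetical events; the real cribl_search_sample data will differ.
events = [
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_combined", "status": 404},
    {"dataSource": "access_common", "status": 200},
    {"dataSource": "other_source"},
]

for value, count in summarize_count_by(events, "dataSource").items():
    print(value, count)
```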
Since we want to do more than just show our cool toys, let's hop in and play around with this operator a bit. If you still have the previous search populated, then good for you. If not:
- Click Search Home in the left navigation bar.
- Click Sample Searches.
- Click the entry titled Summarize the count of records by the 'dataSource' field. The search should be:
  dataset="cribl_search_sample" | summarize count() by dataSource
Now let's alter the sample search to craft a search that shows a breakdown of status codes in our Apache web server data. This data lives in the S3 bucket tied to our cribl_search_sample dataset.
- Change dataSource to status. The query should now be:
  dataset="cribl_search_sample" | summarize count() by status
- Click Search.
Altering sample searches like this is a great way to get familiar with the Cribl Search Language.
Nice! Now we have a chart showing all the status codes in our S3 bucket. However, quite a few null values are getting in the way. Let's filter so that we only get events that have the status field populated.
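Why do nulls show up at all? When you group by a field, every event that lacks that field falls into a single null group. The Python sketch below shows this with a hypothetical mixed dataset in which only the web server events carry a status field. The "other_source" events are invented for illustration.

```python
from collections import Counter

# Hypothetical mixed dataset: only the web server events carry a status field.
events = [
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_common", "status": 404},
    {"dataSource": "other_source"},
    {"dataSource": "other_source"},
]

# Grouping by status: every event without a status lands in one None bucket,
# which corresponds to the null values in the chart.
by_status = Counter(event.get("status") for event in events)

null_count = by_status[None]
print(f"{null_count} of {len(events)} events have no status")  # 2 of 5 events have no status
```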
- After dataset="cribl_search_sample", add a space, then add status="*".
- Delete | summarize count() by status and add | limit 1000. The search should be:
  dataset="cribl_search_sample" status="*" | limit 1000
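As a rough mental model, status="*" keeps only events where status has some value, and | limit 1000 stops once 1,000 matching events have been returned. The Python sketch below assumes that a "*" value matches any non-null value. That is an assumption about the exact semantics, not something this guide states. The event data is hypothetical.

```python
from itertools import islice


def has_field(event, field):
    # Assumed semantics of field="*": the field exists and is not null.
    return event.get(field) is not None


def search(events, field, limit):
    """Rough analogue of: dataset=... <field>="*" | limit <limit>"""
    matches = (event for event in events if has_field(event, field))
    # islice stops pulling events as soon as `limit` matches are collected.
    return list(islice(matches, limit))


# Hypothetical data: every third event lacks a status field.
events = [
    {"status": 200} if i % 3 else {"dataSource": "other_source"}
    for i in range(5000)
]

result = search(events, "status", 1000)
print(len(result))
```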
Sample, Sample, Sample!!
The sampling feature in Cribl Search is very useful, especially when using the limit operator. Sampling returns events at a ratio you specify instead of 1:1. On datasets with lots of data in them, sampling improves performance and gives you a more comprehensive representation of the data. Paired with the limit operator, it does this while still keeping the result set small and manageable.
- Click Sampling to the left of the time picker and select 1:100 from the dropdown.
- Click Search.
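To see why sampling pairs well with limit, compare a plain limit 1000 against a 1:100 sample followed by the same limit. The Python sketch below assumes that each event is kept independently with probability 1/ratio. Cribl Search's actual sampling method may differ. The million-event dataset is hypothetical, with each event tagged only by its position.

```python
import random
from itertools import islice


def sample(events, ratio, seed=None):
    """Yield roughly 1 of every `ratio` events (a 1:ratio sample).

    Assumption: each event is kept independently with probability 1/ratio.
    """
    rng = random.Random(seed)
    for event in events:
        if rng.random() < 1 / ratio:
            yield event


def search_with_limit(events, ratio, limit, seed=None):
    """Rough analogue of a 1:ratio sampled search followed by | limit <limit>."""
    stream = events if ratio == 1 else sample(events, ratio, seed)
    return list(islice(stream, limit))


def make_events(n):
    # Hypothetical dataset: events tagged by their position in the data.
    return ({"id": i} for i in range(n))


unsampled = search_with_limit(make_events(1_000_000), ratio=1, limit=1000)
sampled = search_with_limit(make_events(1_000_000), ratio=100, limit=1000, seed=42)

# Both return 1,000 events, but the sampled set reaches much deeper into the data.
print(unsampled[-1]["id"], sampled[-1]["id"])
```

In this model, both result sets stay at 1,000 events. The unsampled one covers only the first 1,000 events read. The sampled one is drawn from a slice of the data roughly 100 times wider.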
Now we should be brought back to the Events tab.
Click status in the field browser to the left of the results.
We can see that 100% of the events now have the status field, but what if we have other data with a status field that isn't from our web servers? Let's see if there is a way to get only the web server data.
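The percentage in the field browser is the share of events in the result that carry the field. The Python sketch below uses hypothetical events, including one invented non-web source. It shows why 100% coverage of status does not by itself prove that every event came from the web servers.

```python
def field_coverage(events, field):
    """Percentage of events that contain a non-null value for `field`."""
    if not events:
        return 0.0
    present = sum(1 for event in events if event.get(field) is not None)
    return 100.0 * present / len(events)


# Hypothetical result set: every event has a status, but one is not web data.
events = [
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_common", "status": 404},
    {"dataSource": "other_app", "status": "OK"},  # not web server data
]

print(field_coverage(events, "status"))
```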
Click the Fields tab.
Take a look at the Uniques column and notice that the dataset and dataSource fields each have only a couple of values (in both fields, the values represent the Apache access common and access combined data). Both fields identify our web server access logs and would be perfect for our search. For our use case, let's use the dataSource field to return all of the access_combined events.
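The Uniques column counts the distinct values each field takes. A low count, like the two values in dataSource, suggests that the field labels the source of an event rather than carrying per-event data. Below is a minimal Python sketch over hypothetical events. The clientip field and all values are invented for illustration.

```python
def unique_counts(events):
    """Return {field name: number of distinct values seen for that field}."""
    seen = {}
    for event in events:
        for field, value in event.items():
            seen.setdefault(field, set()).add(value)
    return {field: len(values) for field, values in seen.items()}


# Hypothetical web server events; real field names and values will differ.
events = [
    {"dataSource": "access_combined", "status": 200, "clientip": "10.0.0.1"},
    {"dataSource": "access_combined", "status": 404, "clientip": "10.0.0.2"},
    {"dataSource": "access_common", "status": 200, "clientip": "10.0.0.3"},
    {"dataSource": "access_common", "status": 500, "clientip": "10.0.0.1"},
]

for field, count in sorted(unique_counts(events).items()):
    print(field, count)
```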
To filter on the dataSource field:
- Click the dropdown next to dataSource.
- Click access_combined in the Uniques column.
- Click Add field in search. The search should now be:
  dataset="cribl_search_sample" status="*" dataSource="access_combined" | limit 1000
- Delete status="*" and | limit 1000.
- Re-add | summarize count() by status. The full search should now be:
  dataset="cribl_search_sample" dataSource="access_combined" | summarize count() by status
- Click the sampling dropdown and select the 1:1 sampling option to remove sampling.
- Click Search.
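The final search filters on dataSource first and then counts by status. In the hypothetical data below, only non-web events lack a status field, so the source filter also removes the null bucket. With sampling back at 1:1, every matching event is included. The sketch is a Python illustration of the query's logic under those assumptions, not Cribl Search's implementation.

```python
from collections import Counter


def count_status_for_source(events, source):
    """Rough analogue of:
    dataset="cribl_search_sample" dataSource="<source>" | summarize count() by status
    """
    return Counter(
        event.get("status") for event in events if event.get("dataSource") == source
    )


# Hypothetical events; real data will differ.
events = [
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_combined", "status": 200},
    {"dataSource": "access_combined", "status": 500},
    {"dataSource": "access_common", "status": 404},
    {"dataSource": "other_source"},  # no status; excluded by the dataSource filter
]

result = count_status_for_source(events, "access_combined")
print(dict(result))
```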
This looks good. The chart is much easier to analyze, but we can do even better.