
The Summarize Operator

One of the most useful operators that Cribl Search provides is the summarize operator. Let's take another peek at our handy dandy notebook to see what it does (because there's no way you could just guess, right?).

read all about it
  1. Click the help icon to the left of the query box.
  2. In the pane on the right, click Aggregation Operators > summarize. Alternatively, you can simply type summarize in the provided search bar.

You guessed it. The summarize operator produces a table that aggregates the input data. Through the use of Cribl Functions and Statistical Functions, this operator allows you to perform calculations on numerical data and manipulate categorical data with ease.
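Conceptually, a search such as `summarize count() by status` groups events by the value of one field and counts each group. The Python below is a minimal sketch of that idea, not Cribl's implementation. The `summarize_count_by` helper and the sample events are hypothetical. Events that lack the field land in a `None` group, which corresponds to the null values you will see in a moment.

```python
from collections import Counter

def summarize_count_by(events, field):
    """Count events per distinct value of `field`, like `summarize count() by field`.

    Events missing the field are grouped under None (shown as null).
    """
    return dict(Counter(event.get(field) for event in events))

# Hypothetical sample events, loosely modeled on web access logs.
events = [
    {"dataSource": "access_combined", "status": "200"},
    {"dataSource": "access_combined", "status": "404"},
    {"dataSource": "access_common", "status": "200"},
    {"dataSource": "syslog"},  # no status field at all
]

print(summarize_count_by(events, "status"))  # {'200': 2, '404': 1, None: 1}
```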

Since we want to do more than just show our cool toys, let's hop in and play around with this operator a bit. If you still have the previous search populated, then good for you. If not:

Get it up
  1. Click Search Home in the left navigation bar.

  2. Click Sample Searches.

  3. Click the entry titled Summarize the count of records by the 'dataSource' field. The search should be:

    dataset="cribl_search_sample" | summarize count() by dataSource

Now let's alter the sample search to craft one that shows a status code breakdown for our Apache web server data. This data lives in the S3 bucket behind our cribl_search_sample dataset.

summarize by status
  1. Change dataSource to status. The query should now be:
    dataset="cribl_search_sample" | summarize count() by status
  2. Click Search.
Learn by doing

Altering sample searches like this is a great way to get familiar with the Cribl Search Language.

Nice! Now we have a chart showing all the status codes in our S3 bucket. However, quite a few null values are getting in the way. Let's filter so that we only return events that have the status field populated.

Return Only Populated Fields
  1. After dataset="cribl_search_sample" add a space then add status="*".

  2. Delete | summarize count() by status and add | limit 1000. The search should be:

    dataset="cribl_search_sample" status="*" | limit 1000
    Sample, Sample, Sample!!

    The sampling feature in Cribl Search is very useful, especially with the limit operator. Sampling returns events at a ratio you specify (for example, 1:100 returns about one of every 100 events) instead of 1:1. On datasets with lots of data, sampling improves performance and gives you a more representative picture of the data. Paired with the limit operator, it lets you draw results from across the data while keeping the result set small and manageable.

  3. Click Sampling to the left of the time picker and select 1:100 from the dropdown.

  4. Click Search.

Now we should be brought back to the Events tab.
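To see why sampling and limit work well together, here is a rough Python sketch of the search we just ran: keep events whose status is populated (an analogue of `status="*"`), sample at 1:100, then keep at most 1,000 events. It rests on an assumption that each event is kept independently with probability 1/ratio. Cribl's actual sampling mechanism may differ, and the `has_field`, `sample`, and `limit` helpers and the generated events are hypothetical.

```python
import random

def has_field(event, field):
    # Rough analogue of status="*": keep events where the field exists and is not null.
    return event.get(field) is not None

def sample(events, ratio, rng):
    # Assumption: 1:ratio sampling keeps each event independently with probability 1/ratio.
    return [e for e in events if rng.random() < 1 / ratio]

def limit(events, n):
    # Like `| limit n`: keep at most n events.
    return events[:n]

# Hypothetical dataset: every third event has a null status.
events = [{"id": i, "status": None if i % 3 == 0 else "200"} for i in range(300_000)]
populated = [e for e in events if has_field(e, "status")]

unsampled = limit(populated, 1000)
sampled = limit(sample(populated, 100, random.Random(42)), 1000)

print(max(e["id"] for e in unsampled))  # 1499: every event comes from the start of the data
print(max(e["id"] for e in sampled))    # much larger: events are spread across the data
```

Without sampling, the 1,000-event limit fills up from one small slice of the data. With 1:100 sampling, the same limit covers a far wider span.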

important

Click status in the field browser to the left of the results.

We can see that 100% of the events have the status field now, but what if we have other data that has a status field that isn't from our web servers? Let's see if there is a way to get only the web server data.

Fields Tab

Click the Fields tab.

Take a look at the Uniques column and notice that the dataset and dataSource fields each have only a couple of values (in both fields, the values represent the Apache access common and access combined data). Either field identifies our web server access logs and would be perfect for our search. For our use case, let's use the dataSource field to return all of the access_combined events.
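The idea behind scanning the Uniques column can be sketched in a few lines of Python. Collect the distinct values seen for each field, and look for fields with only a handful of values that cleanly separate one source of data from another. This is an illustration only. The `field_uniques` helper, the `clientip` field, and the sample values are hypothetical, and the sketch does not describe how Cribl computes the Fields tab.

```python
def field_uniques(events):
    # Collect the distinct values seen for each field, similar in spirit to the Uniques column.
    uniques = {}
    for event in events:
        for field, value in event.items():
            uniques.setdefault(field, set()).add(value)
    return uniques

# Hypothetical web access events.
events = [
    {"dataSource": "access_combined", "status": "200", "clientip": "10.0.0.1"},
    {"dataSource": "access_combined", "status": "404", "clientip": "10.0.0.2"},
    {"dataSource": "access_common", "status": "200", "clientip": "10.0.0.3"},
]

for field, values in sorted(field_uniques(events).items()):
    # A low count (like dataSource here) marks a good field to filter on.
    print(field, len(values), sorted(values))
```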

Use the dataSource field
  1. Click the dropdown next to dataSource.
  2. Click access_combined in the Uniques column.
  3. Click Add field in search. The search should now be:
    dataset="cribl_search_sample" status="*" dataSource="access_combined" | limit 1000
  4. Delete status="*" and | limit 1000.
  5. Re-add | summarize count() by status. The full search should now be:
    dataset="cribl_search_sample" dataSource="access_combined" | summarize count() by status
  6. Click the sampling dropdown and select the 1:1 sampling option to remove sampling.
  7. Click Search.
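The finished search does two things in order. It keeps only events where dataSource is access_combined, then counts those events per status value. A minimal Python sketch of that pipeline follows. The `search` helper and the sample events are hypothetical, and the sketch is meant only to show why a status field from non-web data no longer skews the chart.

```python
from collections import Counter

def search(events):
    # dataSource="access_combined": keep only the web server access events.
    matching = (e for e in events if e.get("dataSource") == "access_combined")
    # summarize count() by status: count the remaining events per status value.
    return dict(Counter(e.get("status") for e in matching))

# Hypothetical events from several sources.
events = [
    {"dataSource": "access_combined", "status": "200"},
    {"dataSource": "access_combined", "status": "200"},
    {"dataSource": "access_combined", "status": "503"},
    {"dataSource": "syslog", "status": "ok"},  # non-web "status" field, filtered out
    {"dataSource": "access_common", "status": "404"},
]

print(search(events))  # {'200': 2, '503': 1}
```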

This looks good. The chart is much easier to analyze, but we can do even better.