Dataset Processing
Now that we have seen HOW we are connecting to the Amazon S3 object store, we should go back to the cribl_search_sample dataset to review WHAT data we are searching (or expecting to search) as a part of the dataset.
- Click X on the Dataset Provider modal.
- Click Datasets in the left navigation bar.
- Click cribl_search_sample.
- Click Processing in the left navigation bar.
Datatypes
Here we can see all the Datatypes that are configured for the cribl_search_sample dataset. More specifically, these are the rulesets detailing how the different types of data within the Amazon S3 bucket should be broken into events, timestamped, and parsed. A dataset can be associated with one or more Rulesets. Rulesets are evaluated top-down and consist of an ordered list of rules, which are also evaluated top-down. Any data that is not captured by one of the datatypes configured here will be processed using the System Default Rule shown at the bottom of the list.
It is beneficial to put the rulesets that match most of the data first: because evaluation is top-down, most data then finds its match early, which speeds up processing.
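The top-down, first-match evaluation described above can be sketched in Python. This is only an illustrative model, not Cribl Search's implementation: the `Rule` and `Ruleset` classes, the `assign_datatype` function, the sample rules, the `aws_cloudtrail` datatype, the S3 paths, and the `system_default` label are all hypothetical, and each rule's match condition is modeled here as a plain Python callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]  # stand-in for the rule's match condition
    datatype: str


@dataclass
class Ruleset:
    name: str
    rules: List[Rule] = field(default_factory=list)


# Hypothetical label standing in for the System Default Rule.
SYSTEM_DEFAULT = "system_default"


def assign_datatype(event: Event, rulesets: List[Ruleset]) -> str:
    """Walk rulesets top-down, then each ruleset's rules top-down; first match wins."""
    for ruleset in rulesets:
        for rule in ruleset.rules:
            if rule.matches(event):
                return rule.datatype
    # Nothing matched: fall through to the default at the bottom of the list.
    return SYSTEM_DEFAULT


# Hypothetical ruleset contents, for illustration only.
rulesets = [
    Ruleset("AWS Datatypes", [
        Rule("vpc flow logs", lambda e: "vpcflow" in e.get("source", ""), "aws_vpcflow"),
        Rule("cloudtrail", lambda e: "CloudTrail" in e.get("source", ""), "aws_cloudtrail"),
    ]),
]

matched = assign_datatype({"source": "s3://example-bucket/vpcflowlogs/a.gz"}, rulesets)
unmatched = assign_datatype({"source": "s3://example-bucket/other/b.log"}, rulesets)
print(matched)    # aws_vpcflow
print(unmatched)  # system_default
```

In this model, an event that matches the first rule never reaches the later ones, which is why ordering the most frequently matching rulesets first reduces the work done per event.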
Because there are lots of different types of data that can be searched within an Amazon S3 bucket (or within any dataset provider), it is useful to note that datatypes are not explicitly tied to any dataset provider or dataset.
For example, AWS Datatypes could be applied to a dataset that is searching events from an Azure Blobs dataset provider.
With respect to datasets, datatypes can have a one-to-one, one-to-many, many-to-one, or many-to-many relationship.
Datatype Configurations
Alright Alice, let's see how far the rabbit hole goes and take a closer look at the rules behind the datatypes listed here.
- Click Settings in the top navigation bar.
- Ensure Datatypes is highlighted in the left navigation bar.
This is where datatypes are created and configured. If you look with your special eyes you'll notice that the Library column shows which datatypes are custom-made and which are shipped with Cribl Search. If you recall, AWS Datatypes was the first datatype listed for processing in our cribl_search_sample dataset, so it stands to reason that we expect most of the data returned to match the rules represented there.
Click AWS Datatypes.
In the settings for this datatype you'll once again find an immutable ID field, followed by a description and a Tags input. Tags are an optional mechanism for organizing configs. Below that you'll find the rules.
Rules are Rules
Rules ultimately determine what datatype is applied to an event. Cribl Search tries to match the data against the ordered rules in turn, and the first rule that matches is used to process the search results.
The name of the rule is captured in the Name field. Following the name is the Filter field, which contains a JavaScript filter expression that the event is matched against. If the event matches the Filter, the event is given the datatype that is defined in the Datatype field. Unlike datasets and dataset providers, which can both be referred to by their IDs when searching, datatypes are referred to by what is populated in the Datatype field, not by the ID of the Datatype or the name of the rule.
Ex.
dataset="web_logs" datatype="aws_vpcflow"
The Event Breaker section configures how Cribl Search converts data into discrete events. Cribl Search provides several different formats for event breakers.
The process of datatyping is as follows:
- Event Breaking – Breaks raw bytes into discrete events.
- Timestamping – Assigns timestamps to events.
- Parsing – Parses fields from events.
- Add Fields – Adds additional fields.
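As a rough illustration of the four stages above, here is a Python sketch that processes a small newline-delimited sample. It is not how Cribl Search implements datatyping: the function names, the sample log lines, the newline event breaker, the `_time`/`_raw` field names as used here, and the added `datatype` field are all assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical raw data: one event per line, "<ISO timestamp> <key=value ...>".
RAW = (
    b"2024-01-01T00:00:00Z action=ACCEPT bytes=120\n"
    b"2024-01-01T00:00:05Z action=REJECT bytes=0\n"
)


def break_events(raw: bytes) -> List[str]:
    """Event Breaking: split raw bytes into discrete events (one per line here)."""
    return [line for line in raw.decode("utf-8").split("\n") if line]


def add_timestamp(event: str) -> Dict[str, Any]:
    """Timestamping: read the leading timestamp and store it as epoch seconds."""
    stamp, _, rest = event.partition(" ")
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {"_time": parsed.timestamp(), "_raw": event, "_rest": rest}


def parse_fields(event: Dict[str, Any]) -> Dict[str, Any]:
    """Parsing: extract key=value pairs from the remainder of the event."""
    fields = dict(pair.split("=", 1) for pair in event["_rest"].split())
    parsed = {k: v for k, v in event.items() if k != "_rest"}
    parsed.update(fields)
    return parsed


def add_fields(event: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Add Fields: attach additional fields to the parsed event."""
    return {**event, **extra}


def process(raw: bytes, extra: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the four stages in order for every event in the raw input."""
    return [
        add_fields(parse_fields(add_timestamp(e)), extra)
        for e in break_events(raw)
    ]


events = process(RAW, {"datatype": "aws_vpcflow"})
for event in events:
    print(event)
```

Each stage depends on the one before it: fields cannot be parsed until the byte stream has been broken into events, and added fields are applied last so that they end up alongside the parsed ones.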