Introduction to Data

Welcome back, padawan. If you are to become the true data jedi that you were meant to be, continue your training we must. However, before we can move forward, we must first go back to the very beginning and learn how to establish a connection with the data for yourself.

🎵 Data, Data, Data 🎵

If you have already gone through the Cribl Search Overview sandbox (highly recommended), you will have heard us mention dataset providers, datasets, and a myriad of other data-related terms. But what are they?

Dataset Providers tell Cribl Search where to query and contain the access credentials for connecting to that provider.

Dataset Provider Types tell Cribl Search what kind of dataset provider is being configured. This determines what configuration details are needed to establish a connection.

Datasets tell Cribl Search what data to search from the dataset provider.

Datatypes help Cribl Search break data from datasets into discrete events, timestamp them, and parse them as needed. This is useful for categorizing events, and it makes searching easier because events can be addressed through a datatype field.

Go With The Flow

To illustrate how these objects relate to each other, let's imagine we have an Amazon S3 bucket with web logs from various technologies and AWS services that we want to search. Our data could be configured as follows:

| Term | Example | Description |
| --- | --- | --- |
| Dataset Provider | my_s3_data | This user-defined uid specifies the Amazon S3 bucket that Cribl should search, along with the AWS credentials Cribl should use to access the data. |
| Dataset Provider Type | Amazon S3 | This selection denotes that the type of dataset provider is Amazon S3. |
| Dataset | web_logs | This user-defined uid specifies what data in the S3 bucket should be searched and how to identify that data (i.e. S3 bucket path). |
| Datatype | aws_vpcflow, aws_alb_accesslogs, aws_elb_accesslogs, aws_cloudfront_accesslogs, apache_access_combined | This identifies what types of data (defined by pre-configured and/or custom rules) are found in the dataset. |
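
To make the relationships in the table concrete, here is a minimal Python sketch that models them as plain data classes. This is only an illustration of how the objects refer to each other; it is not Cribl's API or configuration format. The class names, field names, and the `describe` helper are all hypothetical, and the bucket path is left as a placeholder because the example above does not give one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetProvider:
    """Hypothetical model: where to query (credentials omitted on purpose)."""
    uid: str            # user-defined ID, e.g. "my_s3_data"
    provider_type: str  # the Dataset Provider Type, e.g. "Amazon S3"


@dataclass
class Dataset:
    """Hypothetical model: what data to search within a provider."""
    uid: str                   # user-defined ID, e.g. "web_logs"
    provider: DatasetProvider  # the provider this dataset belongs to
    path: str                  # how to locate the data (placeholder here)
    datatypes: List[str] = field(default_factory=list)


def describe(ds: Dataset) -> str:
    """Summarize how a dataset relates to its provider and datatypes."""
    return (f"{ds.uid} -> {ds.provider.uid} "
            f"({ds.provider.provider_type}), {len(ds.datatypes)} datatypes")


provider = DatasetProvider(uid="my_s3_data", provider_type="Amazon S3")
web_logs = Dataset(
    uid="web_logs",
    provider=provider,
    path="<S3 bucket path>",  # placeholder, not a real path
    datatypes=[
        "aws_vpcflow",
        "aws_alb_accesslogs",
        "aws_elb_accesslogs",
        "aws_cloudfront_accesslogs",
        "apache_access_combined",
    ],
)

print(describe(web_logs))
```

The point of the sketch is the direction of the references: a dataset points at exactly one provider, the provider carries its type, and the datatypes hang off the dataset.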

Supported Data Formats

Much you can do, once trained in the ways of Cribl Search you are. One such power that you will gain is the ability to search a vast array of supported data formats.

| Data Format | Data Extensions |
| --- | --- |
| Gzip | .gz .gzip .tgz .tar.gz application/x-gzip |
| Journal | .journal .journal~ |
| LZ4 | .lz4 |
| Parquet | .parquet .pqt .parq |
| Snappy | .snappy |
| rawdata | |
| Tar | .tar .tgz .tar.gz |
| Text | .log .csv .json .ndjson .txt |
| Zip | .zip |
| Zstd | .zst |
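
If you want to check quickly which of the formats above a given file name falls under, the extension list can be turned into a small lookup. The Python sketch below is only an illustration built from the table; it is not how Cribl Search detects formats internally. The `FORMAT_EXTENSIONS` mapping and the `formats_for` function are hypothetical names, and the sketch leaves out the `application/x-gzip` MIME type and the `rawdata` entry because neither is a file extension.

```python
# Extensions copied from the table above; names are illustrative only.
FORMAT_EXTENSIONS = {
    "Gzip": [".gz", ".gzip", ".tgz", ".tar.gz"],
    "Journal": [".journal", ".journal~"],
    "LZ4": [".lz4"],
    "Parquet": [".parquet", ".pqt", ".parq"],
    "Snappy": [".snappy"],
    "Tar": [".tar", ".tgz", ".tar.gz"],
    "Text": [".log", ".csv", ".json", ".ndjson", ".txt"],
    "Zip": [".zip"],
    "Zstd": [".zst"],
}


def formats_for(filename: str) -> list:
    """Return every listed format whose extension the file name ends with."""
    name = filename.lower()
    return [
        fmt
        for fmt, extensions in FORMAT_EXTENSIONS.items()
        if any(name.endswith(ext) for ext in extensions)
    ]


print(formats_for("access.log.gz"))   # ['Gzip']
print(formats_for("archive.tar.gz"))  # ['Gzip', 'Tar']
print(formats_for("notes.docx"))      # []
```

Note that `.tgz` and `.tar.gz` appear under both Gzip and Tar in the table, which is why the lookup returns a list instead of a single format.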