Create S3/MinIO Destination
In our sandbox instance, we have a copy of MinIO running locally, storing its data under the /data directory. MinIO is an open-source project that provides an on-prem, S3-compatible interface to local storage. First, we're going to configure the destination in Stream.
There's a reasonable amount of configuration here, so we're going to break it up into a few sections of instructions.
Start Adding S3 Destination
- With Manage active in Stream's top nav, select the Data submenu, then click Destinations.
- Click the Amazon S3 tile.
- Click Add Destination at the top right.
- For Output ID, enter `s3`.
- For S3 Bucket Name, enter `logarchives`.
- For Key prefix, enter `prefix`.
- For Region, select US East (N. Virginia).
- For Compress, select `none`.
Let's explain what we've done so far. Output ID should be something descriptive, which we've set to `s3`. S3 Bucket Name should be set to the name of the bucket; in our MinIO setup, this corresponds to a directory under /data, in this case /data/logarchives. Key prefix is the top-level directory, or set of directories, we want to place files in. Our MinIO instance expects Region to be set to US East (N. Virginia). Normally, sending compressed data is optimal, but for our purposes we are going to set Compress to `none`, sending the data without compression. This will allow us to verify, in a later step, that we are receiving the expected data.
The Partitioning Expression field calls for a bit of explanation, because it is key to making data efficient to analyze in S3. A partitioning expression is a JavaScript expression that defines the path to prepend to the file name. For each unique value of the partitioning expression, a new file will be created.
Partitioned data lets analysis engines that later read the bucket decide which files to include or exclude based on the path to the file. It is much faster to evaluate the full path, and exclude whole files that do not match a pattern, than to decide which data is relevant by scanning the files' contents for matches.
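To make that concrete, here is a small, purely illustrative JavaScript sketch (the object keys are hypothetical, using the layout we'll build below) of selecting files by path alone:

```javascript
// Hypothetical object keys, laid out with the partitioning scheme we build below.
const keys = [
  '2020-03-20_08/foohost.cribl.io/splunkd_access_log/file1.json',
  '2020-03-20_08/barhost.cribl.io/syslog/file2.json',
  '2020-03-20_09/foohost.cribl.io/splunkd_access_log/file3.json',
];

// Select one hour of data from one host by matching key paths alone;
// no file contents are ever read or scanned.
const matches = keys.filter((k) => k.startsWith('2020-03-20_08/foohost.cribl.io/'));
console.log(matches); // [ '2020-03-20_08/foohost.cribl.io/splunkd_access_log/file1.json' ]
```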
Partitioning expressions use the same JavaScript-expression convention found throughout Stream, in Filters and Data Routes, which gives us the full power of JavaScript to define even complex patterns.
For our use case, we'd like to partition by time, as well as by host and sourcetype.
Partition by Time, Host, and Sourcetype
- Copy the expression below and paste it over the existing Partitioning Expression value:
`${C.Time.strftime(_time, '%Y-%m-%d_%H')}/${host}/${sourcetype}`
Breaking this down: the expression uses a JavaScript template literal, a simple templating language built into JavaScript, to define a pattern based on the data in the event. The two backticks mark the start and end of the expression. With the expression above, if we had a host of `foo` and a sourcetype of `bar`, then `foo/bar` would be prepended to the file name.
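As a quick illustration, with made-up event values, here is how that portion of the template literal evaluates in plain JavaScript:

```javascript
// Made-up event fields, just to show how the template literal resolves.
const host = 'foo';
const sourcetype = 'bar';

// The host/sourcetype portion of the partitioning expression:
const partitionPath = `${host}/${sourcetype}`;
console.log(partitionPath); // "foo/bar"
```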
To build the partitions – like `${host}` and `${sourcetype}` in the expression above – the partitioning expression uses the values of fields in the event. Any field in the event is available for use.
With this partitioning expression, we've reused the default time dimension of date and hour: `${C.Time.strftime(_time, '%Y-%m-%d_%H')}`. This piece showcases the power of having a full language available to articulate expressions.
Here, we're calling `C.Time.strftime()`. `C` is our global object, and `Time` contains several time utility functions, including `strftime()`.
For the first parameter, we're passing in the value of `_time` to determine the date. The second parameter, `'%Y-%m-%d_%H'`, is a `strftime()` format string that determines the output format, in this case something like `2020-03-20_08`. We want time only down to the hour, to group data into reasonable units for partitioning.
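`C.Time.strftime()` lives on Stream's built-in `C` object, so it isn't available in plain JavaScript. As a rough sketch of what that format string produces (assuming `_time` is a Unix epoch value in seconds, and formatting in UTC), a plain-JavaScript stand-in might look like this:

```javascript
// Rough stand-in for C.Time.strftime(_time, '%Y-%m-%d_%H'), for illustration only.
// Assumes _time is a Unix epoch timestamp in seconds, formatted here in UTC.
function hourBucket(epochSeconds) {
  const d = new Date(epochSeconds * 1000);
  const pad = (n) => String(n).padStart(2, '0');
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}_${pad(d.getUTCHours())}`;
}

console.log(hourBucket(1584691200)); // "2020-03-20_08"
```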
Lastly, we include `${host}/${sourcetype}` from our original partitioning structure. In the end, an example result of this partitioning expression might be a value like `2020-03-20_08/foohost.cribl.io/splunkd_access_log`.
With this information encoded in the directory structures, readers of this bucket will be able to easily narrow down queries – based on where the data came from, what time it came in, what host it came in from, and what type of data we have.
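Putting the pieces together: assuming Stream writes objects under the Key prefix, followed by the partitioning expression's result and a generated file name (the file name below is a placeholder, not a real value), the full object key might compose like this:

```javascript
// Hypothetical composition of a final object key; the file name is generated by
// Stream, so 'CriblOut-0.1.json' is only a placeholder here.
const keyPrefix = 'prefix';
const partition = '2020-03-20_08/foohost.cribl.io/splunkd_access_log';
const fileName = 'CriblOut-0.1.json'; // placeholder

const objectKey = [keyPrefix, partition, fileName].join('/');
console.log(objectKey);
// "prefix/2020-03-20_08/foohost.cribl.io/splunkd_access_log/CriblOut-0.1.json"

// In our MinIO sandbox, this object lives inside the logarchives bucket,
// which is backed by the /data/logarchives directory.
```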
Now, we need to set a few more configuration items in different tabs. First off, we need to enter the authentication information for our MinIO instance. These are the access key and secret key you'd use with AWS S3, but for our purposes, we've set up a simple pair of credentials.
Change Authentication
- Click the left Authentication tab.
- Toggle the Authentication Method from Auto to Manual. This reveals two new fields:
  - For Access key, enter `ACCESSKEY`.
  - For Secret key, enter `SECRETKEY`.
For the purposes of this course, we want to ensure that we're getting files on a pretty regular interval. So we're going to lower our default settings for how long to keep files open, and for how large to let the files grow. (The defaults are usually fine, but in higher-volume applications, you might raise them. Here, we're lowering them to receive files more frequently.)
Change File-Size Parameters
- Click the left Advanced Settings tab:
  - For Max File Size (MB), enter `5`.
  - For Max File Open Time (Sec), enter `10`.
  - For Max File Idle Time (Sec), enter `10`.
Finally, since we're using a local instance of MinIO, instead of the actual S3 service hosted at AWS, we need to tell the configuration where to connect. The MinIO service is running in a separate container within this course, called `minio`.
Set the Service Endpoint
Still in the Advanced Settings tab:

- For Endpoint, enter: `http://minio:9000/`
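As an optional aside, not part of the course steps: here's a minimal sketch of how you could double-check these settings from outside Stream, using the AWS SDK for JavaScript v3 with the same endpoint and credentials. The package choice and the path-style option are assumptions based on typical MinIO setups, not something this course requires:

```javascript
// Minimal sketch: list objects in the MinIO bucket with the same settings we just
// configured. Assumes Node.js (ESM) with the @aws-sdk/client-s3 package installed.
import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

const s3 = new S3Client({
  endpoint: 'http://minio:9000/',
  region: 'us-east-1',
  credentials: { accessKeyId: 'ACCESSKEY', secretAccessKey: 'SECRETKEY' },
  forcePathStyle: true, // MinIO typically serves buckets path-style, not as subdomains
});

const { Contents = [] } = await s3.send(
  new ListObjectsV2Command({ Bucket: 'logarchives', Prefix: 'prefix/' })
);
Contents.forEach((obj) => console.log(obj.Key, obj.Size));
```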
To conclude this config, let's save the S3 Destination we've created, and set it as Stream's default Destination.
Complete Adding Destination
- Click Save.
- On the left-hand side of Stream, in the Destination list, scroll up to the Default Destination, and click it.
- Click the `default` row at right to expand its configuration.
- Set the Default Output ID drop-down to `s3`.
- Click Save to update the Default Destination.
Now we've set up the S3 Destination! Next, let's move on to creating a Source, and starting our Splunk Universal Forwarder.