
Create S3/MinIO Destination

In our sandbox instance, we have a copy of MinIO running locally, storing its data under the /data directory. MinIO is an open-source project that provides an on-prem, S3-compatible interface to local storage. First, we're going to configure the destination in Stream.

There's a reasonable amount of configuration here, so we're going to break it up into a few sections of instructions.

important

Start Adding S3 Destination

  1. With Manage active in Stream's top nav, select the Data submenu, then click Destinations.
  2. Click the Amazon S3 tile.
  3. Click Add Destination at the top right.
  4. For Output ID, enter s3.
  5. For S3 Bucket Name, enter logarchives.
  6. For Key prefix, enter prefix.
  7. For Region, select US East (N. Virginia).
  8. For Compress, select none.

Let's explain what we've done so far. Output ID should be something descriptive, which we've set to s3. S3 Bucket Name should be set to the name of the bucket; in our MinIO setup, this corresponds to a directory under /data, in this case /data/logarchives. Key Prefix is the top-level directory, or set of directories, we want to place files in. Our MinIO instance expects Region to be set to US East (N. Virginia). Normally, sending compressed data is optimal, but here we're setting Compress to none so the data arrives uncompressed, which will let us verify in a later step that we're receiving the expected data.
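
To make the relationship between these settings concrete, here's a small sketch in plain JavaScript (an illustration only, not anything Stream runs internally) of how the bucket name and key prefix combine into a location on the MinIO side:

```javascript
// A plain JavaScript sketch (not Stream's internal code) of where objects from
// this Destination will land inside our MinIO container.
const destination = {
  outputId: 's3',         // a descriptive name for this Destination
  bucket: 'logarchives',  // corresponds to /data/logarchives on the MinIO side
  keyPrefix: 'prefix',    // top-level directory (or directories) inside the bucket
  region: 'us-east-1',    // US East (N. Virginia)
  compress: 'none',       // uncompressed, so we can inspect the files later
};

// MinIO maps each bucket to a directory under /data, so objects written by
// this Destination will appear under:
console.log(`/data/${destination.bucket}/${destination.keyPrefix}/`);
// -> /data/logarchives/prefix/
```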

The Partitioning Expression field calls for a bit of explanation; it's an important concept for making data in S3 efficient to analyze. A partitioning expression is a JavaScript expression that defines the path to prepend to the file name. For each unique value the expression produces, a new file will be created.

Having data partitioned allows analysis engines coming along after the fact to decide which files to include or exclude, based on the path to each file. It is much faster to evaluate a file's full path, and to exclude whole files whose paths don't match a pattern, than to decide which data is relevant by scanning the files' contents for matches.

A partitioning expression is a JavaScript expression, the same convention used throughout Stream in Filters and Data Routes. Expressions allow us to use the full power of JavaScript to define even complex patterns.
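
As an aside, here's a minimal sketch, written against a hypothetical event object, of what that convention looks like in practice. Stream evaluates these expressions itself; this is only meant to show that they are ordinary JavaScript:

```javascript
// A minimal sketch against a hypothetical event object. Stream evaluates these
// expressions internally; this just illustrates that they are ordinary JavaScript.
const event = { host: 'web01', sourcetype: 'access_combined', status: 404 };
const { host, sourcetype, status } = event;

// A Filter- or Route-style expression evaluates to true/false per event:
const matches = sourcetype === 'access_combined' && status >= 400;

// A partitioning expression evaluates to a path string per event:
const partition = `${host}/${sourcetype}`;

console.log(matches);   // -> true
console.log(partition); // -> web01/access_combined
```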

For our use case, we'd like to partition by time, as well as by host and sourcetype.

important

Partition by Time, Host, and Sourcetype

  1. Copy the expression below and paste it over the Partitioning Expression:
`${C.Time.strftime(_time, '%Y-%m-%d_%H')}/${host}/${sourcetype}`

Breaking this down: the expression uses a JavaScript template literal, a simple templating mechanism built into JavaScript, to define a pattern based on the data in each event. The two backticks mark the start and end of the expression. For example, with an expression of `${host}/${sourcetype}`, an event with a host of foo and a sourcetype of bar would have foo/bar prepended to its file name.

To build these partitions, like ${host} and ${sourcetype} in the expression above, the partitioning expression uses the values of fields in the event. Any field in the event is available for use.

With this partitioning expression, we've reused the default time dimension of date and hour. The ${C.Time.strftime(_time, '%Y-%m-%d_%H')} portion showcases the power of having a full language available to articulate expressions.

Here, we're calling C.Time.strftime(). C is our global object. Time contains several time utility functions, including strftime().

For the first parameter, we're passing in the event's _time field, which supplies the date and time to format. The second parameter, '%Y-%m-%d_%H', is a strftime() format string that determines the output format, in this case a value like 2020-03-20_08. We want time only down to the hour, to group data into reasonable units for partitioning.

Lastly, we include ${host}/${sourcetype} from our original partitioning structure. In the end, this partitioning expression might return a value like: 2020-03-20_08/foohost.cribl.io/splunkd_access_log.
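
If you'd like to see the whole expression in action outside of Stream, here's a standalone sketch. Note that C.Time.strftime() is a Stream built-in, so the sketch substitutes a tiny hand-rolled formatter that handles only the '%Y-%m-%d_%H' tokens, and it assumes UTC:

```javascript
// Standalone illustration of the full partitioning expression. C.Time.strftime()
// is Stream's built-in, so we approximate just the '%Y-%m-%d_%H' tokens here.
function formatDateHour(epochSeconds) {
  const d = new Date(epochSeconds * 1000);
  const pad = (n) => String(n).padStart(2, '0');
  // UTC is assumed for this sketch; Stream applies its own timezone handling.
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}_${pad(d.getUTCHours())}`;
}

// Hypothetical event fields:
const _time = 1584691200; // 2020-03-20 08:00:00 UTC
const host = 'foohost.cribl.io';
const sourcetype = 'splunkd_access_log';

const partition = `${formatDateHour(_time)}/${host}/${sourcetype}`;
console.log(partition);
// -> 2020-03-20_08/foohost.cribl.io/splunkd_access_log
```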

With this information encoded in the directory structure, readers of this bucket will be able to easily narrow down queries based on what time the data came in, which host it came from, and what type of data it is.
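
To illustrate the benefit, here's a small sketch, using hypothetical object keys, of how a downstream reader can prune whole files just by looking at their paths:

```javascript
// Hypothetical object keys written under our 'prefix' key prefix.
const objectKeys = [
  'prefix/2020-03-20_08/foohost.cribl.io/splunkd_access_log/part-0001.json',
  'prefix/2020-03-20_08/barhost.cribl.io/syslog/part-0002.json',
  'prefix/2020-03-20_09/foohost.cribl.io/splunkd_access_log/part-0003.json',
];

// "Only splunkd_access_log data from hour 08 on 2020-03-20, please":
const wanted = objectKeys.filter(
  (key) => key.includes('/2020-03-20_08/') && key.includes('/splunkd_access_log/')
);

console.log(wanted);
// -> only the first key; the other files are excluded without reading a byte
```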

Now, we need to set a few more configuration items on different tabs. First off, we need to enter the authentication information for our MinIO instance. These are the equivalent of the Access Key and Secret Key you'd use for AWS S3, but for our purposes, we've set up a simple pair of credentials.

important

Change Authentication

  1. Click the left Authentication tab.
  2. Toggle the Authentication Method from Auto to Manual. This reveals two new fields:
  3. For Access key, enter ACCESSKEY.
  4. For Secret key, enter SECRETKEY.

For the purposes of this course, we want to ensure that we're getting files on a pretty regular interval. So we're going to lower our default settings for how long to keep files open, and for how large to let the files grow. (The defaults are usually fine, but in higher-volume applications, you might raise them. Here, we're lowering them to receive files more frequently.)

important

Change File-Size Parameters

  1. Click the left Advanced Settings tab.
  2. For Max File Size (MB), enter 5.
  3. For Max File Open Time (Sec), enter 10.
  4. For Max File Idle Time (Sec), enter 10.
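
For intuition, here's a rough sketch (not Stream's actual implementation) of how these three limits interact: a staging file is closed and shipped as soon as any one of them is exceeded.

```javascript
// Rough sketch of the rollover conditions implied by the settings above;
// this is illustrative only, not Stream's actual implementation.
function shouldCloseFile(file, nowMs) {
  const MAX_SIZE_BYTES = 5 * 1024 * 1024; // Max File Size (MB): 5
  const MAX_OPEN_MS = 10 * 1000;          // Max File Open Time (Sec): 10
  const MAX_IDLE_MS = 10 * 1000;          // Max File Idle Time (Sec): 10

  return (
    file.bytesWritten >= MAX_SIZE_BYTES ||
    nowMs - file.openedAtMs >= MAX_OPEN_MS ||
    nowMs - file.lastWriteAtMs >= MAX_IDLE_MS
  );
}

// With these lowered limits, even a light trickle of data produces a new file
// in the bucket roughly every 10 seconds, which makes the lab easy to verify.
```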

Finally, since we're using a local instance of MinIO, instead of the actual S3 service hosted at AWS, we need to tell the configuration where to connect to. The MinIO service is running on a separate container within this course, called minio.

important

Set the Service Endpoint

Still in the Advanced Settings tab:

  • For Endpoint, enter:
    http://minio:9000/
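
If you're curious what overriding the endpoint means in practice, here's an optional sketch using the AWS SDK for JavaScript v3 (nothing you need to run for this course): any S3-compatible client can point at the same MinIO endpoint with the credentials we configured above.

```javascript
// Optional illustration (not part of the lab): pointing a standard S3 client
// at the MinIO container instead of AWS. Assumes @aws-sdk/client-s3 is installed
// and this file runs as an ES module (for top-level await).
import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

const s3 = new S3Client({
  region: 'us-east-1',            // matches the Region we selected
  endpoint: 'http://minio:9000/', // the MinIO container, not AWS
  forcePathStyle: true,           // MinIO is typically addressed with path-style URLs
  credentials: {
    accessKeyId: 'ACCESSKEY',     // the Manual credentials from earlier
    secretAccessKey: 'SECRETKEY',
  },
});

// List whatever has landed in the logarchives bucket so far.
const { Contents = [] } = await s3.send(
  new ListObjectsV2Command({ Bucket: 'logarchives', Prefix: 'prefix/' })
);
Contents.forEach((obj) => console.log(obj.Key));
```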

To conclude this config, let's save the S3 Destination we've created, and set it as Stream's default Destination.

important

Complete Adding Destination

  1. Click Save.
  2. On the left-hand side of Stream, in the Destination list, scroll up to the Default Destination, and click it.
  3. Click the default row at right to expand its configuration.
  4. Set the Default Output ID drop-down to s3.
  5. Click Save to update the Default Destination.

Now we've set up the S3 Destination! Next, let's move on to creating a Source, and starting our Splunk Universal Forwarder.