
Everything Old Is New Again

In the Archiving to S3 course, we configured a Data Route that streamed raw data flowing through Stream into an S3 bucket for archival purposes. This allows us to replay (or selectively re-ingest) that data whenever we need it.

Now that we need to replay our data, the first step is to add that same S3 bucket as a Source to Stream. Not just any Source, though - a Collector Source.

important

Add an S3 Collector

  1. Select the Data submenu and click Sources
  2. Click S3 under Collectors
  3. Click Add Collector
note

At the top of the Sources page, you can use the Autocomplete Search box to quickly locate your desired Source. Typeahead assist narrows the displayed results as you enter your query.

Try it out with S3.

Data flow is what differentiates a Source from a Collector Source. A regular Source constantly pushes data into Stream and we simply configure Stream to listen. A Collector Source is one where Stream reaches out and pulls data. Collector Sources work well for Sources hosting data at rest that isn’t being pushed anywhere.

In our case, we want to collect the data from our makeshift archive data lake and then pump it into a different Destination.
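The pull model described above can be sketched in a few lines. This is purely illustrative: a local directory stands in for the archive bucket (against a real bucket you'd use an S3 client), and the directory layout and event text are made up for the example.

```python
import os
import tempfile

# Stand-in for the archive bucket: a local directory tree of stored events.
archive = tempfile.mkdtemp()
os.makedirs(os.path.join(archive, "2024/03/07"))
with open(os.path.join(archive, "2024/03/07/events.log"), "w") as f:
    f.write("event one\nevent two\n")

# The pull model: the collector reaches out, lists what exists, and reads it.
# Nothing is pushed at us -- data at rest stays put until we come get it.
collected = []
for root, _dirs, files in os.walk(archive):
    for name in files:
        with open(os.path.join(root, name)) as f:
            collected.extend(f.read().splitlines())

print(collected)  # ['event one', 'event two']
```

A regular (push) Source would instead look like a listener: a socket or HTTP endpoint sitting open, waiting for the sender to deliver data on its own schedule.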

important

Fill out the S3 Collector details below

  • Collector ID - s3replay
  • Auto-populate from - s3:archives3
  • Path -
    '/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${sourcetype}/${src_zone}/${src_ip}/${dest_zone}/${dest_ip}/'

NOTE: This path string denotes how the files have been stored in S3. The IT team changed the path schema to better fit their scripts, so it is slightly different from the one we configured on our S3 Destination.
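To see what the path template above actually resolves to, here is a small sketch that expands the same tokens for a single hypothetical event. The field values (`pan:traffic`, the zones, the IPs) are invented for illustration; only the token names come from the template.

```python
from datetime import datetime, timezone

# Hypothetical event fields; names mirror the tokens in the Collector path.
event = {
    "_time": datetime(2024, 3, 7, 14, 0, tzinfo=timezone.utc),
    "sourcetype": "pan:traffic",
    "src_zone": "trust",
    "src_ip": "10.0.0.5",
    "dest_zone": "untrust",
    "dest_ip": "8.8.8.8",
}

# Expand the template the way Stream would when discovering objects.
path = "/{t:%Y}/{t:%m}/{t:%d}/{t:%H}/{st}/{sz}/{sip}/{dz}/{dip}/".format(
    t=event["_time"],
    st=event["sourcetype"],
    sz=event["src_zone"],
    sip=event["src_ip"],
    dz=event["dest_zone"],
    dip=event["dest_ip"],
)
print(path)  # /2024/03/07/14/pan:traffic/trust/10.0.0.5/untrust/8.8.8.8/
```

Every object under a prefix like this gets picked up when the Collector runs, which is why the template has to match the layout the files were written with.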

Event Breaker

We’ll take a little side step in the middle here to add an Event Breaker. Sparknotes for Event Breakers: They help Stream understand where one event ends and another begins. We are going to tell it to use the Cribl standard ruleset for breaking large chunks of data into smaller events.

important

Add an Event Breaker

  1. Click Event Breakers
  2. Click Add ruleset
  3. Select Cribl
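The idea behind an Event Breaker can be sketched simply. The Cribl ruleset contains several rules; the sketch below shows only the simplest case, newline breaking, with a made-up chunk of log text for illustration.

```python
import re

# A raw chunk pulled from S3: several events concatenated together.
chunk = (
    "Jan 1 10:00:00 host app: start\n"
    "Jan 1 10:00:01 host app: working\n"
    "Jan 1 10:00:02 host app: stop\n"
)

# Illustrative newline breaker: split the chunk wherever one event ends
# and the next begins, dropping empty fragments.
breaker = re.compile(r"\n+")
events = [e for e in breaker.split(chunk) if e]
print(len(events))  # 3
```

Without a breaker, the whole chunk would travel through Stream as one oversized event; with it, each line becomes its own event downstream.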

Route Bypass

Now here’s the really cool part. We can completely bypass configuring a Route since we want this data to flow straight into our new SIEM. We can tell Stream to use the passthru Pipeline to send the data to our preconfigured Destination.

important

Bypass the Routes

  1. Click Result Routing on the left-hand side
  2. Turn Off the switch for Send to Routes
  3. Select passthru as our Pipeline
  4. Select syslog:exabeam as our Destination
  5. Click Save

And that’s really it. If you went looking for the Live view to see whether data is flowing, you probably noticed there isn't a button for that. That's because our Source is a Collector, and data only flows when a run is scheduled or triggered manually. Let’s do that and watch our data flow!