
Where Does Data Come From?

TL;DR

Sources are locations where data originates, and Stream can integrate with a LOT of them. This helps you avoid vendor lock-in: you don't need to worry about getting your data to, say, a new SIEM vendor if you decide to switch away from your old one.

Logically, it makes sense to start in Sources. This is where the data we need is hosted and generated; from here, we can transform it and move it somewhere else (Destinations).

important
  1. While on the Stream Home page, click into the default Worker Group
  2. Select the Data submenu then click Sources

Examples of some sources here are:

  • Amazon S3 (our low-cost long term archive)
  • Syslog from our Palo Alto firewall
  • HTTP
  • Elasticsearch API

All of these Sources send data somewhere. Stream can sit in front and listen to the data they send – which is why we call them Sources: data generators.
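To make "sit in front and listen" concrete, here is a minimal sketch of what a listening Source conceptually does: bind a port and collect whatever a device (say, a firewall) pushes to it. This is an illustration only; the port number, function name, and parsing are assumptions, not Stream's actual implementation.

```python
import socket

def receive_syslog(host="127.0.0.1", port=5514, count=1):
    """Bind a UDP socket and collect `count` raw syslog messages.

    A toy stand-in for a listening Source: the sender (e.g. a Palo Alto
    firewall) needs no changes -- it just points its syslog output here.
    """
    messages = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while len(messages) < count:
            data, _addr = sock.recvfrom(65535)
            messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

The point of the sketch: because the listener sits in the middle, you can later change where the data goes without reconfiguring the devices that generate it.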

note

As an aside, Stream has an actual datagen built in for testing purposes. You can upload a sample of your data then configure Stream to push the data into itself at prescribed intervals.

Datagens allow you to apply functions to enrich and transform your data without being in production. Neat!
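The datagen idea described above can be sketched in a few lines: cycle through a sample file and emit events at a prescribed pace. This is a hedged illustration of the concept, not Stream internals; the function name, `interval` parameter, and event shape are assumptions.

```python
import itertools
import time

def replay_sample(sample_lines, events, interval=0.0):
    """Replay lines from a sample, cycling as needed, until `events`
    events have been emitted -- pausing `interval` seconds between them.

    Each emitted event pairs the raw line with a fresh timestamp, so
    downstream functions see data that looks live.
    """
    emitted = []
    for line in itertools.islice(itertools.cycle(sample_lines), events):
        emitted.append({"_raw": line, "_time": time.time()})
        if interval:
            time.sleep(interval)
    return emitted
```

With a two-line sample and `events=5`, you'd get the sample repeated in order with fresh timestamps – enough to exercise enrichment and transformation functions without touching production.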

Actually, let's not just make a note about it, let's configure a datagen!

Create a Datagen
  1. Still in the Stream > default Worker Group, at the top, click into Data > Sources.
  2. Find and click on the Datagen tile.
  3. Click Add Source.
  4. For Input ID, enter palo_traffic.
  5. For the Datagen > Data Generator File, select palo_alto_traffic.log.
  6. Click Save.

Cribl.Cloud runs our products in a distributed architecture (more on that later). What this means for us right now is that our changes, while saved, haven't been pushed out to our workers yet. Let's go ahead and do that.

Commit & Deploy
  1. In the top right, click Commit & Deploy
    I don't have that button...

    If you are seeing separate Commit and Deploy buttons, click Commit instead.

  2. In the resulting window, click Commit & Deploy in the bottom right.
Remember me fondly

In the rest of this sandbox, the instructions will simply say, "Commit & Deploy". Refer back to these instructions as needed.

Also of note here is that S3 is available as a Source. S3 is not always available as an integrated data Source in other tools.

A lot of admins have trouble storing all their data in their Security Information and Event Management (SIEM) tool, because it requires fast, low-latency storage (read: SSDs). With Stream, you can push a copy of all your data to cheaper long-term storage and cut down on infrastructure costs.

For more details on S3, check out our How-To course: Archiving to S3

Look at Flowing Data

Check out Live view!

Click Live in the Status column to the right of palo_traffic.

You can see the data coming in from any configured Source. This helps eliminate guess-and-check or cross-your-fingers-and-hope restarts just to see if you configured your Source correctly.

You can also save the data from this window and use it as a sample to check your work later on. We’ll show you what that looks like in a bit.

For now, feel free to explore the Sources (configured or not) and move on to the next screen when you’re ready.