Where Does Data Come From?
Sources are the locations where data originates, and Stream can integrate with a LOT of them. This helps you avoid vendor lock-in: you don't need to worry about getting your data to, say, a new SIEM vendor if you decide to switch away from your old one.
Logically, it makes sense to start with Sources. This is where the data we need is hosted and generated, and from here we can transform it and move it somewhere else (Destinations).
- While on the Stream Home page, click into the default Worker Group.
- Select the Data submenu, then click Sources.
Some examples of Sources here are:
- Amazon S3 (our low-cost, long-term archive)
- Syslog from our Palo Alto firewall
- HTTP
- Elasticsearch API
All of these Sources send data somewhere. Stream can sit in front and listen to the data they send, which is why we call them Sources: data generators.
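If it helps to picture what "sit in front and listen" means, here is a minimal sketch that pushes one test event at an HTTP-style listener. The host, port, and endpoint path are assumptions for illustration only; check your own Source's configuration for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint: an HTTP Source listening on a worker.
# Replace host, port, and path with whatever your Source actually uses.
URL = "http://worker.example.com:10080/test/events"

event = {"source": "demo", "message": "hello from a data generator"}

req = urllib.request.Request(
    URL,
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Send the event and print the listener's response status.
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```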
As an aside, Stream has an actual datagen built in for testing purposes. You can upload a sample of your data, then configure Stream to push the data into itself at prescribed intervals. Datagens allow you to apply functions to enrich and transform your data without being in production. Neat!
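Conceptually, a datagen just replays an uploaded sample file on a schedule. The rough Python sketch below is purely illustrative; the real Datagen Source is configured in the UI (as we're about to do), and the emit() helper here is a stand-in for handing events to the processing pipeline.

```python
import time
from pathlib import Path

# Illustrative only: the real Datagen Source does this internally.
SAMPLE_FILE = Path("palo_alto_traffic.log")  # sample events, one per line
INTERVAL_SECONDS = 10                        # the "prescribed interval"

def emit(event: str) -> None:
    # Stand-in for handing the event to the processing pipeline.
    print(event)

def replay_sample() -> None:
    lines = SAMPLE_FILE.read_text().splitlines()
    while True:
        for line in lines:
            emit(line)
        time.sleep(INTERVAL_SECONDS)

if __name__ == "__main__":
    replay_sample()
```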
Actually, let's not just make a note about it; let's configure a datagen!
- Still in the Stream > default Worker Group, at the top, click into Data > Sources.
- Find and click on the Datagen tile.
- Click Add Source.
- For Input ID, enter palo_traffic.
- For the Datagen > Data Generator File, select palo_alto_traffic.log.
- Click Save.
Cribl.Cloud runs our products in a distributed architecture (more on that later). What this means for us right now is that our changes, while saved, haven't yet been pushed out to our Workers. Let's go ahead and do that.
Commit & Deploy
- In the top right, click Commit & Deploy.
  - I don't have that button... If you are seeing separate Commit and Deploy buttons, click Commit instead.
- In the resulting window, click Commit & Deploy in the bottom right.
In the rest of this sandbox, the instructions will simply say, "Commit & Deploy". Refer back to these instructions as needed.
Also of note here is that S3 is available as a Source. S3 is not always available as an integrated data Source in other tools.
A lot of admins have trouble storing all their data in their Security Information and Event Management (SIEM) tool, because it requires fast, responsive storage (read: SSDs). With Stream, you can push a copy of all your data to cheaper long-term storage and cut down on infrastructure costs.
For more details on S3, check out our How-To course: Archiving to S3
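To see why an S3 archive is handy, here is a minimal boto3 sketch that reads events back out of an archive bucket. The bucket name and prefix are hypothetical, and it assumes plain-text objects to keep the example short; in Stream itself you would configure the S3 Source rather than write code.

```python
import boto3  # pip install boto3

# Hypothetical bucket/prefix for a low-cost archive written by Stream.
BUCKET = "my-archive-bucket"
PREFIX = "palo_alto/traffic/2024/"

s3 = boto3.client("s3")

# List a few archived objects under the prefix and print the first
# lines of each, just to confirm the data is there and readable.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=5)
for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    for line in body.decode("utf-8", errors="replace").splitlines()[:3]:
        print(obj["Key"], "=>", line)
```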
Look at Flowing Data
Click Live in the Status column, to the right of palo_traffic.
You can see the data coming in from any configured Source. This helps eliminate guess-and-check or cross-your-fingers-and-hope restarts just to see if you configured your Source correctly.
You can also save the data from this window and use it as a sample to check your work later on. We’ll show you what that looks like in a bit.
For now, feel free to explore the Sources (configured or not) and move on to the next screen when you’re ready.