
Cross the Streams

Cribl Stream's built-in integration with Cribl Lake makes it easy to get data where it needs to go. Instead of chasing permissions, puzzling out partitioning expressions, and getting bogged down in endless meetings, you can get shit done, quickly. Let's start by getting some data flowing through our Stream into our Lake (heh). We'll use Stream's built-in Datagen Source to simulate some Apache web log data.

The Spice Data must flow
  1. From the product switcher at the upper left, click Stream
  2. Click into the default Worker Group
  3. With the Manage tab active, click into Routing > QuickConnect
  4. Click Add Source at the top left
  5. Select Datagen and click Add New
  6. Set up your new Source by filling in the following information:
    • Input ID: sbx_apache
    • Data Generator File – add two:
      • apache_common.log with Events Per Second Per Worker Node set to 1
      • apache_error.log with Events Per Second Per Worker Node set to 1
  7. Click Save
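
If you'd rather script this than click through the UI, Cribl Stream also exposes a REST API. Here's a minimal sketch, assuming a Worker Group named default and a bearer token in hand; the Leader URL is hypothetical, and the samples field names are assumptions based on how the UI saves Datagen Sources, so verify against your version's API docs:

```python
import requests

BASE = "https://leader.example.com:9000/api/v1"  # hypothetical Leader URL; 9000 is Stream's default API port
HEADERS = {"Authorization": "Bearer <token>"}    # e.g. obtained via POST /auth/login

# Recreate the sbx_apache Datagen Source on the 'default' Worker Group.
source = {
    "id": "sbx_apache",
    "type": "datagen",
    "samples": [
        {"sample": "apache_common.log", "eventsPerSec": 1},
        {"sample": "apache_error.log",  "eventsPerSec": 1},
    ],
}
resp = requests.post(f"{BASE}/m/default/system/inputs", json=source, headers=HEADERS)
resp.raise_for_status()
```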

In case you didn't know, Cribl Stream can similarly generate data based on logs, metrics, and traces that you upload as files. This makes it easy to test out Stream Pipelines without trying to reproduce your full production environment.
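
For reference, the apache_common.log generator emits events in the classic Apache Common Log Format, along the lines of the canonical example from Apache's own docs:

```
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```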

Anyway, let's configure our Lake as a Destination, and then add some routing that will pay off later (Search's send operator).

Plumb it like it's hot
  1. Still on Routing > QuickConnect, click Add Destination at the top right
  2. Select Cribl Lake and click Add New
  3. Configure your new Lake Destination by filling in the following information:
    • Output ID: sbx_apache
    • Lake dataset: default_logs
  4. Click Save
  5. Click the + on the sbx_apache Datagen Source on the left, drag a connection over to the sbx_apache Cribl Lake Destination on the right, and release
  6. In the resulting pop-up, leave Passthru selected and click Save
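
The Destination half has a similar API shape, if you prefer configuration-as-code. A sketch, assuming the same Leader and group as before; the cribl_lake type name and dataset field are assumptions based on how the UI stores Lake Destinations:

```python
import requests

BASE = "https://leader.example.com:9000/api/v1"  # hypothetical Leader URL
HEADERS = {"Authorization": "Bearer <token>"}

# Recreate the sbx_apache Lake Destination pointing at default_logs.
destination = {
    "id": "sbx_apache",
    "type": "cribl_lake",       # assumption: internal type name for Lake Destinations
    "dataset": "default_logs",
}
resp = requests.post(f"{BASE}/m/default/system/outputs", json=destination, headers=HEADERS)
resp.raise_for_status()
```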

No AWS, IT, or Security team to consult. No ticket to open, with its weeklong waits and myriad meetings. We just set up incoming data to send to our Cribl Lake. Well, it's technically not sending yet; we still need to Commit & Deploy. But first, let's add a (literal) feedback loop. Don't worry, we'll explain.

Feedback Loop
  1. Still on Routing > QuickConnect, click Add Source on the left
  2. Select Cribl HTTP and click Select Existing
  3. Click the in_cribl_http Source
  4. In the resulting pop-up click Yes
  5. Click Add Destination at the top right
  6. Select Cribl Lake and click Add New
  7. Configure your new Lake Destination by filling in the following information:
    • Output ID: sbx_incident_response
    • Lake dataset: sbx_incident_response
  8. Click Save
  9. Click the + on the in_cribl_http Source on the left, drag a connection over to the sbx_incident_response Cribl Lake Destination on the right, and release
    Expand your Destinations

    Can't see your sbx_incident_response Lake Destination? You may need to click the Cribl Lake tile to expand and show both Lake Destinations we've configured.

  10. In the resulting pop-up, leave Passthru selected and click Save

OK, what did we just do? Well, later on we're going to explore Cribl Search's send operator, which lets you send query results to Cribl Stream through the in_cribl_http Source. Here, we'll use that Source as a feedback loop, sending the data to a "new" dataset in order to expedite our searches. Fun fact: there's a better way to do that, which we will also do 😉. The real reason we're configuring this connection, though, is so you can see how simple it would be to replace the sbx_incident_response Destination with, say, your SIEM of choice.
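To make that concrete, here's a hedged sketch of what that later step might look like: a Cribl Search query whose send operator pushes results back through in_cribl_http, where QuickConnect routes them to sbx_incident_response. The /search/jobs path and payload shape are assumptions, and the query uses Search's Kusto-like syntax; treat this as an illustration, not gospel:

```python
import requests

BASE = "https://leader.example.com:9000/api/v1"  # hypothetical URL; Search runs in Cribl.Cloud
HEADERS = {"Authorization": "Bearer <token>"}

# Pull events from the lake and send them back into Stream, where
# QuickConnect routes in_cribl_http -> sbx_incident_response.
query = 'dataset="default_logs" | limit 1000 | send'
resp = requests.post(f"{BASE}/search/jobs", json={"query": query}, headers=HEADERS)
resp.raise_for_status()
```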

Time to push this configuration to our Workers so that we can see the fruits of our loom labour.

Commit & Deploy
  1. At the top right, click the blue Commit & Deploy button
  2. Enter a commit message that reflects the hard work we've done (example below)
    sbx_lake configuration
    - added datagen for apache logs and errors
    - added Cribl Lake destinations for default_logs and sbx_incident_response
    - connected in_cribl_http to Cribl Lake sbx_incident_response
  3. Click Commit & Deploy
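
For the curious, the button has an API equivalent: commit on the Leader, then deploy the new version to the Worker Group. A sketch using Cribl's version-control endpoints; the field names and response shape are assumptions, so check your version's API docs:

```python
import requests

BASE = "https://leader.example.com:9000/api/v1"  # hypothetical Leader URL
HEADERS = {"Authorization": "Bearer <token>"}

# Commit the pending changes with the same message we used in the UI.
commit = requests.post(
    f"{BASE}/version/commit",
    json={"group": "default", "message": "sbx_lake configuration"},
    headers=HEADERS,
)
commit.raise_for_status()
version = commit.json()["items"][0]["commit"]   # assumption about the response shape

# Deploy that commit to the default Worker Group.
deploy = requests.patch(
    f"{BASE}/master/groups/default/deploy",
    json={"version": version},
    headers=HEADERS,
)
deploy.raise_for_status()
```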

Replay and Me

Cribl Lake also simplifies Cribl Stream's built-in Replay capability. If you've configured Replay before, you'll know it's already pretty straightforward (permissions be damned!); however, Cribl Lake makes it even easier, with a dedicated Stream Collector.

Be kind, replay
  1. Click Data > Sources in the top menu
  2. Under Collectors, click Cribl Lake
    Collectors???

    If you don't see a Collectors section in the Sources, check that you have disabled ad blockers. Turns out they "collect" information, so this word is often just outright blocked. LOL.

  3. Click Add Collector at the top right
  4. Observe that you only need two pieces of information to configure a Cribl Lake Replay: Collector ID and Lake dataset
  5. When you're done, close out of this modal by clicking Cancel
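
Those two fields really are all there is to it. Scripted, a Lake Collector might look something like the sketch below; the /lib/jobs path and the collector payload shape are assumptions drawn from how Stream stores saved Collector jobs, and sbx_replay is a hypothetical ID:

```python
import requests

BASE = "https://leader.example.com:9000/api/v1"  # hypothetical Leader URL
HEADERS = {"Authorization": "Bearer <token>"}

# A saved Collector job: just an ID plus the Lake dataset to replay from.
job = {
    "id": "sbx_replay",                 # hypothetical Collector ID
    "type": "collection",
    "collector": {"type": "cribl_lake", "conf": {"dataset": "default_logs"}},
}
resp = requests.post(f"{BASE}/m/default/lib/jobs", json=job, headers=HEADERS)
resp.raise_for_status()
```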

Time to go check out Cribl Search & Lake. It's gonna be great!