Installing the Data Tiering Packs

TL;DR

We've built two packs for you: one with pre-configured dashboards for forensic search, and one with automated routing for data tiering. You'll install both packs, create the required Lake infrastructure, and verify everything's working.


The Foundation

Before you can experiment with the packs, we need to set up the infrastructure:

  • Warehouses (Lake datasets) to store data
  • Distribution centers (Lakehouse) for efficient querying
  • Pre-built automation (Stream pack) for intelligent routing
  • Investigation tools (Search pack) for forensic analysis

Let's get everything installed.

Step 1: Create Your Lake Dataset

First, you need storage for your tiered data. A dedicated dataset keeps your sandbox work isolated and makes it easy to manage.

Create Dataset
  1. Switch to Cribl Lake in the product switcher
  2. In the left navigation, click Datasets
  3. Click Add Dataset in the upper right
  4. Configure the new dataset:
    • Dataset ID: datatieringlogs
    • Description: Multi-tier data for search optimization sandbox
    • Retention: 30 days (sufficient for this sandbox)
    • Format: JSON (easier to read and debug)
  5. Click Save
Why Create This Manually?

The Stream pack can't create Lake datasets for you (that's a Lake product feature), so you need to create this manually. But it's quick - just one dataset that will store all three tiers of data. Think of it like this:

  • The Stream pack has the routing logic (where to send data)
  • You need to create the storage location (where it lands)

Step 2: Create Your Lakehouse

A Lakehouse provides indexed, queryable access to your Lake data - perfect for Tier 2 (warm) storage.

Create Lakehouse
  1. Still in Cribl Lake, look for Lakehouses in the left navigation
  2. Click Add Lakehouse in the upper right
  3. Configure your Lakehouse:
    • Lakehouse ID: tier2LH
    • Description: Lakehouse for Tier 2 (warm) data - security and user experience events
  4. Click Save
What's a Lakehouse?

If you're new to Cribl Lakehouse, here's the quick version:

Cribl Lake = Raw, cost-effective object storage
Cribl Lakehouse = Indexed, queryable layer on top

Benefits:

  • Faster queries: Indexed fields speed up searches dramatically
  • Better compression: Parquet format reduces storage costs
  • KQL compatibility: Query with standard KQL syntax

For this sandbox, Lakehouse represents Tier 2 (warm storage) - events that are important but not critical enough for the expensive SIEM.

Step 3: Install the Stream Pack

The Stream pack contains:

  • Pre-configured pipeline (parserlogs) that tags events with business impact
  • Three-tier routing (critical → SIEM, security/UX → Lakehouse, all events → Lake)
  • Sample data source for testing
Install Stream Pack
  1. In Cribl Cloud, open Stream, then navigate to Packs
  2. Click Add Pack at the top right
  3. Select Import from URL
  4. Enter this URL: https://sandbox.cribl.io/assets/packs/sandboxdatatieringpack.crbl
  5. Click Import
  6. When prompted, select the default Worker Group
  7. Click Deploy
  8. Confirm that the pack appears as sandboxdatatieringpack
What's in the Stream Pack?

This pack includes:

  • Pipeline: parserlogs - Extracts fields from web logs and tags each event with business_impact based on HTTP status codes
  • Route: tier1 - Sends critical events (5xx errors) to your SIEM
  • Route: tier2 - Sends security/UX events to Lakehouse
  • Route: tier3 - Archives all events to cold Lake storage
  • Source: sbx_weblogs - Generates sample web traffic for testing

You won't need to configure these - they're ready to use. You'll just need to connect them to your destinations.
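
If it helps to see the tagging and routing idea as code, here is a minimal Python sketch. It is not the pack's actual pipeline: the function names are made up for illustration, and the exact status-to-tag mapping is an assumption (this module only says that 5xx errors are critical and that 401, 403, and 404 show up as security or user experience events).

```python
from typing import Dict, List


def tag_business_impact(status: int) -> str:
    """Tag an event by HTTP status code.

    Assumption: 401/403 -> security and 404 -> user_experience.
    The real parserlogs pipeline may split these codes differently.
    """
    if 500 <= status <= 599:
        return "critical"
    if status in (401, 403):
        return "security"
    if status == 404:
        return "user_experience"
    return "routine"


def route_event(event: Dict) -> List[str]:
    """Return the names of the routes that would receive this event."""
    impact = tag_business_impact(int(event["status"]))
    routes = []
    if impact == "critical":
        routes.append("tier1")  # SIEM
    if impact in ("security", "user_experience"):
        routes.append("tier2")  # Lakehouse
    routes.append("tier3")  # cold Lake storage archives every event
    return routes
```

For example, `route_event({"status": 503})` lands in both tier1 and tier3, while a plain 200 response only goes to tier3.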

Step 4: Install the Search Pack

The Search pack contains pre-built dashboards for forensic investigations.

Install Search Pack
  1. Switch to Cribl Search using the product switcher
  2. Navigate to Packs in the left menu
  3. Click Import Pack at the top right
  4. Select Import from URL
  5. Enter this URL: https://sandbox.cribl.io/assets/packs/datatieringpacksearch.crbl
  6. Click Import
  7. Navigate to Packs in the left menu and confirm that datatieringpacksearch is listed
What's in the Search Pack?

This pack includes:

  • Interactive Forensic Dashboard - Search across datasets for specific IPs, domains, or terms, then send matches to an investigation dataset
  • Events to Lake Dashboard - Confirmation view for events being sent to Lake
  • Pre-configured queries for common investigation patterns

You'll learn how to use these dashboards in Module 5.

Step 5: View or Configure Destinations

Now you need to tell the Stream pack where to send data for each tier.

Destination 1: Top Tier SIEM (Webhook Simulator)

View Configured Webhook Destination
  1. Switch back to Cribl Stream
  2. Navigate to Processing > Packs > sandboxdatatieringpack > Destinations
  3. Click the Webhook tile
  4. Click Add Destination
  5. View the webhook:
    • Output ID: top_tier_siem
    • URL: https://webhook.site/datatieringsandbox (see tip below)
    • Method: POST
Change Your Webhook URL

Don't have a webhook URL? No problem:

  1. Visit webhook.site in a new browser tab
  2. You'll get a unique URL automatically
  3. Copy that URL and paste it into the destination config
  4. Keep the webhook.site tab open - you'll see events arrive in real-time
  5. Or use ours: https://webhook.site/datatieringsandbox

In production, this would be your Splunk Software or Elastic endpoint, or another SIEM or APM tool.
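
If you'd like to check your webhook URL before any pack data arrives, one option is to POST a hand-made test event to it. The sketch below only builds the request using Python's standard library. The event fields are sample values, and the JSON body shape is an assumption rather than the exact payload the Webhook destination sends.

```python
import json
import urllib.request


def build_test_request(url: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a test event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Swap in your own webhook.site URL if you created one.
req = build_test_request(
    "https://webhook.site/datatieringsandbox",
    {"status": 500, "business_impact": "critical"},  # sample values only
)
```

To actually send it, call `urllib.request.urlopen(req, timeout=10)` and watch the request appear in your webhook.site tab.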

Destination 2: Tier 2 Lakehouse

View Lakehouse Destination
  1. Still in Processing > Packs > sandboxdatatieringpack > Destinations, search for Lakehouse
  2. Click Add Destination
  3. View:
    • Output ID: tier2-LH
    • Lakehouse: Select tier2LH (the one you created earlier)
  4. Click Save

Destination 3: Tier 3 Lake

Configure Lake Destination
  1. Still in Data > Destinations, search for Cribl Lake
  2. Click Add Destination
  3. View:
    • Output ID: lake-default
    • Lake Dataset: Select datatieringlogs
  4. Click Save

Step 6: Connect the Pack to Your Destinations

The Stream pack's routes need to know which destinations you just created.

Update Route Destinations
  1. Navigate to Routing > Data Routes
  2. You should see three routes from the pack:
    • tier1
    • tier2
    • tier3
  3. Click into tier1 route
  4. Update the Output field to: webhook:top_tier_siem
  5. Click Save
  6. Click into tier2 route
  7. Update the Output field to: cribl_lake:tier2-LH
  8. Click Save
  9. Click into tier3 route
  10. Update the Output field to: cribl_lake:lake-default
  11. Click Save
  12. Click Commit in the upper right
  13. Click Commit & Deploy
Why This Step?

The pack can't know what you named your destinations, so you need to connect them manually. Think of it like plugging in cables - the routes (logic) are pre-built; you just need to connect them to your endpoints.
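
The Output values in the steps above all appear to follow a `type:id` pattern, where the part after the colon is the Output ID you saw in Step 5. Here is a small Python sketch of that mapping. Reading the values as `type:id` is an inference from the steps, and `split_output` is a hypothetical helper, not part of any Cribl tooling.

```python
# Route name -> Output value, copied from the steps above.
ROUTE_OUTPUTS = {
    "tier1": "webhook:top_tier_siem",
    "tier2": "cribl_lake:tier2-LH",
    "tier3": "cribl_lake:lake-default",
}


def split_output(output: str) -> tuple:
    """Split an Output value into (destination type, Output ID)."""
    dest_type, sep, dest_id = output.partition(":")
    if not sep or not dest_type or not dest_id:
        raise ValueError(f"expected 'type:id', got {output!r}")
    return dest_type, dest_id


for route, output in ROUTE_OUTPUTS.items():
    dest_type, dest_id = split_output(output)
    print(f"{route}: type={dest_type} id={dest_id}")
```

If a route shows no data later on, comparing the ID half of each value against your destination's Output ID is a quick first check.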

Step 7: View the Data Source

The pack includes a sample data generator. Let's confirm it's producing events.

View Sample Data Flow
  1. Navigate to Packs > sandboxdatatieringpack > Sources
  2. Find the sbx_weblogs datagen source in the list
  3. Click Live Data in the Status column
  4. Verify you see Apache-style log events flowing
  5. Look for fields like status, clientip, request, bytes
  6. Click Close when done observing
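
To get a feel for where fields like status, clientip, request, and bytes come from, here is a rough Python sketch that pulls them out of an Apache-style log line with a regular expression. The sample line is invented, and the pattern assumes the common log format. The pack's own parser may work differently.

```python
import re

# Assumes Apache common log format; extra trailing fields are ignored.
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str):
    """Return a dict of extracted fields, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # numeric, so it can be compared
    return fields


# Invented sample line, for illustration only.
sample = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 404 512'
)
print(parse_line(sample))
```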

Step 8: Connect the Lakehouse to Your Dataset

Associate Lakehouse with Dataset
  1. Switch to Cribl Lake in the product switcher
  2. In the left navigation, click Datasets
  3. Click datatieringlogs in the list
  4. Edit the dataset:
    • Lakehouse: tier2LH
  5. Click Save

Step 9: Verify Everything Works

Let's make sure data is flowing through all three tiers.

Check Tier 1 (SIEM)

Verify Critical Events
  1. Open your webhook.site tab (from earlier)
  2. You should see POST requests arriving
  3. Look for events with "business_impact":"critical" and a status of 500 or higher
  4. These are your critical server errors reaching the "SIEM"

If you don't see any events yet:

  • Wait 1-2 minutes (data needs time to flow through)
  • Check that the sbx_weblogs source is enabled
  • Verify the tier1 route output is set correctly

Check Tier 2 (Lakehouse)

Verify Security/UX Events
  1. Switch to Cribl Search
  2. In the query box, enter:
    dataset="tier2_lh" | limit 10
  3. Click Search
  4. You should see events with business_impact of security or user_experience
  5. Look for status codes: 401, 403, 404
Case Sensitivity

The Lakehouse dataset name might be tier2-LH or tier2_lh depending on how Lake handles hyphens. If the first query doesn't work, try:

dataset="datatieringlogs"
| where business_impact IN ("security", "user_experience")
| limit 10

Check Tier 3 (Cold Lake)

Verify All Events
  1. Still in Cribl Search, run:
    dataset="datatieringlogs"
    | stats count by business_impact
  2. You should see four categories:
    • critical
    • security
    • user_experience
    • routine
  3. This confirms all events are landing in cold storage
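
If the stats query is new to you, it groups events by the business_impact field and counts each group. A rough Python equivalent, using a handful of made-up events, looks like this. It illustrates the grouping idea only and does not call Cribl Search.

```python
from collections import Counter


def count_by_business_impact(events):
    """Count events per business_impact value, like `stats count by business_impact`."""
    return Counter(e.get("business_impact", "unknown") for e in events)


# Made-up sample events, for illustration only.
sample_events = [
    {"status": 500, "business_impact": "critical"},
    {"status": 403, "business_impact": "security"},
    {"status": 404, "business_impact": "user_experience"},
    {"status": 200, "business_impact": "routine"},
    {"status": 200, "business_impact": "routine"},
]

counts = count_by_business_impact(sample_events)
for impact, count in sorted(counts.items()):
    print(f"{impact}: {count}")
```

Seeing all four categories in the real query result is what tells you the tier3 route is archiving everything, not just routine traffic.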

What You've Accomplished

At this point, you have:

  • ✅ Installed Stream pack with pre-configured routing logic
  • ✅ Installed Search pack with forensic investigation dashboards
  • ✅ Created Lake dataset (datatieringlogs) for storing all tiers
  • ✅ Created Lakehouse (tier2LH) for indexed Tier 2 queries
  • ✅ Connected three destinations representing your tiers:
    • Webhook (SIEM) for critical events
    • Lakehouse for security/UX investigations
    • Lake for cold archive
  • ✅ Verified data is flowing through all three tiers

Now you're ready to explore how it works and experiment with the system.


Next: Understanding Business Impact Tags →