Installing the Data Tiering Packs

TL;DR

We've built two packs for you: one with pre-configured dashboards for forensic search, and one with automated routing for data tiering. You'll install both packs, create the required Lake infrastructure, and verify everything's working.

The Foundation

Before you can experiment with the packs, you need to set up the infrastructure:

Warehouses (Lake datasets) to store data
Distribution centers (Lakehouse) for efficient querying
Pre-built automation (Stream pack) for intelligent routing
Investigation tools (Search pack) for forensic analysis

Let's get everything installed.

Step 1: Create Your Lake Dataset

First, you need storage for your tiered data. A dedicated dataset keeps your work isolated and makes it easy to manage.

Create Dataset

Switch to Cribl Lake in the product switcher
In the left navigation, click Datasets
Click Add Dataset in the upper right
Configure the new dataset:
- Dataset ID: datatieringlogs
- Description: Multi-tier data for search optimization sandbox
- Retention: 30 days (sufficient for this sandbox)
- Format: JSON (easier to read and debug)
Click Save

Why Create This Manually?

The Stream pack can't create Lake datasets for you (that's a Lake product feature), so you need to create this manually. But it's quick - just one dataset that will store all three tiers of data. Think of it like this:

The Stream pack has the routing logic (where to send data)
You need to create the storage location (where it lands)

Step 2: Create Your Lakehouse

A Lakehouse provides indexed, queryable access to your Lake data - perfect for Tier 2 (warm) storage.

Create Lakehouse

Still in Cribl Lake, look for Lakehouses in the left navigation
Click Add Lakehouse in the upper right
Configure your Lakehouse:
- Lakehouse ID: tier2LH
- Description: Lakehouse for Tier 2 (warm) data - security and user experience events
Click Save

What's a Lakehouse?

If you're new to Cribl Lakehouse, here's the quick version:

Cribl Lake = Raw, cost-effective object storage
Cribl Lakehouse = Indexed, queryable layer on top

Benefits:

Faster queries: Indexed fields speed up searches dramatically
Better compression: Parquet format reduces storage costs
KQL compatibility: Query with standard KQL syntax

For this sandbox, Lakehouse represents Tier 2 (warm storage) - events that are important but not critical enough for the expensive SIEM.

Step 3: Install the Stream Pack

The Stream pack contains:

Pre-configured pipeline (parserlogs) that tags events with business impact
Three-tier routing (critical → SIEM, security/UX → Lakehouse, routine → Lake)
Sample data source for testing

Install Stream Pack

In Cribl Cloud, open Stream, then navigate to Packs
Click Add Pack at the top right
Select Import from URL
Enter this URL: https://sandbox.cribl.io/assets/packs/sandboxdatatieringpack.crbl
Click Import
When prompted, select the default Worker Group
Click Deploy
The pack name will be sandboxdatatieringpack

What's in the Stream Pack?

This pack includes:

Pipeline: parserlogs - Extracts fields from web logs and tags each event with business_impact based on HTTP status codes
Route: tier1 - Sends critical events (5xx errors) to your SIEM
Route: tier2 - Sends security/UX events to Lakehouse
Route: tier3 - Archives all events to cold Lake storage
Source: sbx_weblogs - Generates sample web traffic for testing

You won't need to configure these - they're ready to use. You'll just need to connect them to your destinations.
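
The tagging rule that parserlogs applies can be pictured roughly as follows. This is a minimal Python sketch, not the pack's actual pipeline code: the 5xx-to-critical mapping and the four tag names come from this guide, while assigning 401/403 to security and 404 to user_experience is an assumption made for illustration.

```python
def business_impact(status: int) -> str:
    """Return an illustrative business_impact tag for an HTTP status code."""
    if 500 <= status <= 599:
        return "critical"         # 5xx server errors (stated in this guide)
    if status in (401, 403):
        return "security"         # assumption: authentication/authorization failures
    if status == 404:
        return "user_experience"  # assumption: missing pages hurt the user journey
    return "routine"              # everything else


# Print the tag chosen for a few representative status codes.
for code in (200, 401, 404, 503):
    print(code, business_impact(code))
```

If the pack's real mapping differs, only the two branches marked as assumptions would need to change.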

Step 4: Install the Search Pack

The Search pack contains pre-built dashboards for forensic investigations.

Install Search Pack

Switch to Cribl Search using the product switcher
Navigate to Packs in the left menu
Click Import Pack at the top right
Select Import from URL
Enter this URL: https://sandbox.cribl.io/assets/packs/datatieringpacksearch.crbl
Click Import
Navigate to Packs in the left menu and confirm that datatieringpacksearch appears in the list

What's in the Search Pack?

This pack includes:

Interactive Forensic Dashboard - Search across datasets for specific IPs, domains, or terms, then send matches to an investigation dataset
Events to Lake Dashboard - Confirmation view for events being sent to Lake
Pre-configured queries for common investigation patterns

You'll learn how to use these dashboards in Module 5.

Step 5: View or Configure Destinations

Now you need to tell the Stream pack where to send data for each tier.

Destination 1: Top Tier SIEM (Webhook Simulator)

View Configured Webhook Destination

Switch back to Cribl Stream
Navigate to Processing > Packs > sandboxdatatieringpack > Destinations
Click Webhook tile
Click Add Destination
View the webhook:
- Output ID: top_tier_siem
- URL: https://webhook.site/datatieringsandbox (see tip below)
- Method: POST

Change Your Webhook URL

Don't have a webhook URL? No problem:

Visit webhook.site in a new browser tab
You'll get a unique URL automatically
Copy that URL and paste it into the destination config
Keep the webhook.site tab open - you'll see events arrive in real-time
Alternatively, you can use ours: https://webhook.site/datatieringsandbox
In production, this would be your Splunk Software or Elastic endpoint, or another SIEM or APM platform.
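
To confirm that your webhook URL accepts POST requests before any routed data arrives, you could send it a hand-built test event. The sketch below uses only the Python standard library. The field values in `sample_event` are hypothetical, and the JSON shape is an assumption, not the exact payload the destination emits.

```python
import json
import urllib.request


def build_test_request(url: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one test event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event that resembles a critical web log entry.
sample_event = {"status": 503, "business_impact": "critical", "request": "GET /checkout"}
req = build_test_request("https://webhook.site/datatieringsandbox", sample_event)
print(req.get_method(), req.full_url)

# To actually send it, uncomment the next line:
# urllib.request.urlopen(req, timeout=10)
```

Replace the URL with your own webhook.site address if you generated one.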

Destination 2: Tier 2 Lakehouse

View Lakehouse Destination

Still in Processing > Packs > sandboxdatatieringpack > Destinations, search for Lakehouse
Click Add Destination
View:
- Output ID: tier2-LH
- Lakehouse: Select tier2LH (the Lakehouse you created in Step 2)
Click Save

Destination 3: Tier 3 Lake

Configure Lake Destination

Still in Processing > Packs > sandboxdatatieringpack > Destinations, search for Cribl Lake
Click Add Destination
View:
- Output ID: lake-default
- Lake Dataset: Select datatieringlogs
Click Save

Step 6: Connect the Pack to Your Destinations

The Stream pack's routes need to know which destinations you just created.

Update Route Destinations

Navigate to Routing > Data Routes
You should see three routes from the pack:
- tier1
- tier2
- tier3
Click into tier1 route
Update the Output field to: webhook:top_tier_siem
Click Save
Click into tier2 route
Update the Output field to: cribl_lake:tier2-LH
Click Save
Click into tier3 route
Update the Output field to: cribl_lake:lake-default
Click Save
Click Commit in the upper right
Click Commit & Deploy

Why This Step?

The pack can't know what you named your destinations, so you need to connect them manually. Think of it like plugging in cables - the routes (logic) are pre-built, you just need to connect them to your endpoints.
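
One way to picture how the three routes fan events out to these outputs is the sketch below. The route names and output strings are taken from the steps above. The filter conditions and the `final` flag are assumptions for illustration: the sketch assumes tier1 and tier2 pass events along after matching, so that tier3 still archives everything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Route:
    name: str
    matches: Callable[[Event], bool]  # assumed filter, not the pack's real expression
    output: str                       # "type:id" value entered in Step 6
    final: bool = False               # assumption: only the last route consumes events


ROUTES = [
    Route("tier1", lambda e: e.get("business_impact") == "critical",
          "webhook:top_tier_siem"),
    Route("tier2", lambda e: e.get("business_impact") in ("security", "user_experience"),
          "cribl_lake:tier2-LH"),
    Route("tier3", lambda e: True, "cribl_lake:lake-default", final=True),
]


def outputs_for(event: Event, routes: List[Route] = ROUTES) -> List[str]:
    """Return every output an event reaches, walking the routes in order."""
    hits: List[str] = []
    for route in routes:
        if route.matches(event):
            hits.append(route.output)
            if route.final:
                break  # a final route stops further evaluation
    return hits


# Show where an event of each tag would land.
for tag in ("critical", "security", "routine"):
    print(tag, "->", outputs_for({"business_impact": tag}))
```

Under these assumptions a critical event reaches both the webhook and the Lake, which matches the verification step where all four categories appear in cold storage.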

Step 7: View the Data Source

The pack includes a sample data generator. Let's turn it on.

View Sample Data Flow

Navigate to Packs > sandboxdatatieringpack > Sources
Find the sbx_weblogs datagen source in the list
Click Live Data in the Status column
Verify you see Apache-style log events flowing
Look for fields like status, clientip, request, bytes
Click Close when done observing

Step 8: Connect the Lakehouse to the Dataset

Associate Lakehouse with Dataset

Switch to Cribl Lake in the product switcher
In the left navigation, click Datasets
Click datatieringlogs in the list
Configure the dataset:
- Lakehouse: tier2LH
Click Save

Step 9: Verify Everything Works

Let's make sure data is flowing through all three tiers.

Check Tier 1 (SIEM)

Verify Critical Events

Open your webhook.site tab (from earlier)
You should see POST requests arriving
Look for events with "business_impact":"critical" and a "status" of 500 or higher
These are your critical server errors reaching the "SIEM"

If you don't see any events yet:

Wait 1-2 minutes (data needs time to flow through)
Check that the sbx_weblogs source is enabled
Verify the tier1 route output is set correctly

Check Tier 2 (Lakehouse)

Verify Security/UX Events

Switch to Cribl Search
In the query box, enter:
```
dataset="tier2_lh" | limit 10
```
Click Search
You should see events with business_impact of security or user_experience
Look for status codes: 401, 403, 404

Dataset Naming

The Lakehouse dataset name might be tier2-LH or tier2_lh depending on how Lake handles hyphens. If the first query doesn't work, try:

```
dataset="datatieringlogs"
| where business_impact IN ("security", "user_experience")
| limit 10
```

Check Tier 3 (Cold Lake)

Verify All Events

Still in Cribl Search, run:

```
dataset="datatieringlogs"
| stats count by business_impact
```

You should see four categories:
- critical
- security
- user_experience
- routine
This confirms all events are landing in cold storage
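
If you export the search results and want to check them outside Cribl Search, the same count-by-tag check could be sketched in Python. The `sample` rows are made up for illustration, and the export format (a list of dictionaries with a business_impact key) is an assumption.

```python
from collections import Counter

# The four tags this guide expects to see in cold storage.
EXPECTED = {"critical", "security", "user_experience", "routine"}


def summarize(events):
    """Count events per business_impact tag and report expected tags that are absent."""
    counts = Counter(e.get("business_impact") for e in events)
    missing = EXPECTED - set(counts)
    return counts, missing


# Hypothetical exported rows; a real export would contain many more fields.
sample = [
    {"status": 503, "business_impact": "critical"},
    {"status": 401, "business_impact": "security"},
    {"status": 404, "business_impact": "user_experience"},
    {"status": 404, "business_impact": "user_experience"},
]

counts, missing = summarize(sample)
print(dict(counts))
print("missing:", sorted(missing))
```

A non-empty `missing` set suggests that either not enough data has flowed yet or one of the routes is not connected.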

What You've Accomplished

At this point, you have:

✅ Installed Stream pack with pre-configured routing logic
✅ Installed Search pack with forensic investigation dashboards
✅ Created Lake dataset (datatieringlogs) for storing all tiers
✅ Created Lakehouse (tier2LH) for indexed Tier 2 queries
✅ Connected three destinations representing your tiers:
- Webhook (SIEM) for critical events
- Lakehouse for security/UX investigations
- Lake for cold archive
✅ Verified data is flowing through all three tiers

Now you're ready to explore how it works and experiment with the system.

Next: Understanding Business Impact Tags →
