Installing the Data Tiering Packs
TL;DR
We've built two packs for you: one with pre-configured dashboards for forensic search, and one with automated routing for data tiering. You'll install both packs, create the required Lake infrastructure, and verify everything's working.
The Foundation
Before you can experiment with the packs, you need to set up the infrastructure:
- Warehouses (Lake datasets) to store data
- Distribution centers (Lakehouse) for efficient querying
- Pre-built automation (Stream pack) for intelligent routing
- Investigation tools (Search pack) for forensic analysis
Let's get everything installed.
Step 1: Create Your Lake Dataset
First, you need storage for your tiered data. A dedicated dataset keeps your work isolated and makes it easy to manage.
- Switch to Cribl Lake in the product switcher
- In the left navigation, click Datasets
- Click Add Dataset in the upper right
- Configure the new dataset:
  - Dataset ID: datatieringlogs
  - Description: Multi-tier data for search optimization sandbox
  - Retention: 30 days (sufficient for this sandbox)
  - Format: JSON (easier to read and debug)
- Click Save
The Stream pack can't create Lake datasets for you (that's a Lake product feature), so you need to create this manually. But it's quick - just one dataset that will store all three tiers of data. Think of it like this:
- The Stream pack has the routing logic (where to send data)
- You need to create the storage location (where it lands)
Step 2: Create Your Lakehouse
A Lakehouse provides indexed, queryable access to your Lake data - perfect for Tier 2 (warm) storage.
- Still in Cribl Lake, look for Lakehouses in the left navigation
- Click Add Lakehouse in the upper right
- Configure your Lakehouse:
  - Lakehouse ID: tier2LH
  - Description: Lakehouse for Tier 2 (warm) data - security and user experience events
- Click Save
If you're new to Cribl Lakehouse, here's the quick version:
Cribl Lake = Raw, cost-effective object storage
Cribl Lakehouse = Indexed, queryable layer on top
Benefits:
- Faster queries: Indexed fields speed up searches dramatically
- Better compression: Parquet format reduces storage costs
- KQL compatibility: Query with standard KQL syntax
For this sandbox, Lakehouse represents Tier 2 (warm storage) - events that are important but not critical enough for the expensive SIEM.
Step 3: Install the Stream Pack
The Stream pack contains:
- Pre-configured pipeline (parserlogs) that tags events with business impact
- Three-tier routing (critical → SIEM, security/UX → Lakehouse, routine → Lake)
- Sample data source for testing
- In Cribl Cloud, switch to Stream, then navigate to Packs
- Click Add Pack at the top right
- Select Import from URL
- Enter this URL: https://sandbox.cribl.io/assets/packs/sandboxdatatieringpack.crbl
- Click Import
- When prompted, select the default Worker Group
- Click Deploy
- The pack name will be sandboxdatatieringpack
This pack includes:
- Pipeline: parserlogs - Extracts fields from web logs and tags each event with business_impact based on HTTP status codes
- Route: tier1 - Sends critical events (5xx errors) to your SIEM
- Route: tier2 - Sends security/UX events to Lakehouse
- Route: tier3 - Archives all events to cold Lake storage
- Source: sbx_weblogs - Generates sample web traffic for testing
You won't need to configure these - they're ready to use. You'll just need to connect them to your destinations.
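To make the tagging and routing concrete, here is a minimal Python sketch of the kind of logic the pack's pipeline and routes apply. It is an illustration only, not the pack's actual configuration. The 5xx-to-critical rule and the tier targets come from the description above; the mapping of 401/403 to security and 404 to user_experience is an assumption, as are the function names.

```python
# Illustrative sketch only; the real logic lives in the pack's parserlogs
# pipeline and its tier1/tier2/tier3 routes.


def tag_business_impact(status: int) -> str:
    """Return a business_impact tag for an HTTP status code."""
    if 500 <= status <= 599:
        return "critical"          # 5xx errors (stated in the pack description)
    if status in (401, 403):
        return "security"          # assumption: auth failures count as security
    if status == 404:
        return "user_experience"   # assumption: missing pages count as UX
    return "routine"


def routes_for(event: dict) -> list:
    """List the pack routes an event would match, in tier order."""
    impact = event["business_impact"]
    matched = []
    if impact == "critical":
        matched.append("tier1")    # SIEM
    if impact in ("security", "user_experience"):
        matched.append("tier2")    # Lakehouse
    matched.append("tier3")        # the Lake archive receives every event
    return matched


if __name__ == "__main__":
    for status in (200, 404, 403, 503):
        event = {"status": status, "business_impact": tag_business_impact(status)}
        print(status, event["business_impact"], routes_for(event))
```

Running the script prints the tag and matching routes for a few sample status codes, which is a quick way to predict where a given event should land before you check the destinations.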
Step 4: Install the Search Pack
The Search pack contains pre-built dashboards for forensic investigations.
- Switch to Cribl Search using the product switcher
- Navigate to Packs in the left menu
- Click Import Pack at the top right
- Select Import from URL
- Enter this URL: https://sandbox.cribl.io/assets/packs/datatieringpacksearch.crbl
- Click Import
- Navigate to Packs in the left menu; your pack, datatieringpacksearch, will appear in the list
This pack includes:
- Interactive Forensic Dashboard - Search across datasets for specific IPs, domains, or terms, then send matches to an investigation dataset
- Events to Lake Dashboard - Confirmation view for events being sent to Lake
- Pre-configured queries for common investigation patterns
You'll learn how to use these dashboards in Module 5.
Step 5: View or Configure Destinations
Now you need to tell the Stream pack where to send data for each tier.
Destination 1: Top Tier SIEM (Webhook Simulator)
- Switch back to Cribl Stream
- Navigate to Processing > Packs > sandboxdatatieringpack > Destinations
- Click the Webhook tile
- Click Add Destination
- View the webhook settings:
  - Output ID: top_tier_siem
  - URL: https://webhook.site/datatieringsandbox (see tip below)
  - Method: POST
Don't have a webhook URL? No problem:
- Visit webhook.site in a new browser tab
- You'll get a unique URL automatically
- Copy that URL and paste it into the destination config
- Keep the webhook.site tab open - you'll see events arrive in real-time
- Or you can use ours: https://webhook.site/datatieringsandbox
In production, this destination would be your Splunk or Elastic endpoint, or another SIEM or APM tool.
Destination 2: Tier 2 Lakehouse
- Still in Processing > Packs > sandboxdatatieringpack > Destinations, search for Lakehouse
- Click Add Destination
- View the destination settings:
  - Output ID: tier2-LH
  - Lakehouse: Select tier2LH (the one you created earlier)
- Click Save
Destination 3: Tier 3 Lake
- Still in Processing > Packs > sandboxdatatieringpack > Destinations, search for Cribl Lake
- Click Add Destination
- View the destination settings:
  - Output ID: lake-default
  - Lake Dataset: Select datatieringlogs
- Click Save
Step 6: Connect the Pack to Your Destinations
The Stream pack's routes need to know which destinations you just created.
- Navigate to Routing > Data Routes
- You should see three routes from the pack: tier1, tier2, tier3
- Click into the tier1 route
- Update the Output field to: webhook:top_tier_siem
- Click Save
- Click into the tier2 route
- Update the Output field to: cribl_lake:tier2-LH
- Click Save
- Click into the tier3 route
- Update the Output field to: cribl_lake:lake-default
- Click Save
- Click Commit in the upper right
- Click Commit & Deploy
The pack can't know what you named your destinations, so you need to connect them manually. Think of it like plugging in cables - the routes (logic) are pre-built, you just need to connect them to your endpoints.
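Because each route's Output value has to match a destination exactly, a typo here is the most likely reason a tier stays empty. The following Python sketch shows one way to sanity-check the mapping on paper. It uses the Output IDs and route outputs from this guide; the destination type strings are inferred from the route outputs, and the helper name find_broken_routes is hypothetical. It does not talk to Cribl at all.

```python
# Destinations from Step 5, keyed by Output ID. The type strings are
# inferred from the "type:output_id" values used in the routes.
DESTINATIONS = {
    "top_tier_siem": "webhook",
    "tier2-LH": "cribl_lake",
    "lake-default": "cribl_lake",
}

# Route name -> Output value entered in Step 6.
ROUTE_OUTPUTS = {
    "tier1": "webhook:top_tier_siem",
    "tier2": "cribl_lake:tier2-LH",
    "tier3": "cribl_lake:lake-default",
}


def find_broken_routes(route_outputs, destinations):
    """Return the routes whose Output does not match a known destination."""
    broken = []
    for route, output in route_outputs.items():
        dest_type, _, output_id = output.partition(":")
        if destinations.get(output_id) != dest_type:
            broken.append(route)
    return broken


if __name__ == "__main__":
    print(find_broken_routes(ROUTE_OUTPUTS, DESTINATIONS))
```

An empty list means every route points at a destination you created. A misspelled Output ID, such as tier2_LH in place of tier2-LH, would cause that route to be reported.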
Step 7: View the Data Source
The pack includes a sample data generator. Let's confirm it's producing events.
- Navigate to Packs > sandboxdatatieringpack > Sources
- Find datagen in the list
- Click Live Data in the Status column
- Verify you see Apache-style log events flowing
- Look for fields like status, clientip, request, bytes
- Click Close when done observing
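If you want a feel for where those fields come from, here is a rough Python sketch that pulls clientip, request, status, and bytes out of one Apache-style log line. The sample line and the regular expression are assumptions for illustration (the pack's own parser may work differently), and the timestamp field is an extra that this guide does not mention.

```python
import re

# Pattern for a common Apache access-log layout (an assumption; the
# sample generator's exact format is not documented in this guide).
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no body bytes were sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


# Hypothetical sample line using a documentation-range IP address.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```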
Step 8: Connect the Lakehouse to Your Dataset
- Switch to Cribl Lake in the product switcher
- In the left navigation, click Datasets
- Click datatieringlogs in the list
- Configure the dataset:
  - Lakehouse: tier2LH
- Click Save
Step 9: Verify Everything Works
Let's make sure data is flowing through all three tiers.
Check Tier 1 (SIEM)
- Open your webhook.site tab (from earlier)
- You should see POST requests arriving
- Look for events with "business_impact":"critical" and "status":500+
- These are your critical server errors reaching the "SIEM"
If you don't see any events yet:
- Wait 1-2 minutes (data needs time to flow through)
- Check that the sbx_weblogs source is enabled
- Verify the tier1 route output is set correctly
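If events still don't show up, it can help to rule out the webhook URL itself. The sketch below is one way to do that with the Python standard library: it posts a single hand-built JSON event to a URL you supply. The event fields mirror the ones named in this guide, but the sample values and function names are hypothetical, and the payload shape Stream actually sends may differ.

```python
import json
import urllib.request


def build_test_event():
    """Build one sample critical event (values are made up for testing)."""
    return {
        "status": 503,
        "business_impact": "critical",
        "request": "GET /checkout HTTP/1.1",
        "clientip": "203.0.113.7",  # documentation-range IP address
    }


def post_event(url, event, timeout=10):
    """POST one JSON event to the given URL and return the HTTP status code."""
    body = json.dumps(event).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Call post_event with your own webhook.site URL and build_test_event(). If the request appears in your webhook.site tab, the URL is reachable and the problem is more likely in the source or route settings.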
Check Tier 2 (Lakehouse)
- Switch to Cribl Search
- In the query box, enter:
dataset="tier2_lh" | limit 10
- Click Search
- You should see events with business_impact of security or user_experience
- Look for status codes: 401, 403, 404
The Lakehouse dataset name might be tier2-LH or tier2_lh depending on how Lake handles hyphens. If the first query doesn't work, try:
dataset="datatieringlogs"
| where business_impact IN ("security", "user_experience")
| limit 10
Check Tier 3 (Cold Lake)
- Still in Cribl Search, run:
dataset="datatieringlogs"
| stats count by business_impact
- You should see four categories: critical, security, user_experience, routine
- This confirms all events are landing in cold storage
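If the grouping in that query is unfamiliar, this small Python sketch mirrors what a count-by-field aggregation and a tier filter compute over a handful of events. The sample events are invented for illustration, and the code only imitates the semantics of the queries; it says nothing about how Cribl Search executes them.

```python
from collections import Counter

# Hypothetical events, shaped like the tagged web logs in this sandbox.
EVENTS = [
    {"status": 200, "business_impact": "routine"},
    {"status": 200, "business_impact": "routine"},
    {"status": 404, "business_impact": "user_experience"},
    {"status": 403, "business_impact": "security"},
    {"status": 503, "business_impact": "critical"},
]


def count_by(events, field):
    """Count events per distinct value of a field (like a stats count by)."""
    return Counter(event[field] for event in events)


def where_impact_in(events, impacts):
    """Keep only events whose business_impact is in the given set."""
    return [event for event in events if event["business_impact"] in impacts]


if __name__ == "__main__":
    print(count_by(EVENTS, "business_impact"))
    print(where_impact_in(EVENTS, {"security", "user_experience"}))
```

With real data, a missing category in the counts is the signal to investigate: for example, no critical rows would suggest the sample source has not produced a 5xx response yet.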
What You've Accomplished
At this point, you have:
- ✅ Installed Stream pack with pre-configured routing logic
- ✅ Installed Search pack with forensic investigation dashboards
- ✅ Created Lake dataset (datatieringlogs) for storing all tiers
- ✅ Created Lakehouse (tier2LH) for indexed Tier 2 queries
- ✅ Connected three destinations representing your tiers:
  - Webhook (SIEM) for critical events
  - Lakehouse for security/UX investigations
  - Lake for cold archive
- ✅ Verified data is flowing through all three tiers
Now you're ready to explore how it works and experiment with the system.
Next: Understanding Business Impact Tags →