Welcome to Data Tiering & Search Optimization
TL;DR
Learn how to build a cost-effective data architecture that enables forensic investigations without exploding your budget. You'll create a multi-tier storage strategy and master search techniques that help you find needles in haystacks—then send only the relevant data downstream.
What You'll Learn
By the end of this sandbox, you'll be able to:
- Search across massive datasets to find specific events quickly
- Tier data automatically based on business value and criticality
- Export only relevant events to expensive downstream systems
- Optimize search queries for speed and cost efficiency
- Tag events with metadata that drives intelligent routing decisions
The Challenge
Your security team has just told you they must investigate a potential breach that happened four months ago. To do that, they need to:
- Search through 500TB of historical data in Cribl Lake
- Find evidence of suspicious activity across multiple log sources
- Export relevant events to your SIEM for deep analysis
- Do all this without blowing your quarterly budget
Traditional approach: Export all 500TB to your SIEM. Cost: $$$K. Time: 3-5 days.
The Cribl way: Search Lake directly, find the needle, send only what matters. Cost: Less than $100. Time: Minutes.
The Solution: Smart Tiering
Not all data is created equal. Some events are critical and need immediate attention in your top-tier SIEM. Others are important for context but can live in cheaper storage. The rest? Archive for compliance.
Think of it like coffee beans at Cribl Coffee Co.:
- Tier 1 (Hot): Premium single-origin beans → expensive SIEM
- Tier 2 (Warm): Quality house blend → cost-effective Lakehouse
- Tier 3 (Cold): Bulk commodity → cheap long-term Lake storage
We'll teach you how to automatically route events to the right tier based on their business impact.
What You'll Work With
This isn't a theoretical exercise. You'll work with two real, pre-built systems:
1. Interactive Forensic Dashboard (Tips & Tricks)
We've built dashboards in a Search pack that let you:
- Search across all datasets for specific terms or IPs
- Filter results interactively
- Send only matching events to your data lake
These dashboards are built for incident response and investigations. You'll learn how to use them on real cases and customize them for your needs.
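To picture the search-then-send workflow before opening the dashboards, here is a minimal Python sketch. It is not the pack's implementation and it is not Cribl Search query syntax: the dataset names, the `src_ip` and `_raw` field names, and the `search_datasets` helper are all hypothetical, and in-memory lists stand in for Lake datasets.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]


def search_datasets(datasets: Dict[str, Iterable[Event]], term: str) -> Iterator[Event]:
    """Yield events whose src_ip equals the term or whose raw text contains it."""
    for name, events in datasets.items():
        for event in events:
            if event.get("src_ip") == term or term in event.get("_raw", ""):
                # Record which dataset the match came from.
                yield {**event, "dataset": name}


# Hypothetical sample data standing in for two Lake datasets.
datasets = {
    "web_logs": [
        {"src_ip": "10.0.0.5", "_raw": "GET /login 401"},
        {"src_ip": "10.0.0.9", "_raw": "GET /menu 200"},
    ],
    "firewall": [
        {"src_ip": "10.0.0.7", "_raw": "DENY dst=10.0.0.5 port=22"},
    ],
}

# Only the matching events would be forwarded downstream.
matches = list(search_datasets(datasets, "10.0.0.5"))
```

Here two of the three events match, one from each dataset, so only those two would leave the lake. The point of the design is that filtering happens where the data already lives, and the export step sees only the result set.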
2. Automated Data Tiering (The Architecture)
We've built routes in a Stream pack that automatically:
- Analyze incoming web logs to determine business impact
- Route critical events (5xx errors, auth failures) to Tier 1 SIEM
- Send security/UX events to Tier 2 Lakehouse
- Archive everything else to Tier 3 for compliance
- Reduce downstream costs by 70-90%
You'll learn how the tiering logic works, experiment with the routes, and adapt them to your data.
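The tiering rules listed above can be sketched as a small decision function. This is an illustrative Python model under stated assumptions, not the pack's actual routes (Cribl Stream route filters are written as JavaScript expressions). The field names `status`, `auth_result`, and `category`, and the tag values `security` and `ux`, are hypothetical.

```python
from typing import Dict, List, Union

Event = Dict[str, Union[str, int]]

# Hypothetical category tags that qualify an event for Tier 2.
SECURITY_OR_UX_TAGS = {"security", "ux"}


def assign_tier(event: Event) -> int:
    """Return 1, 2, or 3 following the rules described in this section."""
    status = int(event.get("status", 0))
    # Tier 1: critical events (5xx errors, auth failures) go to the SIEM.
    if 500 <= status <= 599 or event.get("auth_result") == "failure":
        return 1
    # Tier 2: security/UX events go to the Lakehouse.
    if event.get("category") in SECURITY_OR_UX_TAGS:
        return 2
    # Tier 3: everything else is archived for compliance.
    return 3


# Hypothetical sample web-log events.
events: List[Event] = [
    {"status": 503, "path": "/checkout"},
    {"status": 401, "auth_result": "failure"},
    {"status": 200, "category": "security"},
    {"status": 404, "category": "ux"},
    {"status": 200, "path": "/menu"},
]

tiers = [assign_tier(e) for e in events]
```

The rules are checked in priority order, so an event that is both a 5xx error and tagged `security` still lands in Tier 1. The share of events that falls through to Tiers 2 and 3 is what determines how much downstream cost you avoid, and that share depends on your own data mix.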
Prerequisites
You should have basic familiarity with:
- Cribl Stream concepts (Sources, Destinations, Routes, Pipelines)
- Cribl Lake and Cribl Search fundamentals
- Log analysis and filtering expressions
If you're brand new to Cribl Lake, we recommend checking out the Cribl Lake Overview Sandbox first.
Conventions Used in This Course
Throughout this course, we'll use the following formatting:
These blocks contain step-by-step instructions. Follow them carefully!
Optional context and additional information that helps you understand the "why" behind what you're doing.
Critical information that will save you from headaches later.
Advanced techniques and shortcuts for power users.
Course Philosophy
Search First, Send Second: Don't export everything and search later. Search first, export only what matters.
Tag Everything: Metadata drives intelligent routing. The more you tag, the smarter your system becomes.
Cost-Conscious Architecture: Every event has a cost. Route based on value, not volume.
Human-in-the-Loop: Automation handles the bulk work, but analysts make the critical decisions.
Time to Get Started
Ready to build a data architecture that's both powerful and cost-effective?
Let's dive in.