
Configuring a Data Collector

Let's configure our collector.

important
  1. Click Cribl on the top row of tabs. With Manage active, select Data in the Stream top nav and click Sources, then click the S3 tile under the Collectors section.

  2. On the Manage S3 Collectors page, click Add Collector

  3. In the Collector ID field, enter pan-logs

  4. In the S3 bucket field, enter logarchives

  5. In the Region field, select US East (N. Virginia)

  6. In the Path field, enter:

    /${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${sourcetype}/${src_zone}/${src_ip}/${dest_zone}/${dest_ip}/


    This path both mirrors the partitioning scheme from the Exploring Data At Rest portion of the course and extracts the elements of that scheme into fields we can use for filtering (a sketch of how an object key maps to these fields appears after these steps):

    • sourcetype – the sourcetype of the data.
    • src_zone – the zone from which the source IP enters the firewall.
    • src_ip – the IP address of the source system.
    • dest_zone – the zone through which the traffic egresses to reach the destination IP.
    • dest_ip – the IP address of the destination system.
    • _time is also extracted via this scheme, allowing us to filter by time.
  7. Expand the Authentication section, switch the method from Auto to Manual, then enter these credentials in the Access key and Secret key fields:

    Access key: ACCESSKEY
    Secret key: SECRETKEY

At this point, your screen should look something like this: [Collector Config Screen - Page 1]
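
To make the field extraction concrete, here's a minimal sketch (not Cribl code) of how an object key written under this partitioning scheme maps back to the fields the Collector extracts. The example key and its sourcetype value are hypothetical; only the path tokens come from the configuration above.

```python
# Minimal sketch (not Cribl code): how a key laid out under the path template
# /${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}/${sourcetype}/${src_zone}/${src_ip}/${dest_zone}/${dest_ip}/
# maps back to the fields the Collector extracts. The example key is hypothetical.
from datetime import datetime, timezone

def fields_from_key(key: str) -> dict:
    """Split an archived object's key into the fields encoded by the path scheme."""
    parts = key.strip("/").split("/")
    year, month, day, hour, sourcetype, src_zone, src_ip, dest_zone, dest_ip = parts[:9]
    return {
        "_time": datetime(int(year), int(month), int(day), int(hour),
                          tzinfo=timezone.utc).timestamp(),
        "sourcetype": sourcetype,
        "src_zone": src_zone,
        "src_ip": src_ip,
        "dest_zone": dest_zone,
        "dest_ip": dest_ip,
    }

# Hypothetical object key under the logarchives bucket:
example = "2023/06/14/17/pan_traffic/untrust/203.0.113.24/trust/10.0.0.15/part-0.json.gz"
print(fields_from_key(example))
```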

A note about the S3 setup: since we're using a local MinIO instance, we authenticate with its default credentials (ACCESSKEY/SECRETKEY). We'll also need to specify an API endpoint, which you don't need to do with native S3.
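
If you want to sanity-check the bucket, endpoint, and credentials outside of Cribl before saving the Collector, a quick boto3 listing against the MinIO endpoint is one way to do it. This is an optional check, not part of the course steps, and the prefix shown is only an example.

```python
# Optional sanity check (not part of the course steps): list a few objects from
# the MinIO-backed archive using the same endpoint and credentials the Collector will use.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio:9000",   # same endpoint we give the Collector;
                                        # the "minio" hostname resolves inside the lab environment
    aws_access_key_id="ACCESSKEY",
    aws_secret_access_key="SECRETKEY",
    region_name="us-east-1",            # US East (N. Virginia)
)

# Example prefix only -- any prefix matching the partitioning scheme will do.
resp = s3.list_objects_v2(Bucket="logarchives", Prefix="2023/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```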

important
  1. Under Collector Settings, expand the Optional Settings section.
  2. In the Endpoint field, enter http://minio:9000/
  3. Next, click the Event Breakers tab on the modal's left edge.
  4. Click Add ruleset, and a new field will appear. In that Event Breaker rulesets field, pull down and select the Cribl ruleset.
  5. Select the Result Routing left tab, and ensure that Send to Routes is set to No.
  6. For now, we're going to use the passthru Pipeline, so in the Pipeline field, select passthru, and in the Destination field, select the elastic:elastic entry.
  7. Leave everything else with default values, and click Save.

A couple of things to mention here:

  • As data comes in from the collector, it's bundled into a collector event: the original record is placed in the _raw field of a JSON object. But we want all the fields of the original event, so we need to do one of two things: use an Event Breaker that pulls the events apart properly, or use a Parser function in a Pipeline to extract the data. In this case, we're using the Event Breaker: the Cribl event-breaking rules handle the event extraction for us (see the sketch at the end of this section).

  • In our use case, we're doing an investigation on a single type of data. We don't need the full power of the routing system, so instead of sending the collected data through the Data Routes system, we're setting a specific Pipeline and Destination for all events from the collector. If you're replaying all of your data from a specific time range, you might want to use Data Routes so the data is processed just as it would have been had it streamed in live. In our case, we don't need to.

  • It's also worth mentioning the Auto-populate from field back in Collector Settings. In your own environment, if you have an S3 destination configured for your archive, you can click this and select that destination, and the collector's configuration fields will be filled in from it. This makes it easier to configure collection from your archive destination.
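
To illustrate the first point, here's a rough sketch of the difference between a collector event that still wraps the original record in _raw and the broken-out event we actually want. The field names inside the sample firewall record are hypothetical; only the _raw wrapping behavior comes from the description above.

```python
# Rough illustration (the sample record's field names are hypothetical):
# without event breaking, the collected object arrives wrapped, with the original
# record as a string in _raw; the Event Breaker (or a Parser function) recovers
# the original fields as top-level event fields.
import json

# What a collected event looks like before event breaking: the original record
# is just a string sitting in _raw.
wrapped_event = {
    "_raw": '{"src_ip": "203.0.113.24", "dest_ip": "10.0.0.15", "action": "allow"}',
    "_time": 1686762000,
}

# What event breaking gets us: the original record's fields, usable directly
# for filtering and routing.
broken_event = {**json.loads(wrapped_event["_raw"]), "_time": wrapped_event["_time"]}
print(broken_event)
# {'src_ip': '203.0.113.24', 'dest_ip': '10.0.0.15', 'action': 'allow', '_time': 1686762000}
```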