Let's Talk About Syslog
Syslog has been a steadfast log management protocol since the dawn of the Internet, enabling seamless data transmission across operating systems, processes, and devices. As technology evolves, however, so do the challenges associated with handling syslog data effectively. The Cribl product suite marks a new era in syslog transformation, offering unparalleled capabilities for managing syslog data inbound and outbound.
A Few Problems to Solve
Risk of Data Loss
The original Syslog protocol was based on UDP, not TCP. In those days, when servers' memory was measured in kilobytes rather than gigabytes, UDP made sense. UDP allowed syslog senders to keep working just fine, even if the destination was down. Even today, some network devices support sending syslog only on UDP. But UDP delivery is potentially lossy – your data will probably get there, but it might not. Sending over the Internet makes it more likely that data will be lost.
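To illustrate the fire-and-forget behavior described above, here is a minimal Python sketch. The message text, the hostname, and the trick of aiming at a deliberately closed local port are illustrative assumptions. The point is only that a UDP send returns normally even though nothing is listening.

```python
import socket

def find_closed_udp_port() -> int:
    """Bind to an ephemeral port, note its number, then release it."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()
    return port

def send_syslog_udp(message: bytes, host: str, port: int) -> int:
    """Send one datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message, (host, port))

# <134> = facility local0 (16) * 8 + severity info (6); the rest is made up.
message = b"<134>May 12 21:20:39 myhost app: hello"
sent = send_syslog_udp(message, "127.0.0.1", find_closed_udp_port())
print(f"sent {sent} bytes; no error, and no confirmation of delivery")
```

A TCP sender in the same situation would fail at connect time, which is exactly the feedback UDP lacks.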
Lack of Encryption Support
Syslog over UDP doesn't support encryption on the wire. Syslog over TCP can be sent using TLS, but among syslog-sending devices, this is a pretty rare feature. Unencrypted messaging on a local area network might be okay in some situations, but it's not ideal.
Timezone Issues
While modern syslog senders support RFC 5424 to accurately represent timezone, year, and day, most devices sending syslog have a timestamp that looks like this: May 12 21:20:39
What time zone is that? What year? Does the sending system use UTC? Eastern daylight time? There's no way to tell.
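A short Python sketch makes the ambiguity concrete. The year and the UTC offsets below are guesses that the receiver has to supply, not information carried in the timestamp, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def parse_legacy_timestamp(text: str, assumed_year: int, assumed_utc_offset_hours: int) -> datetime:
    """Parse 'Mmm dd hh:mm:ss', filling in the year and zone the sender left out."""
    naive = datetime.strptime(f"{assumed_year} {text}", "%Y %b %d %H:%M:%S")
    zone = timezone(timedelta(hours=assumed_utc_offset_hours))
    return naive.replace(tzinfo=zone)

stamp = "May 12 21:20:39"
as_utc = parse_legacy_timestamp(stamp, 2024, 0)   # assume the sender uses UTC
as_edt = parse_legacy_timestamp(stamp, 2024, -4)  # assume Eastern daylight time

# The same text maps to two different instants, four hours apart.
gap_seconds = (as_edt - as_utc).total_seconds()
print(as_utc.isoformat(), as_edt.isoformat(), gap_seconds)
```

Neither reading can be ruled out from the message alone, which is why the receiver needs per-sender knowledge to set the event time correctly.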
Lack of Context
For systems of analysis such as Splunk and Elasticsearch, it's often necessary to set a sourcetype, index, or other field that tells the system how to process and store syslog data.
Because this metadata is not included in the original payload, it has to be added after the fact. Typically, this involves configuring rsyslog or Syslog-NG configuration files in conjunction with Splunk Forwarders or Elastic Beats agents. This is a tricky part of managing syslog servers: sender IPs and hostnames are placed in the syslog server's configuration files, and the forwarding agent's index and sourcetype settings have to match these IP addresses and hostnames.
Redundant Data
A human-readable timestamp and an associated hostname are not needed when syslog data is sent to a destination like Splunk or Elasticsearch. Those fields are already delivered in the extracted timestamp (epoch time) and host fields. Having this data in the message field adds 25-30 bytes per event. While this might not sound like much, it can account for more than 20 percent of total data volume in some cases.
Syslog data sent with RFC 5424's TimeQuality feature adds up to an additional 63 bytes per event. This is all data that is rarely searched, and the extracted timestamp is already in _time, in epoch time format, with millisecond precision if available. These 63 bytes of data are just an added storage and indexing burden on the destination system.
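As a rough check on the per-event overhead discussed above, the following Python sketch measures the timestamp-plus-host prefix of a single event. The sample line is made up, and the sketch assumes the priority value has already been stripped; real ratios depend on how long your messages are.

```python
import re

# Leading 'Mmm dd hh:mm:ss host ' prefix of a legacy-format syslog line.
PREFIX = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")

def redundant_prefix_bytes(raw: str) -> int:
    """Bytes used by the human-readable timestamp and hostname, or 0 if absent."""
    match = PREFIX.match(raw)
    return len(match.group(0).encode("utf-8")) if match else 0

# Hypothetical event for illustration only.
sample = "May 12 21:20:39 fw-edge-01 sshd[411]: Accepted publickey for admin from 10.0.0.5 port 52114 ssh2"

prefix_bytes = redundant_prefix_bytes(sample)
share = prefix_bytes / len(sample.encode("utf-8"))
print(f"{prefix_bytes} redundant bytes, {share:.0%} of the event")
```

The shorter the message body, the larger the share taken by the prefix, which is how short, chatty events end up with a fifth or more of their volume being redundant.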
How Cribl Helps
Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.
Reducing Management Complexity
Just like rsyslog and Syslog-NG, Cribl Stream and Cribl Edge are native syslog receivers. You may be able to retire or repurpose your existing syslog servers by replacing them with Stream Workers, which support other protocols as well. In Stream, all of the issues mentioned above can be addressed without having to edit configuration files manually.
We understand that overhauling your current syslog deployment takes effort and time. Until then, Cribl Edge can be used as a stopgap to simplify current agent (log shipper) deployments. By using Cribl Edge, you can scrape your current files directly from your existing syslog servers. Why do this? Choice. Using Cribl Edge, you will be able to ship your syslog files not only to multiple storage destinations (read: cost-effective object storage and SIEM), but also to multiple vendors. You won't be locked into a single ecosystem anymore.
Minimizing the Possibility of Data Loss
Deploy Cribl as close as possible to the sender to avoid UDP data loss. This would be best handled either by placing a pair of appropriately sized Workers behind a load balancer on the same subnet or LAN, or by installing Cribl Edge on an already running syslog server. You can then configure your Destinations to use "Persistent Queuing" on backpressure, which will ensure that data will be buffered and then delivered when a Destination returns to service.
Securing the Connection
After receiving data via syslog (encrypted or not), Cribl can deliver it over TCP, with TLS encryption, to any subsequent connection. For the same reasons as with the UDP scenario, one would want to place the Cribl deployment on the same subnet, LAN, or server as the sender.
Cribl Pack for Syslog Input
The Cribl Pack for Syslog Input handles timezones, data cleanup, and enrichment. Using this Pack as a pre-processing Pipeline allows you to apply the benefits below to most or all inbound syslog data. This Pack's volume reduction can remove more than 100 bytes per event, by dropping redundant or unnecessary values.
The syslog Pack, available in the Cribl Pack Dispensary repo, does the following:
- Via lookup on the host field, identifies timezone information for specific senders, and uses this to set _time.
- Automatically sets time correctly when inbound traffic is off by precisely N hours. For example, if a syslog sender's timestamp appears to be exactly 3 hours in the future – it fixes it!
- Provides metadata such as sourcetype and index via lookups.
- Removes the human-readable timestamp from _raw, saving about 16-25 bytes.
- Moves the host value to metadata, and removes it from _raw, saving 5-25 bytes.
- Handles the syslog fields for Facility, Severity, and App to provide metadata. Optionally removed from _raw.
- Optionally, drops Debug-level events, based on Severity field.
- Optionally, removes TimeQuality data from _raw, saving up to 63 bytes.
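To make several of these steps concrete, here is a minimal Python sketch of the same ideas. It is not the Pack's implementation: the lookup table, its keys, the sourcetype and index values, and the function name are all hypothetical, and the input is assumed to be a legacy-format line with an optional priority value.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: host -> UTC offset and routing metadata.
HOST_LOOKUP = {
    "fw-edge-01": {"utc_offset_hours": -4, "sourcetype": "fw:syslog", "index": "network"},
}

# Optional <PRI>, then 'Mmm dd hh:mm:ss', host, and the remaining message.
HEADER = re.compile(
    r"^(?:<(?P<pri>\d{1,3})>)?"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<rest>.*)$",
    re.DOTALL,
)

DEBUG_SEVERITY = 7  # syslog severity 7 is Debug

def preprocess(line, assumed_year, drop_debug=False):
    """Return a cleaned event dict, or None if the event is dropped."""
    match = HEADER.match(line)
    if match is None:
        return {"_raw": line}  # not the expected shape; pass through untouched

    # Host moves to metadata; timestamp and host are no longer in _raw.
    event = {"host": match.group("host"), "_raw": match.group("rest")}

    if match.group("pri") is not None:
        # The priority value encodes facility * 8 + severity.
        event["facility"], event["severity"] = divmod(int(match.group("pri")), 8)
        if drop_debug and event["severity"] == DEBUG_SEVERITY:
            return None

    # Look up the sender's zone (UTC if unknown) and compute epoch time.
    meta = HOST_LOOKUP.get(event["host"], {})
    offset = timedelta(hours=meta.get("utc_offset_hours", 0))
    local = datetime.strptime(f"{assumed_year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
    event["_time"] = local.replace(tzinfo=timezone(offset)).timestamp()

    for key in ("sourcetype", "index"):
        if key in meta:
            event[key] = meta[key]
    return event

event = preprocess("<134>May 12 21:20:39 fw-edge-01 sshd[411]: session opened", 2024)
print(event)
```

Keeping the per-sender knowledge in a lookup table, rather than in code, mirrors the lookup-driven approach the list describes: onboarding a new sender means adding a row, not editing logic.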
The Pack also includes data samples, to allow testing of the various options prior to putting them into production. Full documentation for the Pack is included within the Pack itself.
Summary
Deploy Cribl Stream Workers as replacements for existing syslog servers (rsyslog, Syslog-NG, etc.), and/or deploy Cribl Edge Nodes on existing syslog servers/senders. Either option reduces management complexity, and ensures reliable, secure delivery of syslog data to your chosen systems of analysis and systems of retention.
By using the Cribl Pack for Syslog Input, you can easily enrich your inbound syslog data with metadata, while simultaneously removing redundant data for a savings of 20% or more, and finally see an end to timestamp and timezone extraction issues.
If you're still not convinced, this blog series goes into more detail on the challenges of syslog at scale, and how Cribl can help.
If you are convinced, then let's away on a journey of self syslog discovery!