Let's Talk About Syslog

Syslog has been a steadfast log management protocol since the dawn of the Internet, enabling seamless data transmission across operating systems, processes, and devices. As technology evolves, however, so do the challenges associated with handling syslog data effectively. The Cribl product suite marks a new era in syslog transformation, offering unparalleled capabilities for managing syslog data inbound and outbound.

A Few Problems to Solve

Risk of Data Loss

The original Syslog protocol was based on UDP, not TCP. In those days, when servers' memory was measured in kilobytes rather than gigabytes, UDP made sense. UDP allowed syslog senders to keep working just fine, even if the destination was down. Even today, some network devices support sending syslog only on UDP. But UDP delivery is potentially lossy – your data will probably get there, but it might not. Sending over the Internet makes it more likely that data will be lost.
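To make the fire-and-forget behavior concrete, here is a minimal Python sketch of a UDP syslog sender. The function names and the example message are illustrative assumptions, not part of any Cribl product. The point is that the send call returns as soon as the datagram is handed to the operating system, so the sender never learns whether anything received it.

```python
import socket


def build_udp_payload(facility: int, severity: int, text: str) -> bytes:
    """Prefix the message with its PRI value: facility * 8 + severity."""
    pri = facility * 8 + severity
    return f"<{pri}>{text}".encode("utf-8")


def send_udp_syslog(text: str, host: str, port: int = 514) -> int:
    """Send one syslog line over UDP and return the bytes handed to the OS."""
    # Facility 16 (local0) and severity 6 (informational) give PRI 134.
    payload = build_udp_payload(16, 6, text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # No handshake and no acknowledgement: a successful return only
        # means the datagram left this process, not that it arrived.
        return sock.sendto(payload, (host, port))
```

Calling send_udp_syslog("May 12 21:20:39 fw01 app: link up", "192.0.2.10") behaves the same whether or not a receiver is listening at that (documentation-only) address, which is exactly why lost messages go unnoticed.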

Lack of Encryption Support

Syslog over UDP doesn't support encryption on the wire. Syslog over TCP can be wrapped in TLS, but few syslog-sending devices support this. Unencrypted messaging on a local area network might be okay in some situations, but it's not ideal.

Timezone Issues

While modern syslog senders support RFC 5424, which accurately represents the timezone, year, and day, most devices sending syslog still emit a timestamp that looks like this: May 12 21:20:39

What time zone is that? What year? Does the sending system use UTC? Eastern daylight time? There's no way to tell.
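A short Python sketch shows how much the receiver has to guess. The year (2023) and the two candidate offsets are assumptions made up for this illustration; the legacy timestamp itself carries neither. An RFC 5424 style timestamp, by contrast, parses without any outside help.

```python
from datetime import datetime, timedelta, timezone

STAMP = "May 12 21:20:39"


def parse_legacy(stamp: str, assumed_year: int, assumed_offset_hours: int) -> datetime:
    """Parse a legacy syslog timestamp, supplying the year and UTC offset
    that the text itself does not contain."""
    naive = datetime.strptime(f"{assumed_year} {stamp}", "%Y %b %d %H:%M:%S")
    tz = timezone(timedelta(hours=assumed_offset_hours))
    return naive.replace(tzinfo=tz)


# Same text, two equally plausible readings.
as_utc = parse_legacy(STAMP, 2023, 0)    # sender uses UTC
as_edt = parse_legacy(STAMP, 2023, -4)   # sender uses Eastern daylight time
print(as_edt - as_utc)                   # prints 4:00:00

# An RFC 5424 style timestamp states the year and offset explicitly.
modern = datetime.fromisoformat("2023-05-12T21:20:39.123-04:00")
print(modern.utcoffset())
```

The two readings of the same string land four hours apart, and a wrong guess about the year would move the event by months.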

Lack of Context

For systems of analysis such as Splunk and Elasticsearch, it's often necessary to set a sourcetype, index, or other field that tells the system how to process and store syslog data.

This metadata has to be added after the fact, since it is not included in the original payload. Typically, this involves configuring rsyslog or Syslog-NG configuration files in conjunction with Splunk Forwarders or Elastic Beats agents. This is a tricky part of managing syslog servers, since IPs and hostnames are placed in the syslog server's configuration files, and the forwarding agent's index and sourcetype settings have to match these IP addresses and hostnames.
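Conceptually, the mapping being maintained is a lookup table keyed on the sender. The Python sketch below shows the idea with a small CSV; the hostnames, sourcetypes, index names, and defaults are all hypothetical, and this is not how rsyslog, Syslog-NG, or any forwarder actually stores its configuration.

```python
import csv
import io

# Hypothetical lookup table: one row per known sender.
LOOKUP_CSV = """host,sourcetype,index
fw01,cisco:asa,network
10.0.5.20,linux:syslog,os
"""


def load_lookup(text: str) -> dict:
    """Index the CSV rows by their host column."""
    return {row["host"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(event: dict, lookup: dict,
           default_sourcetype: str = "syslog",
           default_index: str = "main") -> dict:
    """Return a copy of the event with sourcetype and index attached."""
    match = lookup.get(event["host"], {})
    return {
        **event,
        "sourcetype": match.get("sourcetype", default_sourcetype),
        "index": match.get("index", default_index),
    }
```

Keeping the table in one place means a new sender needs one new row, rather than matching edits in two separate configuration files.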

Redundant Data

A human-readable timestamp and an associated hostname are not needed when syslog data is sent to a destination like Splunk or Elasticsearch. Those fields are already delivered in the extracted timestamp (epoch time) and host fields. Having this data in the message field adds 25-30 bytes per event. While this might not sound like much, it can account for more than 20 percent of total data volume in some cases.

Syslog data sent with RFC 5424’s TimeQuality feature adds up to an additional 63 bytes per event. This data is rarely searched, and the extracted timestamp is already in _time, in epoch time format, with millisecond precision if available. These 63 bytes are just an added storage and indexing burden on the destination system.
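To see how a few dozen bytes can become a large share of the volume, the Python sketch below measures the timestamp-plus-host prefix of a single event. The regular expression and the sample message are assumptions for illustration; real traffic varies, and short messages are hit hardest.

```python
import re

# Legacy header: "Mon DD HH:MM:SS host " with an optional <PRI> in front.
HEADER = re.compile(
    r"^(?:<\d{1,3}>)?"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
)


def header_overhead(raw: str):
    """Return (bytes used by timestamp and host, fraction of the event)."""
    match = HEADER.match(raw)
    if match is None:
        return 0, 0.0
    prefix = f"{match.group('ts')} {match.group('host')} "
    used = len(prefix.encode("utf-8"))
    return used, used / len(raw.encode("utf-8"))


# Hypothetical sample event.
sample = ("May 12 21:20:39 fw01-dmz-east sshd[4211]: Accepted publickey "
          "for deploy from 10.0.5.20 port 50122 ssh2")
used, fraction = header_overhead(sample)
print(f"{used} bytes, {fraction:.0%} of the event")
```

For this made-up sample the prefix is 30 bytes, which is well over 20 percent of an event this short.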

How Cribl Helps

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

Reducing Management Complexity

Just like rsyslog and Syslog-NG, Cribl Stream and Cribl Edge are native syslog receivers. You may be able to retire or repurpose your existing syslog servers by replacing them with Stream Workers, which support other protocols as well. In Stream, all of the issues mentioned above can be addressed without having to edit configuration files manually.

We understand that overhauling your current syslog deployment takes effort and time. Until you are ready to do that, Cribl Edge can be used as a stopgap to simplify current agent (log shipper) deployments. By using Cribl Edge, you can scrape your current files directly from your existing syslog servers. Why do this? Choice. Using Cribl Edge, you will be able to ship your syslog files not only to multiple storage destinations (read: cost-effective object storage and SIEM), but also to multiple vendors. You won't be locked into a single ecosystem anymore.

Minimizing the Possibility of Data Loss

Deploy Cribl as close as possible to the sender to avoid UDP data loss. This would be best handled either by placing a pair of appropriately sized Workers behind a load balancer on the same subnet or LAN, or by installing Cribl Edge on an already running syslog server. You can then configure your Destinations to use "Persistent Queuing" on backpressure, which will ensure that data will be buffered and then delivered when a Destination returns to service.
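The general idea of buffering on backpressure can be sketched in a few lines of Python. This is not Cribl's implementation: the class name, the in-memory queue, and the size limit are assumptions chosen to keep the example short, whereas a persistent queue would write to disk so that buffered events survive a restart.

```python
from collections import deque
from typing import Callable


class BufferedSender:
    """Hold events while the destination is down, then replay them in order."""

    def __init__(self, deliver: Callable[[str], None], max_events: int = 10_000):
        self._deliver = deliver
        # A bounded deque silently drops the oldest event once it is full.
        self._queue = deque(maxlen=max_events)

    def send(self, event: str) -> None:
        """Queue the event, then try to drain the queue."""
        self._queue.append(event)
        self.flush()

    def flush(self) -> int:
        """Deliver queued events oldest first; stop at the first failure."""
        delivered = 0
        while self._queue:
            try:
                self._deliver(self._queue[0])
            except OSError:
                break  # destination still unavailable, keep the backlog
            self._queue.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._queue)
```

An event is removed from the queue only after the delivery call returns without error, so a failure partway through a flush leaves the remaining backlog intact and in order.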

Securing the Connection

After receiving data via syslog (encrypted or not), Cribl can forward it to any downstream connection via TCP, with TLS encryption. As in the UDP scenario, you will want to place the Cribl deployment on the same subnet, LAN, or server as the sender, so that any unencrypted traffic travels as short a distance as possible.
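For readers who have not seen syslog over TLS on the wire, here is a minimal Python sketch of a sender using only the standard library. It is a generic illustration, not Cribl configuration: the function names are made up, port 6514 is the conventional syslog-over-TLS port, and the length-prefixed framing follows the octet-counting scheme from RFC 5425.

```python
import socket
import ssl
from typing import Optional


def frame(message: str) -> bytes:
    """Octet-counting framing: '<length> <message>' with length in bytes."""
    body = message.encode("utf-8")
    return str(len(body)).encode("ascii") + b" " + body


def send_tls_syslog(message: str, host: str, port: int = 6514,
                    cafile: Optional[str] = None) -> None:
    """Send one framed syslog message over a certificate-verified TLS link."""
    # The default context verifies the server certificate and hostname.
    context = ssl.create_default_context(cafile=cafile)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=5) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(frame(message))
```

Unlike the UDP case, a refused connection or a failed handshake raises an exception here, so the sender knows immediately when the message could not be delivered.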

Cribl Pack for Syslog Input

The Cribl Pack for Syslog Input handles timezones, data cleanup, and enrichment. Using this Pack as a pre-processing Pipeline allows you to apply the benefits below to most or all inbound syslog data. This Pack's volume reduction can remove more than 100 bytes per event, by dropping redundant or unnecessary values.

The syslog Pack, available in the Cribl Pack Dispensary repo, does the following:

  • Via lookup on the host field, identifies timezone information for specific senders, and uses this to set _time.
  • Automatically sets time correctly when inbound traffic is off by precisely N hours. For example, if a syslog sender’s timestamp appears to be exactly 3 hours in the future – it fixes it!
  • Provides metadata such as sourcetype and index via lookups.
  • Removes the human-readable timestamp from _raw, saving about 16-25 bytes.
  • Moves the host value to metadata, and removes it from _raw, saving 5-25 bytes.
  • Handles the syslog fields for Facility, Severity, and App to provide metadata. These can optionally be removed from _raw.
  • Optionally, drops Debug-level events, based on Severity field.
  • Optionally, removes TimeQuality data from _raw, saving up to 63 bytes.
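To give a feel for a few of these steps, here is a rough Python sketch of timezone lookup, whole-hour correction, and header trimming. It is not the Pack's actual logic, which is documented inside the Pack itself. The lookup table, the 30-second tolerance, the field names in the returned dict, and the choice to borrow the year from the receive time are all assumptions made for this illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical lookup: sender host -> UTC offset in hours.
TZ_OFFSETS = {"fw01": -4, "sw02": 0}

HEADER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
)


def process(raw: str, received_at: datetime, default_offset: int = 0) -> dict:
    """Set _time from the legacy header, then strip the header from _raw."""
    match = HEADER.match(raw)
    if match is None:
        return {"_raw": raw, "_time": received_at.timestamp()}

    host = match.group("host")
    tz = timezone(timedelta(hours=TZ_OFFSETS.get(host, default_offset)))
    # The header has no year, so borrow it from the receive time
    # (this simple version ignores events that straddle New Year).
    year = received_at.astimezone(tz).year
    stamp = datetime.strptime(
        f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=tz)

    # If the event looks almost exactly N hours off, assume a timezone
    # mistake on the sender and shift it back by N hours.
    drift = stamp - received_at
    hours = round(drift.total_seconds() / 3600)
    if hours != 0 and abs(drift - timedelta(hours=hours)) < timedelta(seconds=30):
        stamp -= timedelta(hours=hours)

    return {"_raw": raw[match.end():], "_time": stamp.timestamp(), "host": host}
```

For example, an event from the hypothetical host fw01 stamped May 12 21:20:39 and received at 01:20:40 UTC on May 13 comes out with _time one second before the receive time, and with the timestamp and host no longer in _raw.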

The Pack also includes data samples, to allow testing of the various options prior to putting them into production. Full documentation for the Pack is included within the Pack itself.

Summary

Deploy Cribl Stream Workers as replacements for existing syslog servers (rsyslog, Syslog-NG, etc.), and/or deploy Cribl Edge Nodes on existing syslog servers/senders. Either option reduces management complexity, and ensures reliable, secure delivery of syslog data to your chosen systems of analysis and systems of retention.

By using the Cribl Pack for Syslog Input, you can easily enrich your inbound syslog data with metadata, while simultaneously removing redundant data for a savings of 20% or more. You can also finally see an end to timestamp and timezone extraction issues.

If you're still not convinced, this blog series goes into more details on the challenges of syslog at scale, and how Cribl can help.

If you are convinced, then let's away on a journey of self syslog discovery!