Parsing

In this section, we'll add a Parser Function to our Pipeline. Parser will turn our Key=Value data into richer, structured events that schema-on-write systems can easily consume.

First, let's return to our business_event Pipeline and display our captured sample data.

Open business_event Pipeline
  1. With Manage active in Stream's top nav, select the Processing submenu and click Pipelines.
  2. Click business_event in the Pipelines column.
    To display this column's header and contents, you might need to drag the pane and column dividers toward the right.
  3. Click Sample Data in the right pane.
  4. Click Simple next to the be.log capture.

Extract with Parser

We're going to add a Parser Function that reads the _raw field and parses its contents as Key=Value pairs. Our log is pretty gritty, with random commas and formatting weirdness, but the Key=Value pairs themselves are well-structured.

Add Parser
  1. Click Add Function.
  2. Search for Parser, then click it.
  3. In the new Parser Function, replace the default Filter expression with the following:
    sourcetype=='business_event'
  4. Set the Type to Key=Value Pairs.
  5. Click Save.

The Preview pane should immediately show many more fields, and the events streaming in the shell should change shape too. Note that we left the Function's Operation Mode set to Extract and its Source Field set to _raw. With these settings, Parser extracts the Key=Value data from the _raw field and places each pair in its own field on the event.
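To make the Extract behavior concrete, here is a minimal JavaScript sketch of what a Key=Value extraction does to an event. It is not Cribl Stream's implementation: parseKeyValue, extract, and the sample _raw line are hypothetical, and the sketch assumes keys are word characters and values are either double-quoted strings or runs of characters up to the next space or comma.

```javascript
// Hypothetical, simplified Key=Value tokenizer; not Cribl Stream's actual parser.
// Keys are word characters; values are "double-quoted" or run up to a space or comma.
function parseKeyValue(raw) {
  const fields = {};
  const pairRe = /(\w+)=(?:"([^"]*)"|([^\s,]*))/g;
  for (const m of raw.matchAll(pairRe)) {
    // m[2] holds a quoted value, m[3] an unquoted one.
    fields[m[1]] = m[2] !== undefined ? m[2] : m[3];
  }
  return fields;
}

// Extract mode (sketch): read the source field and add each pair as a
// top-level field, leaving the source field itself in place.
function extract(event, sourceField = '_raw') {
  return { ...event, ...parseKeyValue(event[sourceField]) };
}

// Hypothetical sample event; the real be.log data looks different.
const sampleEvent = {
  sourcetype: 'business_event',
  _raw: 'customerID=42, credits=NA, networkProviderName=ExampleNet}], planDescription="Gold plan"',
};

console.log(extract(sampleEvent));
```

Notice that the naive tokenizer keeps the stray }] on networkProviderName, the same kind of junk a real Key=Value parse can pick up from gritty logs.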

This is rich, structured data with a lot of fields. Not all of these fields are particularly interesting, so we're going to use Parser to ignore a few of them. First, several fields are set to NA (for example, credits, EventConversationID, JMSCorrelationID, and ReplyTo), so we'll configure Parser to ignore any field whose value is NA.

Filter NA Values
  1. In the Function's Fields Filter Expression field (near the bottom), paste in the following:
    value!=='NA'
  2. Click Save, and watch the right Preview pane.

Now we should see credits, EventConversationID, JMSCorrelationID, and ReplyTo disappear.

This is another use of JavaScript expressions. Parser evaluates the Fields Filter Expression once for every field it extracts, with the globals key and value set to that field's name and value. If the expression evaluates to true, Parser keeps the field; otherwise, it drops it. So value!=='NA' keeps every field whose value is anything other than NA.
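The per-field evaluation can be sketched in a few lines of JavaScript. This is a hypothetical model (applyFieldsFilter and the sample fields are illustrative, not part of Cribl Stream), assuming the expression simply receives each field's key and value and a truthy result keeps the field.

```javascript
// Hypothetical model of a Fields Filter Expression: called once per parsed
// field, with `key` and `value` bound to that field's name and value.
function applyFieldsFilter(fields, filterExpr) {
  const kept = {};
  for (const [key, value] of Object.entries(fields)) {
    if (filterExpr(key, value)) {
      kept[key] = value; // truthy result: keep the field
    }
  }
  return kept;
}

// Equivalent of the tutorial's expression: value!=='NA'
const notNA = (key, value) => value !== 'NA';

// Hypothetical parsed fields.
const parsed = {
  customerID: '42',
  credits: 'NA',
  ReplyTo: 'NA',
  networkProviderName: 'ExampleNet}]',
};

// Keeps only customerID and networkProviderName.
console.log(applyFieldsFilter(parsed, notNA));
```

Because key is passed in as well, the same mechanism can drop fields by name; in this sketch, (key, value) => !key.startsWith('JMS') && value !== 'NA' would also discard a field like JMSCorrelationID regardless of its value.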

There are a few more verbose fields we'd also like to get rid of.

Ignore Verbose Fields

  1. Paste the following into Fields to Remove:
    planDescription,properties,timestamp,timeToLive
  2. Click Save, and watch the Preview pane.

The fields planDescription, properties, timestamp, and timeToLive should no longer be extracted. Your Parser Function should now look like this:

[Figure: the completed Parser Function]
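Unlike the Fields Filter Expression, Fields to Remove is a plain list of names. A minimal sketch of that behavior, assuming a comma-separated list matched exactly against field names (removeFields and the sample object are hypothetical, and the real setting may accept more than exact names):

```javascript
// Hypothetical helper: drop every field whose name appears in a
// comma-separated list. Names are matched exactly in this sketch.
function removeFields(fields, fieldsToRemove) {
  const drop = new Set(fieldsToRemove.split(',').map((name) => name.trim()));
  return Object.fromEntries(
    Object.entries(fields).filter(([name]) => !drop.has(name))
  );
}

// Hypothetical parsed fields.
const parsed = {
  customerID: '42',
  planDescription: 'Gold plan',
  properties: '{}',
  timestamp: '1600000000',
  timeToLive: '0',
};

// Keeps only customerID.
console.log(removeFields(parsed, 'planDescription,properties,timestamp,timeToLive'));
```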

Reserialize with Parser

Having removed uninteresting fields, we now have a cleaner event. Let's also explore another important feature of Parser: its Reserialize mode.

Set Reserialize
  1. Change the Parser Function's Operation Mode to Reserialize.
  2. Click Save, and watch the right Preview pane.

Instead of adding the parsed fields to the event, Parser now rewrites _raw to contain only the Key=Value pairs we kept. We can apply this technique to structured formats such as JSON Object, CSV, or Key=Value, trimming off fields we aren't interested in to significantly reduce data volume.
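The difference from Extract can be sketched as follows. In this hypothetical model, reserialize takes the fields that survived filtering and rebuilds _raw from them instead of adding them to the event. The output format (space-separated pairs, with double quotes around values that contain spaces, commas, equals signs, or quotes) is an assumption, not necessarily what Cribl Stream emits.

```javascript
// Hypothetical model of Reserialize mode: rebuild _raw from the kept fields
// rather than adding them as top-level event fields.
function reserialize(event, keptFields) {
  const pairs = Object.entries(keptFields).map(([key, value]) => {
    const text = String(value);
    // Quote values that would otherwise be ambiguous when re-parsed.
    return /[\s,="]/.test(text)
      ? `${key}="${text.replace(/"/g, '\\"')}"`
      : `${key}=${text}`;
  });
  return { ...event, _raw: pairs.join(' ') };
}

// Hypothetical event and kept fields.
const event = {
  sourcetype: 'business_event',
  _raw: 'customerID=42, credits=NA, planName="Gold plan", planDescription="A long description"',
};
const kept = { customerID: '42', planName: 'Gold plan' };

const out = reserialize(event, kept);
console.log(out._raw); // customerID=42 planName="Gold plan"
console.log(out._raw.length < event._raw.length); // true
```

The event keeps its other fields (sourcetype here); only _raw shrinks, which is where the data savings come from.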

Set Extract
  1. Set Operation Mode back to Extract.
  2. Click Save.

Fix Bad Extraction with Eval

Now that we've parsed the raw event, we can see that one field, networkProviderName, picked up some trailing junk (}]) from the Key=Value parse. With gritty logs, techniques like Key=Value parsing are sometimes imperfect. We're going to fix this field with an Eval Function. We also no longer need the _raw field, since everything in it is now duplicated in the extracted fields, so we'll remove it with Eval as well.

Add Eval Function
  1. Click Add Function in the top nav, search for Eval, and click it.
  2. Scroll down and click into the new Eval Function.
  3. For Filter, again enter:
    sourcetype=='business_event'
  4. Under Evaluate Fields, click Add Field.
  5. For Name, enter networkProviderName.
  6. For Value Expression, paste the following:
    networkProviderName.replace(/}]$/, '')
  7. Under Remove Fields, enter _raw.
  8. Click Save.

This expression finds }] at the end of the value and replaces it with an empty string. (A Mask Function could accomplish the same thing.) We're also removing the _raw field, since it's no longer needed. Our event diff should look like this:

[Figure: event diff after the Eval Function]
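The Eval configuration above amounts to the following JavaScript, sketched outside Cribl Stream. The evalBusinessEvent wrapper and the sample event are hypothetical; only the Filter (sourcetype=='business_event'), the replace expression, and the removal of _raw come from the steps above. The typeof guard is an addition of this sketch so that events without the field pass through unchanged.

```javascript
// Hypothetical stand-in for the Eval Function configured above.
function evalBusinessEvent(event) {
  // Filter: only business_event events are processed.
  if (event.sourcetype !== 'business_event') return event;

  // Remove Fields: _raw (copy every other field into `rest`).
  const { _raw, ...rest } = event;

  // Evaluate Fields: strip a trailing "}]" from networkProviderName.
  // The typeof guard is not part of the tutorial's expression.
  if (typeof rest.networkProviderName === 'string') {
    rest.networkProviderName = rest.networkProviderName.replace(/}]$/, '');
  }
  return rest;
}

// Hypothetical extracted event.
const before = {
  sourcetype: 'business_event',
  _raw: 'networkProviderName=ExampleNet}]',
  networkProviderName: 'ExampleNet}]',
};

console.log(JSON.stringify(evalBusinessEvent(before)));
// {"sourcetype":"business_event","networkProviderName":"ExampleNet"}
```

Because of the $ anchor, only a trailing }] is removed; a }] appearing elsewhere in the value is left alone.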

With this Pipeline, we're now redacting sensitive information with Mask and converting old Key=Value data into rich, structured logs with Parser and Eval. Next, we'll use a Lookup Function to add some additional context, and use that context for routing logic.