Parsing
In this section we're going to introduce a Parser Function into our Pipeline. Parser will parse our key=value data into richer structured events, which can easily be consumed by schema-on-write systems.
First, let's go back to our business_event Pipeline, and display our captured sample data.
important
Open business_event Pipeline
- With Manage active in Stream's top nav, select the Processing submenu and click Pipelines.
- Click business_event in the Pipelines column. To display this column's header and contents, you might need to drag the pane and column dividers toward the right.
- Click Sample Data in the right pane.
- Click Simple next to the be.log capture.
Extract with Parser
We're going to add a Parser Function that will look into the _raw field and parse its contents in Key=Value format. Our log is pretty gritty, with random commas and formatting weirdness, but the Key=Value pairs are well structured.
important
Add Parser
- Click + Function.
- Search for Parser, then click it.
- In the new Parser Function, paste sourcetype=='business_event' over the Filter field's default true entry.
- Set the Type to Key=Value Pairs.
- Click Save.
We should see Preview instantly light up with many more fields. The events streaming in the shell should change shape as well. Note that above, we left the Function's Operation Mode set to Extract, and its Source Field set to _raw. Parser is therefore configured to Extract the Key=Value data from the _raw field and place it in fields in the event.
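To make the Extract behavior concrete, here is a minimal JavaScript sketch of Key=Value extraction. It is not Parser's actual implementation: the parseKeyValue helper, the regular expression, and the sample line (including the accountType, planName, and status fields) are all assumptions made up for illustration.

```javascript
// Hypothetical sample line; it is not taken from the be.log capture.
const raw = 'accountType=prepaid, credits=NA, planName="Unlimited Talk", status=active';

// Naive Key=Value extraction (an illustrative stand-in, not Cribl's parser):
// a key is a run of word characters; a value is either a double-quoted string
// or a run of characters that stops at whitespace or a comma.
function parseKeyValue(text) {
  const fields = {};
  const pattern = /(\w+)=(?:"([^"]*)"|([^\s,]*))/g;
  for (const match of text.matchAll(pattern)) {
    const [, key, quoted, bare] = match;
    fields[key] = quoted !== undefined ? quoted : bare;
  }
  return fields;
}

// Conceptually, Extract mode adds the parsed fields to the event and leaves _raw as it was.
const event = { _raw: raw, sourcetype: 'business_event' };
Object.assign(event, parseKeyValue(event._raw));

console.log(event);
```

The stray commas in the sample line are skipped because they fall outside every key=value match, which is roughly why a gritty log can still parse cleanly when the pairs themselves are well structured.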
This is rich structured data, with a lot of fields. Not all of these fields are particularly interesting, so we're going to use Parser to ignore a few of them. First, we can see that a number of these fields are set to NA (examples are credits, EventConversationID, JMSCorrelationID, and ReplyTo), so we're going to configure Parser to ignore fields with a value of NA.
important
Filter NA Values
- In the Function's Fields Filter Expression field (near the bottom), paste in the following: value!=='NA'
- Click Save, and watch the right Preview pane.
Now we should see credits, EventConversationID, JMSCorrelationID, and ReplyTo disappear.
This is another use of JavaScript expressions. A Fields Filter Expression is a JavaScript expression that is called once for every field returned by the Parser Function, with the globals key and value set to that field's name and value, respectively. If the expression evaluates to true, the field is kept; otherwise it is dropped. So the expression value!=='NA' says: keep any field whose value isn't NA.
There are a few more verbose fields we'd also like to get rid of.
important
Ignore Verbose Fields
- Paste the following into Fields to Remove: planDescription,properties,timestamp,timeToLive
- Click Save, and watch the Preview pane.
The fields planDescription, properties, timestamp, and timeToLive should no longer be extracted. Your Parser Function should now look like this:
Reserialize with Parser
Having removed uninteresting fields, we now have a cleaner event. Let's also explore another important feature of Parser: its Reserialize mode.
important
Set Reserialize
- Change the Parser Function's Operation Mode to Reserialize.
- Click Save, and watch the right Preview pane.
Instead of extracting the event fields, we're now rewriting _raw with just the Key=Value pairs we've kept. This technique can be used with structured events like JSON Object, CSV, or Key=Value, to cut data volume significantly by trimming off fields we aren't interested in.
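As a rough sketch of what reserializing means, the JavaScript below parses a line, drops the unwanted pairs, and writes the survivors back as a new raw string. It is an approximation under stated assumptions, not Parser's real behavior: the reserialize helper, the sample line, and the choice of a single space as the output separator are all hypothetical.

```javascript
// Hypothetical sample line; only credits and timeToLive are field names from the tutorial.
const raw = 'credits=NA, timeToLive=3600, accountType=prepaid, status=active';

// Same names as the Fields to Remove setting used earlier.
const fieldsToRemove = new Set(['planDescription', 'properties', 'timestamp', 'timeToLive']);

// Hypothetical helper: keep only the pairs that pass both rules, then rebuild the string.
function reserialize(text) {
  const pairs = [];
  for (const [, key, value] of text.matchAll(/(\w+)=([^\s,]*)/g)) {
    if (fieldsToRemove.has(key)) continue; // Fields to Remove
    if (value === 'NA') continue;          // Fields Filter Expression: value!=='NA'
    pairs.push(`${key}=${value}`);
  }
  // Assumption: pairs are joined with a single space; the real output format may differ.
  return pairs.join(' ');
}

const event = { _raw: raw };
event._raw = reserialize(event._raw);

console.log(event._raw); // accountType=prepaid status=active
```

In this made-up example the rewritten _raw is roughly half the length of the original, which is the kind of saving the technique is aiming for.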
important
Set Extract
- Set Operation Mode back to Extract.
- Click Save.
Fix Bad Extraction with Eval
Now that we've parsed out the raw event, we can see one field, networkProviderName, where the Key=Value Parser picked up a bit of junk. With gritty logs, techniques like Key=Value parsing are sometimes imperfect. We're going to fix this field with the Eval Function. Additionally, we no longer need the _raw field, because its contents now duplicate the fields we've extracted, so we're going to remove it with Eval as well.
important
Add Eval Function
- Click + Function in the top nav, search for Eval, and click it.
- Scroll down and click into the new Eval Function.
- For Filter, again enter sourcetype=='business_event'.
- Under Evaluate Fields, click Add Field.
- For Name, enter networkProviderName.
- For Value Expression, paste the following: networkProviderName.replace(/}]$/, '')
- Under Remove Fields, enter _raw.
- Click Save.
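The Value Expression is ordinary JavaScript, so its effect can be tried outside Stream. The sketch below applies the same replace call to a plain object standing in for an event; the sample value 'Acme Wireless}]' and the _raw contents are made up, and deleting the property only mimics what the Remove Fields setting does.

```javascript
// Hypothetical event; the real networkProviderName value in be.log will differ.
const event = {
  _raw: 'networkProviderName=Acme Wireless}]',
  sourcetype: 'business_event',
  networkProviderName: 'Acme Wireless}]',
};

// Same expression as in the Eval Function: strip a trailing "}]" if one is present.
event.networkProviderName = event.networkProviderName.replace(/}]$/, '');

// Mimics Remove Fields: _raw.
delete event._raw;

console.log(event.networkProviderName); // Acme Wireless
```

The $ anchor limits the match to the end of the value, so a }] appearing elsewhere in the string would be left alone, and a value without the trailing junk passes through unchanged.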
The Value Expression searches for }] at the end of the networkProviderName value and replaces it with nothing. This could be accomplished with a Mask Function as well. We're also removing the _raw field, as it's no longer needed. Our event diff should look like this:
With this Pipeline, we're now redacting sensitive information with Mask, and converting old Key=Value data to rich structured logs with Parser and Eval. Next, we're going to use a Lookup Function to add some additional context, and use that context for routing logic.