Grok Parser Stage

The Grok parser stage uses Java Grok and Grok patterns (a form of regular expression matching) to parse log files and other text files that contain line-oriented, semi-structured data. Parsing a text file with the Grok parser gives semi-structured data more structure and lets you extract more information from it.

Whether the Grok stage parses a file

Before a Grok parser stage parses a file, the file must meet criteria regarding the media type and file name.

Media type

The Grok parser stage parses files that have media types that match either the default media types or media types that you specify.

Select or clear Use default media types for this parser stage:

  • Selected – The Grok parser stage parses files that have one of the default media types (text/plain or text/x-log), as well as files that have media types that you specify under Media Types for this Parser Stage.

  • Unselected – The Grok parser stage only parses files that have one of the media types that you specify under Media Types for this Parser Stage.
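The two cases above can be sketched as a small function. This is an illustrative sketch only, not the stage's implementation: the function name `accepted_media_types` and the media type `application/x-custom-log` are hypothetical, and only the two default media types come from this page.

```python
# Default media types listed on this page.
DEFAULT_MEDIA_TYPES = {"text/plain", "text/x-log"}


def accepted_media_types(inherit_media_types, media_types):
    """Return the set of media types the stage would parse (illustrative only)."""
    configured = set(media_types)
    if inherit_media_types:
        # Selected: the defaults plus anything configured for this stage.
        return DEFAULT_MEDIA_TYPES | configured
    # Unselected: only the explicitly configured media types.
    return configured


print(sorted(accepted_media_types(True, ["application/x-custom-log"])))
print(sorted(accepted_media_types(False, ["application/x-custom-log"])))
```

With the option unselected and no media types configured, the returned set is empty, which is why at least one media type must be provided in that case.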

File name

Optionally, you can specify a file name or file name pattern that a file must match for the Grok parser stage to parse the file.

Field Description

Pattern Type

glob – Use bash shell wildcards. Examples include z.txt, *.md, and /a/*/b/f.txt.

regex – Use Java regular expressions, whose syntax is similar to PCRE (Perl-compatible regular expressions). Examples include z.txt$, .*\.txt$, and ^/a/[^\/]*/b/f.txt$.

File Name or Pattern

Name of the file or a pattern for the file name. The parser parses matching files.
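To get a feel for how the two pattern types behave, the following sketch checks the example patterns with Python's standard library. It is an approximation under stated assumptions: Python's `fnmatch` and `re` modules stand in for the stage's glob and Java regex matchers, the function name `name_matches` is hypothetical, and the regex is assumed to match anywhere in the name unless it is anchored.

```python
import fnmatch
import re


def name_matches(path, pattern, syntax="glob"):
    """Approximate the file-name check; not the stage's actual matcher."""
    if syntax == "glob":
        # fnmatchcase implements shell-style wildcards, but unlike bash
        # pathname expansion its * also matches "/".
        return fnmatch.fnmatchcase(path, pattern)
    if syntax == "regex":
        # Assumption: the pattern may match anywhere unless it is anchored.
        return re.search(pattern, path) is not None
    raise ValueError(f"unknown pattern type: {syntax}")


print(name_matches("notes.md", "*.md"))
print(name_matches("/a/x/b/f.txt", "/a/*/b/f.txt"))
print(name_matches("/a/x/b/f.txt", r"^/a/[^\/]*/b/f.txt$", "regex"))
print(name_matches("z.txt.bak", r".*\.txt$", "regex"))
```

Because the stand-in matchers differ from the real ones in edge cases, treat the results as a guide and confirm important patterns against the stage itself.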

Grok patterns

Grok patterns are regular expressions written in the syntax of the Oniguruma regular expression library.

You configure a Grok parser stage to use predefined Grok patterns (about 300 patterns are available), Grok pattern definitions that you write yourself, or both.

  • Use predefined patterns – Under the Grok Pattern part of the Grok parser stage configuration, specify a single top-level Grok pattern by name, for example, REDISLOG.

  • Write your own Grok pattern definition(s) (optional) – Write one or more Grok pattern definitions, and then enter them in the Grok Definition part of the Grok parser stage configuration.
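To illustrate how named Grok patterns build on one another, here is a minimal sketch that expands `%{NAME}` and `%{NAME:field}` references into a regular expression with named groups. It is not Java Grok: Python's `re` module stands in for the regex engine, the four definitions are a hypothetical mini-library rather than the real predefined patterns, and the helper `expand` is an illustrative name.

```python
import re

# Hypothetical mini-library; the real predefined patterns are not reproduced here.
DEFINITIONS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "GREEDYDATA": r".*",
    # A definition may itself reference other patterns.
    "REQUEST": r"%{WORD:method} %{GREEDYDATA:path}",
}

# Matches %{NAME} and %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, definitions):
    """Replace each %{NAME} or %{NAME:field} reference with its regex."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        # Resolve nested references first (no cycle detection in this sketch).
        body = expand(definitions[name], definitions)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, pattern)


regex = re.compile(expand(r"%{INT:status} %{REQUEST}", DEFINITIONS))
fields = regex.match("200 GET /index.html").groupdict()
print(fields)
```

The top-level pattern here plays the role of the single pattern entered under Grok Pattern, and the dictionary plays the role of the definitions it can draw on.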

Parsing rules

These are rules that affect the results of parsing:

  • Precedence in the event of identical names – If the name of a custom Grok pattern definition that you provide is identical to the name of a predefined pattern definition, then your definition is used.

  • Invalid patterns – If a pattern isn’t syntactically valid, then the full text of the row being parsed is treated as a single field.

  • Pattern doesn’t match any data – If a pattern doesn’t match any data, then the full text of the row being parsed is treated as a single field.

  • Line by line – Parsing is line by line. If data has a multiline structure, the parser doesn’t capture the relationship between lines.
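The rules above can be sketched as follows. This is an illustration under assumptions, not the parser's code: a plain Python regex with named groups stands in for an expanded Grok pattern, the field name `raw` for the whole-line fallback is hypothetical, and the function names are illustrative.

```python
import re

RAW_FIELD = "raw"  # hypothetical name for the single whole-line field


def effective_definitions(predefined, custom):
    """Custom definitions win when names collide."""
    return {**predefined, **custom}


def parse_lines(text, regex_source):
    """Parse line by line; fall back to one whole-line field when needed."""
    try:
        regex = re.compile(regex_source)
    except re.error:
        regex = None  # invalid pattern: every line becomes a single field
    rows = []
    for line in text.splitlines():
        match = regex.search(line) if regex else None
        if match:
            rows.append(match.groupdict())
        else:
            # Pattern is invalid or does not match this line.
            rows.append({RAW_FIELD: line})
    return rows


log = "200 GET\nnot a request line\n404 POST"
print(parse_lines(log, r"(?P<status>\d{3}) (?P<method>[A-Z]+)"))
print(parse_lines(log, r"(?P<status>\d{3}"))  # unbalanced parenthesis
```

Each line is handled independently, so a stack trace or other multiline record would come out as separate rows with no link between them.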

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
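The difference between the two forms is what JSON string escaping produces. The sketch below assumes the API accepts a JSON body, which this page does not state explicitly; the pattern value is a hypothetical example.

```python
import json

# The value as you would type it in the UI: a backslash followed by "t".
ui_value = r"%{WORD:a}\t%{WORD:b}"

# Serializing the same value as JSON escapes the backslash.
payload = json.dumps({"grokPattern": ui_value})
print(payload)

# Decoding the JSON gives back exactly what was typed in the UI.
print(json.loads(payload)["grokPattern"] == ui_value)
```

Letting a JSON library do the serialization avoids hand-counting backslashes.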

Grok stage-specific properties

Property Description, Type
charset

Character Set

required

Example: "UTF-8"

type: string

default value: 'detect'

enabled

Enable this Parser Stage

type: boolean

default value: 'true'

errorHandling

Error Handling

type: string

default value: 'mark'

enum: { ignore log fail mark }

grokDefinition

Grok Definition

Custom Grok definition

type: string

grokPattern

Grok Pattern

Grok parsing pattern

type: string

id

Parser ID

type: string

default value: '199b089e-08dd-4c6f-bc76-03a334604c9c'

ignoreBOM

Ignore BOM

required

Ignore the byte-order mark (BOM) if present and always use the configured character set. When set to false, a valid BOM character set overrides the configured default character set.

type: boolean

default value: 'false'
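The interaction between charset and ignoreBOM can be sketched as below. This is a rough illustration, not the stage's decoder: the function name `decode` is hypothetical, the sketch takes an explicit character set rather than the 'detect' default, and it assumes the mark itself is dropped from the text in both cases.

```python
import codecs

# Byte-order marks and the encodings they imply. Longer marks are checked
# first because the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode(data, charset="utf-8", ignore_bom=False):
    """Decode bytes, letting a BOM override `charset` unless ignore_bom is set."""
    for bom, bom_charset in BOMS:
        if data.startswith(bom):
            data = data[len(bom):]  # assumption: drop the mark in both cases
            if not ignore_bom:
                charset = bom_charset
            break
    return data.decode(charset)


sample = codecs.BOM_UTF8 + "é".encode("utf-8")
print(decode(sample, "latin-1", ignore_bom=False))  # BOM wins: decoded as UTF-8
print(decode(sample, "latin-1", ignore_bom=True))   # configured charset wins
```

The second call shows why ignoring a correct BOM can garble text: the UTF-8 bytes are read as Latin-1.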

inheritMediaTypes

Use default media types for this Parser Stage

Indicates whether the parser stage uses the default media types. If you clear this option, the parser stage parses ONLY the manually configured media types, and you MUST provide at least one media type.

type: boolean

default value: 'true'

mediaTypes

Media Types for this Parser Stage

type: array of string

outputFieldPrefix

Prefix parsed fields with

Fields extracted by this parser will be prefixed with this string. The remainder of the field name will be as detected in the stream.

type: string

maxLength: 20

pattern: ^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$
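A candidate prefix can be checked against the maxLength and pattern constraints before you save the configuration. The sketch below copies both constraints from this property; the helper name `is_valid_prefix` and the candidate values are hypothetical, and Python's `re` module stands in for whatever engine the product uses to enforce the pattern.

```python
import re

# Constraints copied from the outputFieldPrefix property.
PREFIX_PATTERN = re.compile(r"^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$")
MAX_LENGTH = 20


def is_valid_prefix(prefix):
    """Return True if the prefix satisfies both documented constraints."""
    return len(prefix) <= MAX_LENGTH and PREFIX_PATTERN.fullmatch(prefix) is not None


for candidate in ["", "grok_", "log.", "g", "9lives", "a" * 21]:
    print(repr(candidate), is_valid_prefix(candidate))
```

As the pattern is written, an empty prefix is accepted but a one-character prefix such as "g" is not, because the `+` requires at least one character after the first.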

pathPatterns

File names to parse

Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes ("/") are used to join names of files inside archives with the archive name.

type: array of object

object attributes: {
  pattern : {
    display name: File name or pattern
    type: string
    description : e.g.: "z.txt" or "*.md" or "/a/*/b/f.txt" for glob; "z.txt$" or ".*\.txt$" or "^/a/[^\/]*/b/f.txt$" for regex
  }
  syntax : {
    display name: Pattern type
    type: string
    default value: 'glob'
    description : glob uses bash shell-style wildcards; regex uses Java (PCRE-style) regex
    enum: { glob regex }
  }
}

type

required

type: string

default value: 'grok'

enum: { grok }
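Putting the properties together, a stage definition might look like the sketch below. The property names, types, and defaults come from the list above; the values chosen for outputFieldPrefix and pathPatterns are hypothetical, the grokPattern value follows the REDISLOG example from this page but its exact expected form is an assumption, and how the object is submitted to the product is not covered here.

```python
import json

# Illustrative values only; property names and defaults come from this page.
stage = {
    "type": "grok",
    "enabled": True,
    "charset": "UTF-8",
    "ignoreBOM": False,
    "errorHandling": "mark",          # one of: ignore, log, fail, mark
    "inheritMediaTypes": True,
    "mediaTypes": [],
    "outputFieldPrefix": "grok_",     # hypothetical prefix
    "pathPatterns": [{"pattern": "*.log", "syntax": "glob"}],  # hypothetical
    "grokPattern": "REDISLOG",        # assumption: pattern given by name
    "grokDefinition": "",
}

# The properties this page marks as required.
REQUIRED = {"charset", "ignoreBOM", "type"}
missing = REQUIRED - stage.keys()
print(json.dumps(stage, indent=2))
print("missing required properties:", sorted(missing))
```

The id property is omitted on the assumption that it is generated for each stage.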