# Grok Parser Stage

export const schema = {
  "type": "object",
  "title": "Grok",
  "description": "Parses semi-structured content using Grok patterns (like Regex, see https://github.com/thekrakken/java-grok). This is often ideal for understanding log files, but can be used for other purposes.",
  "required": ["charset", "ignoreBOM", "type"],
  "properties": {
    "id": {
      "type": "string",
      "title": "Parser ID",
      "default": "a0defe60-37f0-4091-8b22-90bf802fefc0"
    },
    "label": {
      "type": "string",
      "title": "Label",
      "description": "A label for this Parser Stage",
      "maxLength": 255
    },
    "enabled": {
      "type": "boolean",
      "title": "Enable this Parser Stage",
      "default": true
    },
    "mediaTypes": {
      "type": "array",
      "title": "Media Types to match",
      "description": "Documents with a media type on this list will be matched by this parser stage. See inheritMediaTypes / use default media types for more.",
      "items": {
        "type": "string",
        "pattern": "^[^\\/]+\\/[^\\/]+$",
        "format": "rfc2646"
      }
    },
    "inheritMediaTypes": {
      "type": "boolean",
      "title": "Match default media types in this Parser Stage",
      "description": "Each parser stage has a built-in list of media types it handles by default. If this setting is true, that list will be used along with any optional additional types provided in the mediaTypes list. If this setting is false, this stage will only be selected for media types in the mediaTypes list, and the mediaTypes list becomes a mandatory property which must have at least one valid media type.",
      "default": true
    },
    "ignoredMediaTypes": {
      "type": "array",
      "title": "Media Types to ignore",
      "description": "Documents with a media type on this list will not be processed by this parser stage.",
      "items": {
        "type": "string",
        "pattern": "^[^\\/]+\\/[^\\/]+$",
        "format": "rfc2646"
      }
    },
    "pathPatterns": {
      "type": "array",
      "title": "File names to parse",
      "description": "Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes (\"/\") are used to join names of files inside archives with the archive name.",
      "items": {
        "type": "object",
        "properties": {
          "syntax": {
            "type": "string",
            "title": "Pattern type",
            "description": "glob uses bash shell-style wildcards; regex uses Java (PCRE-style) regex",
            "enum": ["glob", "regex"],
            "default": "glob"
          },
          "pattern": {
            "type": "string",
            "title": "File name or pattern",
            "description": "e.g.: \"z.txt\" or \"*.md\" or \"/a/*/b/f.txt\" for glob; \"z.txt$\" or \".*\\.txt$\" or \"^/a/[^\\/]*/b/f.txt$\" for regex"
          }
        }
      }
    },
    "errorHandling": {
      "type": "string",
      "title": "Error Handling",
      "enum": ["ignore", "log", "fail", "mark"],
      "default": "mark"
    },
    "outputFieldPrefix": {
      "type": "string",
      "title": "Prefix parsed fields with",
      "description": "Fields extracted by this parser will be prefixed with this string. The remainder of the field name will be as detected in the stream",
      "maxLength": 20,
      "pattern": "^$|^[A-Za-z_][A-Za-z0-9_\\-\\.]+$"
    },
    "charset": {
      "type": "string",
      "title": "Character Set",
      "description": "Example: \"UTF-8\"",
      "default": "detect"
    },
    "ignoreBOM": {
      "type": "boolean",
      "title": "Ignore BOM",
      "description": "Ignore Byte-Order Mark (BOM) if present and always use the configured character set. When set to false a valid BOM character set overrides the configured default character set.",
      "default": false
    },
    "grokDefinition": {
      "type": "string",
      "title": "Grok Definition",
      "description": "Custom Grok definition",
      "hints": ["code/javascript"]
    },
    "grokPattern": {
      "type": "string",
      "title": "Grok Pattern",
      "description": "Grok parsing pattern",
      "hints": ["code/javascript"]
    },
    "type": {
      "type": "string",
      "enum": ["grok"],
      "default": "grok"
    }
  },
  "additionalProperties": false,
  "category": "Other",
  "categoryPriority": 1,
  "unsafe": false
};

export const SchemaParamFields = ({schema}) => {
  const sanitize = str => {
    if (typeof str !== "string") return str;
    return str.replace(/^"(.*)"$/s, "$1").replace(/\\/g, "").replace(/"/g, "'");
  };
  const formatDescription = str => {
    const s = sanitize(str);
    return (/[.!?]\)*$/).test(s) ? s : `${s}.`;
  };
  const {description, properties = {}, required: requiredProps = []} = schema;
  const visibleProps = useMemo(() => Object.entries(properties).filter(([, prop]) => !prop.hints?.includes("hidden")), [properties]);
  return <div>
      {description && <p>{formatDescription(description)}</p>}

      {visibleProps.map(([name, prop]) => {
    const isRequired = requiredProps.includes(name);
    const hasDefault = prop.default !== undefined;
    const rawDefault = prop.default;
    const isComplexDefault = hasDefault && (typeof rawDefault === "object" || typeof rawDefault === "string" && (rawDefault.length > 20 || rawDefault.includes('"')));
    const fieldProps = {
      key: name,
      body: prop.title || name,
      type: prop.type,
      ...prop.title && ({
        post: [<><span className="text-stone-400 dark:text-stone-500">API property: </span>{name}</>]
      }),
      ...isRequired && ({
        required: true
      }),
      ...!isComplexDefault && hasDefault ? {
        default: sanitize(String(rawDefault))
      } : {}
    };
    const isObject = prop.type === "object" && prop.properties;
    const isArrayOfObjects = prop.type === "array" && prop.items?.type === "object" && prop.items.properties;
    return <ParamField {...fieldProps}>
            {prop.description && <p>{formatDescription(prop.description)}</p>}

            {isComplexDefault && <div className="flex">
                <p>
                  <strong>Default:</strong>
                </p>
                <pre className="!my-0">
                  <code>
                    {JSON.stringify(rawDefault, null, 2)}
                  </code>
                </pre>
              </div>}

            {isArrayOfObjects && <div className="flex">
              <p>
                <strong>Object attributes:</strong>
              </p>
              <pre className="!my-0">
                <code>
                  {'{\n'}
                  {Object.entries(prop.items.properties).map(([iname, iprop]) => <>
                      {`  ${iname}`}
                      {prop.items?.required?.includes(iname) && <span style={{
      color: 'red'
    }}> required</span>}
                      {`: {\n    display name: ${sanitize(iprop.title || '')}\n    type: ${iprop.type}\n  }\n`}
                    </>)}
                  {'}'}
                </code>
              </pre>
              </div>}

            {isObject && <Expandable title="properties">
                <SchemaParamFields schema={{
      properties: prop.properties,
      required: prop.required
    }} />
              </Expandable>}
          </ParamField>;
  })}
    </div>;
};

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

The Grok parser stage uses [Java Grok](https://github.com/thekrakken/java-grok) and Grok patterns (a specific kind of regex matching) to parse log files and similar text files that have line-oriented, semi-structured data. Parsing a text file with the Grok parser adds structure to that data by extracting fields from each line.

<LwTemplate />

## Whether the Grok stage parses a file

A Grok parser stage parses a file only if the file meets the stage's media type and file name criteria.

### Media type

The Grok parser stage parses files that have media types that match either the default media types *or* media types that you specify.

Select or unselect **Use default media types for this parser stage**:

* **Selected.** The Grok parser stage parses files that have one of the default media types (`text/plain` or `text/x-log`), as well as files that have media types that you specify under **Media Types for this Parser Stage**.
* **Unselected.** The Grok parser stage *only* parses files that have one of the media types that you specify under **Media Types for this Parser Stage**.
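
The selection rule above can be sketched as a small function. This is an illustration only, not Fusion's implementation: `matchesMediaType` is a hypothetical helper, `application/x-syslog` is a made-up media type, and giving `ignoredMediaTypes` priority over the other lists is an assumption. The property names come from this stage's schema.

```javascript
// Default media types for the Grok parser stage, as listed above.
const DEFAULT_MEDIA_TYPES = ["text/plain", "text/x-log"];

// Hypothetical helper: decides whether a stage configuration selects a media type.
// Assumption: a type listed in ignoredMediaTypes is always skipped.
function matchesMediaType(stage, mediaType) {
  const { inheritMediaTypes = true, mediaTypes = [], ignoredMediaTypes = [] } = stage;
  if (ignoredMediaTypes.includes(mediaType)) return false;
  const accepted = inheritMediaTypes
    ? [...DEFAULT_MEDIA_TYPES, ...mediaTypes] // defaults plus your own types
    : mediaTypes;                             // only your own types
  return accepted.includes(mediaType);
}

// "application/x-syslog" is a made-up media type used for illustration.
const withDefaults = { inheritMediaTypes: true, mediaTypes: ["application/x-syslog"] };
const customOnly = { inheritMediaTypes: false, mediaTypes: ["application/x-syslog"] };

console.log(matchesMediaType(withDefaults, "text/plain"));         // true
console.log(matchesMediaType(customOnly, "text/plain"));           // false
console.log(matchesMediaType(customOnly, "application/x-syslog")); // true
```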

### File name

Optionally, you can specify a file name or file name pattern that a file must match for the Grok parser stage to parse the file.

| Field                | Description                                                                                                                                                                                                                                                     |
| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Pattern Type         | `glob`: Use `bash` shell-style wildcards. Examples include `z.txt`, `*.md`, and `/a/*/b/f.txt`.<br />`regex`: Use Java regular expressions (PCRE-style syntax). Examples include `z.txt$`, `.*\.txt$`, and `^/a/[^\/]*/b/f.txt$`.                                |
| File Name or Pattern | Name of the file or a pattern for the file name. When set, the stage parses only files whose names match.                                                                                                                                                       |
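
To get a feel for the `regex` examples in the table, you can try them against sample paths. The sketch below uses JavaScript's `RegExp`, not Java's regex engine, and assumes unanchored "find" semantics. For these simple patterns the two engines behave the same, but treat the results as an approximation. `regexMatches` and the sample paths are hypothetical.

```javascript
// A pathPatterns fragment using property names from this stage's schema.
const pathPatterns = [
  { syntax: "glob", pattern: "*.md" },
  { syntax: "regex", pattern: ".*\\.txt$" },
  { syntax: "regex", pattern: "^/a/[^\\/]*/b/f.txt$" },
];

// Hypothetical helper: tests a regex pattern against a path with JavaScript's
// RegExp. Assumption: the pattern may match anywhere unless it is anchored.
const regexMatches = (pattern, path) => new RegExp(pattern).test(path);

console.log(regexMatches(pathPatterns[1].pattern, "logs/app.txt"));     // true
console.log(regexMatches(pathPatterns[1].pattern, "logs/app.txt.bak")); // false
console.log(regexMatches(pathPatterns[2].pattern, "/a/x/b/f.txt"));     // true
console.log(regexMatches(pathPatterns[2].pattern, "/a/x/y/b/f.txt"));   // false
```

The last case fails because `[^\/]*` cannot cross a `/`, so the pattern allows exactly one path segment between `/a/` and `/b/`.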

## Grok patterns

Grok patterns are regular expressions written in the language of the [Oniguruma regular expression library](https://github.com/kkos/oniguruma). For details, see the [Oniguruma syntax reference](https://github.com/kkos/oniguruma/blob/master/doc/RE).

You configure a Grok parser stage to use predefined [Grok patterns](/docs/5/fusion/reference/config-ref/parser-stages/grok-patterns) (about 300 patterns are available), Grok pattern definitions that you write yourself, or both.

* **Use predefined patterns.** Under the **Grok Pattern** part of the Grok parser stage configuration, specify a single top-level Grok pattern by name, for example, `REDISLOG`.
* **Write your own Grok pattern definition(s).** (*optional*) Write one or more Grok pattern definitions, and then enter them in the **Grok Definition** part of the Grok parser stage configuration.
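
The following sketch shows how named Grok references can expand into an ordinary regular expression with named capture groups. It is a simplified illustration in JavaScript, not Java Grok itself. It assumes definitions are written one per line as a name followed by a regex, and that references use the `%{NAME:field}` form. The names `LEVEL`, `REQUEST_ID`, and `MESSAGE`, the sample log line, and both helper functions are hypothetical.

```javascript
// Hypothetical custom definitions, one "NAME regex" pair per line.
const grokDefinition = [
  "LEVEL (?:DEBUG|INFO|WARN|ERROR)",
  "REQUEST_ID [0-9a-f]{8}",
  "MESSAGE .*",
].join("\n");

// Top-level pattern that references the definitions by name.
const grokPattern = "%{LEVEL:level} \\[%{REQUEST_ID:request_id}\\] %{MESSAGE:message}";

// Build a name -> regex lookup table from the definition text.
function parseDefinitions(text) {
  const table = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const space = trimmed.indexOf(" ");
    table.set(trimmed.slice(0, space), trimmed.slice(space + 1).trim());
  }
  return table;
}

// Replace each %{NAME} or %{NAME:field} reference with its regex, recursively.
function expand(pattern, table) {
  return pattern.replace(/%\{(\w+)(?::(\w+))?\}/g, (whole, name, field) => {
    if (!table.has(name)) throw new Error(`Unknown Grok pattern: ${name}`);
    const body = expand(table.get(name), table);
    return field ? `(?<${field}>${body})` : `(?:${body})`;
  });
}

const regex = new RegExp(`^${expand(grokPattern, parseDefinitions(grokDefinition))}$`);
const fields = { ...regex.exec("WARN [1a2b3c4d] disk almost full").groups };
console.log(fields);
// { level: 'WARN', request_id: '1a2b3c4d', message: 'disk almost full' }
```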

## Parsing rules

These are rules that affect the results of parsing:

* **Precedence in the event of identical names.** If the name of a custom Grok pattern definition that you provide is identical to the name of a predefined pattern definition, then your definition is used.

* **Invalid patterns.** If a pattern is not syntactically valid, then the full text of the line being parsed is treated as a single field.

* **Pattern does not match any data.** If a pattern does not match any data, then the full text of the line being parsed is treated as a single field.

* **Line by line.** Parsing is line by line. If data has a multiline structure, the parser does not capture the relationship between lines.
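
As a rough model of these rules, the JavaScript sketch below lets a custom definition override a predefined one and then parses text one line at a time, falling back to a single field when a line does not match. It does not reproduce the stage's actual code. The fallback field name `line`, the `LEVEL` definitions, and the sample log text are all hypothetical.

```javascript
// Precedence: later entries win, so a custom definition replaces a
// predefined one that has the same name. Both definitions are hypothetical.
const predefined = new Map([["LEVEL", "[A-Z]+"]]);
const custom = new Map([["LEVEL", "(?:WARN|ERROR)"]]);
const effective = new Map([...predefined, ...custom]);

const lineRegex = new RegExp(`^(?<level>${effective.get("LEVEL")}) (?<message>.*)$`);

// Parse each line independently. A line that does not match becomes a single
// field holding the full text ("line" is a hypothetical field name).
function parseLines(text, regex) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      const match = regex.exec(line);
      return match ? { ...match.groups } : { line };
    });
}

const records = parseLines("ERROR disk full\n  at stack frame\nINFO started", lineRegex);
console.log(records);
// [
//   { level: 'ERROR', message: 'disk full' },
//   { line: '  at stack frame' },
//   { line: 'INFO started' }
// ]
```

The continuation line `  at stack frame` becomes its own record, with nothing linking it to the `ERROR` line before it. `INFO started` falls back as well, because the custom `LEVEL` definition (which accepts only `WARN` and `ERROR`) took precedence over the broader predefined one.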

<Tip>
  When entering configuration values in the UI, use *unescaped* characters, such as `\t` for the tab character. When entering configuration values in the API, use *escaped* characters, such as `\\t` for the tab character.
</Tip>
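
The difference between the two forms comes from JSON string escaping, as this JavaScript sketch shows by serializing a stage configuration. The property names come from this stage's schema. The pattern itself is a hypothetical example that assumes the predefined `WORD` and `GREEDYDATA` patterns are available.

```javascript
// Pattern as typed into the UI: a literal backslash followed by "t".
const uiValue = String.raw`%{WORD:key}\t%{GREEDYDATA:value}`;

// Minimal stage configuration with the schema's required properties.
const stage = { type: "grok", charset: "detect", ignoreBOM: false, grokPattern: uiValue };

// Serializing to JSON doubles the backslash, giving the escaped API form.
const requestBody = JSON.stringify(stage);
console.log(requestBody);
// {"type":"grok","charset":"detect","ignoreBOM":false,"grokPattern":"%{WORD:key}\\t%{GREEDYDATA:value}"}

// Parsing the JSON restores the unescaped value.
console.log(JSON.parse(requestBody).grokPattern === uiValue); // true
```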

<SchemaParamFields schema={schema} />
