The Plain Text parser can split a text file by lines or consume it into a single document.
Options for handling this file type include the following Plain Text parser fields:

- Number of header rows to skip
- Split on line end or not
- Comment character
- Skip empty lines
- Charset

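
The options listed above can be illustrated with a small Python sketch. This is not the parser's implementation; it only mimics the described behavior under stated assumptions. The function name `parse_plain_text`, the `drop_comments` flag, the order in which the options are applied, and the treatment of whitespace-only lines as empty are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def parse_plain_text(
    data: bytes,
    charset: str = "UTF-8",
    split_lines: bool = False,
    skip_header_lines: int = 0,
    skip_empty_lines: bool = False,
    comment: str = "#",
    drop_comments: bool = False,  # hypothetical simplification of comment handling
    output_field: str = "body",
) -> List[Dict[str, str]]:
    """Mimic the documented options; the order of the steps is an assumption."""
    # Charset: decode the raw bytes with the configured character set.
    lines = data.decode(charset).splitlines()

    # Number of header rows to skip.
    lines = lines[skip_header_lines:]

    # Skip empty lines (assumption: whitespace-only lines count as empty).
    if skip_empty_lines:
        lines = [line for line in lines if line.strip()]

    # Comment character: optionally drop lines that start with it.
    if drop_comments:
        lines = [line for line in lines if not line.startswith(comment)]

    # Split on line end: one record per line, otherwise a single document.
    if split_lines:
        return [{output_field: line} for line in lines]
    return [{output_field: "\n".join(lines)}]


sample = b"title\n# note\n\nalpha\nbeta\n"

# One document holding the whole file.
single = parse_plain_text(sample)

# One record per remaining line: [{'body': 'alpha'}, {'body': 'beta'}]
records = parse_plain_text(
    sample,
    split_lines=True,
    skip_header_lines=1,
    skip_empty_lines=True,
    drop_comments=True,
)
print(records)
```

With splitting disabled, the whole file ends up in one record; with splitting enabled, each surviving line becomes its own record.
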
Tip: When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.

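
One plausible reading of the tip is that the JSON body sent to the API carries the two-character sequence backslash-t, which a JSON encoder writes as \\t. The short Python sketch below shows only that encoding step, using the standard `json` module. Using the `comment` property as the carrier is a hypothetical choice for illustration, and how the server interprets the value is not shown here.

```python
import json

# What a user would type in the UI: two characters, a backslash and a "t".
value_as_typed_in_ui = "\\t"

# Serializing to JSON escapes the backslash, so the body contains \\t.
# The "comment" key is only an illustrative carrier for the value.
body = json.dumps({"comment": value_as_typed_in_ui})
print(body)  # {"comment": "\\t"}

# Decoding the body gives back the original two-character sequence.
decoded = json.loads(body)["comment"]
```
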
Text stage-specific properties

Property | Description, Type |
---|---|
`charset` Character Set (required) | Example: "UTF-8" |
`comment` Comment character | Characters at start of line to indicate a comment, default # (hash). minLength: 1 |
`commentField` Comment field | Name of the output field where comment is stored, default 'comment'. minLength: 1 |
`commentHandling` Comment Handling | How to handle comments: include as-is, ignore (and remove from text), add as field (and remove from text); default include. enum: { ignore include as_field } |
`enabled` Enable this Parser Stage | |
`errorHandling` Error Handling | enum: { ignore log fail mark } |
`id` Parser ID | |
`ignoreBOM` Ignore BOM (required) | Ignore Byte-Order Mark (BOM) if present and always use the configured character set. When set to false, a valid BOM character set overrides the configured default character set. |
`inheritMediaTypes` Use default media types for this Parser Stage | Indicates if the parser stage should use the default media types. Unchecking this box means that ONLY the manually configured media types will be parsed by the parser, and you then MUST provide at least one media type. |
`maxLength` Maximum length | Maximum number of characters to allow for the body, -1 for unlimited, default 1MB. exclusiveMaximum: false, exclusiveMinimum: false, maximum: 2147483647, minimum: 0 |
`maxLineLength` Maximum line length | Maximum number of characters to allow for any single line, default 1MB. exclusiveMaximum: false, exclusiveMinimum: false, maximum: 2147483647, minimum: 0 |
`mediaTypes` Media Types for this Parser Stage | |
`outputField` Output field | Name of the output field where text is stored, default 'body'. minLength: 1 |
`outputFieldPrefix` Prefix parsed fields with | Fields extracted by this parser will be prefixed with this string. The remainder of the field name will be as detected in the stream. maxLength: 20, pattern: `^$\|^[A-Za-z_][A-Za-z0-9_\-\.]+$` |
`pathPatterns` File names to parse | Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes ("/") are used to join names of files inside archives with the archive name. type: object, attributes: { } |
`skipEmptyLines` Skip empty lines | Skip any empty lines encountered, default false. |
`skipHeaderLines` Skip header lines | Skip a number of header lines, default 0. |
`splitLines` Split lines | Split text into lines to create multiple records, default false. |
`trimWhitespace` Trim whitespace | Trim off leading and trailing whitespace from lines, default false. |
`type` (required) | enum: { text } |
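
As a rough illustration of how these properties fit together, the Python sketch below builds one possible stage configuration and checks it against the required markers and enum values listed in the table. The property names and allowed values come from the table; the chosen values, the helper `check_stage`, and the idea of validating on the client side are assumptions made for this example and are not part of the product.

```python
import json
from typing import Dict, List

# Properties marked "required" in the table.
REQUIRED = ("type", "charset", "ignoreBOM")

# Allowed values copied from the enum entries in the table.
ENUMS = {
    "type": {"text"},
    "commentHandling": {"ignore", "include", "as_field"},
    "errorHandling": {"ignore", "log", "fail", "mark"},
}


def check_stage(stage: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED:
        if key not in stage:
            problems.append(f"missing required property: {key}")
    for key, allowed in ENUMS.items():
        if key in stage and stage[key] not in allowed:
            problems.append(f"invalid value for {key}: {stage[key]!r}")
    return problems


# Illustrative values only: split a file into one record per line,
# skip one header line and any empty lines, and drop comment lines.
text_parser_stage = {
    "type": "text",
    "charset": "UTF-8",
    "ignoreBOM": False,
    "splitLines": True,
    "skipHeaderLines": 1,
    "skipEmptyLines": True,
    "trimWhitespace": True,
    "comment": "#",
    "commentHandling": "ignore",
    "outputField": "body",
}

problems = check_stage(text_parser_stage)
print(problems)  # []
print(json.dumps(text_parser_stage, indent=2))
```

A check like this only catches missing required properties and misspelled enum values before the configuration is submitted; it does not replace validation by the server.
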