CSV Parser Stage

This parser splits incoming CSV files into documents for Fusion to index. It produces one new document per row of CSV input, excluding header rows and, with the default comment handling, comment rows.
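As a rough illustration of the row-to-document mapping, the sketch below uses Python's standard `csv` module, not Fusion's own parser. It treats the first non-comment row as headers, skips rows that start with `#` (mirroring the default comment handling), and records a row number. The function `rows_to_documents` and the field name `row_number` are hypothetical; the field names Fusion actually emits are not specified here.

```python
from __future__ import annotations

import csv
import io


def rows_to_documents(text: str, comment: str = "#") -> list[dict[str, str]]:
    """Hypothetical sketch: build one dict ("document") per CSV data row."""
    reader = csv.reader(io.StringIO(text))
    headers: list[str] | None = None
    documents = []
    for row in reader:
        # Skip blank lines and comment rows (comparable to commentHandling: ignore).
        if not row or row[0].startswith(comment):
            continue
        if headers is None:
            # The first remaining row is the header row (comparable to hasHeaders: true).
            headers = row
            continue
        doc = dict(zip(headers, row))
        # Hypothetical field for includeRowNumber; line_num is the number of
        # physical lines read so far, i.e. the line of the current row.
        doc["row_number"] = str(reader.line_num)
        documents.append(doc)
    return documents
```

For example, `rows_to_documents("# export\nid,name\n1,Ada\n2,Linus\n")` yields two documents, one per data row; neither the comment line nor the header line produces a document.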

When entering configuration values in the UI, type escape sequences as-is, such as \t for the tab character. When entering configuration values through the API, escape the backslash as well, such as \\t for the tab character.
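The difference appears to come from JSON string escaping: in an API request body the backslash must itself be escaped, so `\\t` in the JSON text decodes to the same two characters (a backslash followed by `t`) that you type in the UI. The snippet below checks this with Python's `json` module; the `delimiter` key matches the property documented below, but the request body is only an illustration, not a complete Fusion API payload.

```python
import json

# What you type in the UI: a backslash followed by the letter t (two characters).
ui_value = "\\t"

# Raw JSON text of an API request body. Inside a JSON string the backslash
# must be escaped, so the JSON text contains \\t.
api_json = r'{"delimiter": "\\t"}'
assert json.loads(api_json)["delimiter"] == ui_value

# Pitfall: a single backslash in the JSON text decodes to an actual tab character.
wrong_json = r'{"delimiter": "\t"}'
assert json.loads(wrong_json)["delimiter"] == "\t"
```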

Parse CSV content

id - string

Default: 03114c11-0946-4e60-bd45-bac26f75479b

label - string

A label for this Parser Stage

<= 255 characters

enabled - boolean

Default: true

mediaTypes - array[string]

Documents with a media type on this list are matched by this parser stage. See inheritMediaTypes for how this list combines with the stage's default media types.

inheritMediaTypes - boolean

Each parser stage has a built-in list of media types it handles by default. If this setting is true, that list is used along with any additional types provided in the mediaTypes list. If this setting is false, this stage is selected only for media types in the mediaTypes list, and mediaTypes becomes a mandatory property that must contain at least one valid media type.

Default: true

pathPatterns - array[object]

Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes ("/") are used to join names of files inside archives with the archive name.

object attributes:

syntax - string (display name: Pattern type)

pattern - string (display name: File name or pattern)

errorHandling - string

Default: mark

Allowed values: ignore, log, fail, mark

outputFieldPrefix - string

Fields extracted by this parser are prefixed with this string. The remainder of each field name is as detected in the stream.

<= 20 characters

Match pattern: ^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$
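To check a prefix before saving it, the length limit and match pattern above can be applied together. This is a minimal sketch using Python's `re` module; `is_valid_prefix` is a hypothetical helper, not part of Fusion.

```python
import re

# Constraints copied from the outputFieldPrefix schema above.
PREFIX_PATTERN = re.compile(r"^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$")
MAX_PREFIX_LENGTH = 20


def is_valid_prefix(prefix: str) -> bool:
    """Return True if prefix satisfies both the length limit and the pattern."""
    return len(prefix) <= MAX_PREFIX_LENGTH and PREFIX_PATTERN.match(prefix) is not None
```

Read literally, the pattern allows an empty prefix but requires a non-empty prefix to be at least two characters long, because the second character class is followed by `+`. A one-character prefix such as `a` is therefore rejected.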

charset - string, required

Example: "UTF-8"

Default: detect

ignoreBOM - boolean, required

Ignore the Byte-Order Mark (BOM), if present, and always use the configured character set. When set to false, the character set indicated by a valid BOM overrides the configured default character set.

Default: false
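The ignoreBOM rule can be sketched as a small decision function. This assumes an explicitly configured charset (the `detect` default is not modeled) and only recognizes the UTF-8, UTF-16, and UTF-32 BOMs that Python's `codecs` module defines; `effective_charset` is a hypothetical name, and whether Fusion also strips the BOM from the content is not covered.

```python
import codecs

# Checked in this order so the UTF-32 LE BOM (FF FE 00 00) is not mistaken
# for the UTF-16 LE BOM (FF FE), which is its prefix.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def effective_charset(data: bytes, configured: str, ignore_bom: bool) -> str:
    """Pick the charset used to decode data, following the ignoreBOM rule."""
    if not ignore_bom:
        for bom, charset in _BOMS:
            if data.startswith(bom):
                return charset  # a valid BOM overrides the configured charset
    return configured
```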

delimiter - string

Delimiter character between fields. Any single character, including an escaped character, is valid, e.g. , (comma), \t (tab), or | (pipe). Defaults to comma if auto-detection is disabled.

>= 1 characters

quote - string

Quote character. Defaults to a double quote (") if auto-detection is disabled.

<= 1 characters

quoteEscape - string

Character used to escape a quote character inside a quoted value. Defaults to a double quote (") if auto-detection is disabled.

<= 1 characters
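Python's `csv` module has comparable settings (`delimiter`, `quotechar`, and `doublequote` for a quote escaped by repeating it), so it can stand in to show what delimiter, quote, and quoteEscape control. This is not Fusion's parser, and the one-to-one mapping between the two sets of settings is an assumption.

```python
import csv
import io

# A pipe-delimited file whose quoted field contains the delimiter and an
# escaped quote. With quoteEscape equal to the quote character ("), a literal
# quote inside a quoted value is written as two quotes ("").
data = 'id|title\n1|"Say ""hi"" | wave"\n'

reader = csv.reader(
    io.StringIO(data),
    delimiter="|",     # comparable to delimiter
    quotechar='"',     # comparable to quote
    doublequote=True,  # comparable to quoteEscape being the quote character
)
rows = list(reader)
print(rows)
```

The pipe inside the quoted value is kept as text, and each `""` becomes a single `"`.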

autoDetect - boolean

Attempt to guess the delimiter, quote, quote escape, and comment characters.

Default: true
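Fusion's detection heuristics are not described here, but Python's `csv.Sniffer` illustrates the general idea of guessing a delimiter from a sample of the input. Note that `Sniffer` does not detect comment characters, so this covers only part of what autoDetect attempts.

```python
import csv

sample = "name;city;age\nAda;London;36\nLinus;Helsinki;54\n"

# Restricting the candidates to a few common delimiters makes the guess more reliable.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))
```

Here the semicolon is chosen because it appears the same number of times on every line of the sample.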

trimWhitespace - boolean

Trim leading and trailing whitespace from column values.

Default: true

hasHeaders - boolean

Treat the first row as column headers.

Default: true

headers - array[string]

List of column headers. When set, these override any headers present in the file.
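A minimal sketch of the override, assuming that when hasHeaders is true and headers is set, the file's own header row is consumed and the configured names are used instead. The variable names are illustrative.

```python
import csv
import io

data = "col1,col2\nAda,36\nLinus,54\n"
override = ["name", "age"]  # comparable to the headers property

reader = csv.reader(io.StringIO(data))
next(reader)  # hasHeaders: true, so the file's header row is not a document
docs = [dict(zip(override, row)) for row in reader]
print(docs)
```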

skipEmptyLines - boolean

Skip any empty lines encountered.

Default: true
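trimWhitespace and skipEmptyLines can be pictured as two filters applied to each parsed row. This sketch uses Python's `csv` module, which yields an empty list for a blank line; `read_rows` is a hypothetical helper, and whether Fusion also trims whitespace inside quoted values is not stated here.

```python
from __future__ import annotations

import csv
import io


def read_rows(text: str, trim_whitespace: bool = True, skip_empty_lines: bool = True) -> list[list[str]]:
    """Hypothetical sketch of trimWhitespace and skipEmptyLines."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if skip_empty_lines and not row:
            continue  # csv.reader yields [] for a blank line
        if trim_whitespace:
            row = [value.strip() for value in row]
        rows.append(row)
    return rows
```

With the defaults, `read_rows("a , b\n\n c,d \n")` returns `[["a", "b"], ["c", "d"]]`.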

lineSeparator - string

Line separator character

>= 1 characters

nullValue - string

String value used to replace null values. No default.

emptyValue - string

String value used to replace empty strings. No default.

includeRowNumber - boolean

Include the row number (line number) in each emitted document.

Default: true

comment - string

Character at the start of a row that marks the row as a comment. Defaults to hash (#) if auto-detection is disabled.

<= 1 characters

commentHandling - string

How to handle comment rows: ignore them (ignore), add them as a field on the next document (as_field), or add them as separate documents (as_document).

Default: ignore

Allowed values: ignore, as_field, as_document
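The three commentHandling modes can be sketched over a list of raw lines. This is a hypothetical illustration: the field names `line` and `comment`, the function name, and joining several consecutive comments with a newline are all assumptions, since the document does not say how Fusion names or combines comment fields.

```python
from __future__ import annotations


def handle_comments(lines: list[str], mode: str = "ignore", comment: str = "#") -> list[dict[str, str]]:
    """Hypothetical sketch of the ignore, as_field, and as_document modes."""
    if mode not in ("ignore", "as_field", "as_document"):
        raise ValueError(f"unknown mode: {mode}")
    docs: list[dict[str, str]] = []
    pending: list[str] = []  # comments waiting to be attached (as_field)
    for line in lines:
        if line.startswith(comment):
            text = line[len(comment):].strip()
            if mode == "as_document":
                docs.append({"comment": text})  # the comment becomes its own document
            elif mode == "as_field":
                pending.append(text)  # attach to the next data document
            continue  # mode "ignore" simply drops the comment
        doc = {"line": line}
        if pending:
            doc["comment"] = "\n".join(pending)
            pending = []
        docs.append(doc)
    return docs
```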

maxRowLength - integer

Maximum number of characters allowed in a single line read from the input. The default, 10485760, is 10 × 1024 × 1024 characters.

<= 2147483647

exclusiveMinimum: false

exclusiveMaximum: false

Default: 10485760

maxNumColumns - integer

Maximum number of columns allowed in a single row.

<= 2147483647

exclusiveMinimum: false

exclusiveMaximum: false

Default: 1000

maxColumnChars - integer

Maximum number of characters allowed in a single column value. The default, 10485760, is 10 × 1024 × 1024 characters.

<= 2147483647

exclusiveMinimum: false

exclusiveMaximum: false

Default: 10485760
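The two per-row limits can be sketched as a check on each parsed row. The defaults below are copied from the schema; raising `ValueError` is an assumption for illustration, since the document does not say whether Fusion rejects, truncates, or otherwise handles an oversized row.

```python
from __future__ import annotations

# Defaults copied from the schema above.
MAX_NUM_COLUMNS = 1000
MAX_COLUMN_CHARS = 10_485_760  # 10 * 1024 * 1024


def check_limits(row: list[str], max_num_columns: int = MAX_NUM_COLUMNS,
                 max_column_chars: int = MAX_COLUMN_CHARS) -> None:
    """Hypothetical check: raise ValueError if a row exceeds maxNumColumns or maxColumnChars."""
    if len(row) > max_num_columns:
        raise ValueError(f"row has {len(row)} columns; limit is {max_num_columns}")
    for index, value in enumerate(row):
        if len(value) > max_column_chars:
            raise ValueError(f"column {index} has {len(value)} characters; limit is {max_column_chars}")
```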

columnHandling - string

What to do when a row has more or fewer columns than expected: throw an error (error), align the columns (align), or do nothing special (default).

Default: default

Allowed values: error, align, default

fillValue - string

String value used to fill columns when aligning them, that is, when Column Mismatch Handling (columnHandling) is "align".

Default: <FILL>
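A minimal sketch of the align mode, assuming that short rows are padded with fillValue up to the expected column count. Truncating long rows is an additional assumption made here for illustration; the document does not state what align does with extra columns. `align_row` is a hypothetical helper.

```python
from __future__ import annotations


def align_row(row: list[str], expected: int, fill_value: str = "<FILL>") -> list[str]:
    """Hypothetical sketch of columnHandling "align" using fillValue."""
    if len(row) < expected:
        # Pad missing trailing columns with the fill value.
        return row + [fill_value] * (expected - len(row))
    # Assumption: extra trailing columns are dropped.
    return row[:expected]
```

For example, with three expected columns, `align_row(["a"], 3)` returns `["a", "<FILL>", "<FILL>"]`.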

type - string, required

Default: csv

Allowed values: csv