This parser splits incoming CSV files into rows and fields for Fusion to index. It produces one new document per row of CSV input, excluding comment rows and header rows.
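The row-to-document behavior described above can be illustrated with a minimal Python sketch. This is not Fusion's implementation: the function name `rows_to_documents` and the sample data are hypothetical, and the sketch assumes that comment rows start with `#` and that the first remaining row holds the headers.

```python
import csv
import io


def rows_to_documents(text, delimiter=",", comment="#"):
    """Return one dict per data row, skipping comment rows and the header row."""
    # Drop rows that start with the comment character before parsing.
    # (Simplification: this does not account for quoted multi-line fields.)
    lines = [line for line in io.StringIO(text) if not line.startswith(comment)]
    reader = csv.reader(lines, delimiter=delimiter)
    headers = next(reader, None)  # first remaining row is treated as headers
    if headers is None:
        return []
    # Each remaining row becomes one "document" keyed by header name.
    return [dict(zip(headers, row)) for row in reader]


sample = "# sample export\nid,name\n1,Ada\n2,Grace\n"
docs = rows_to_documents(sample)
print(len(docs))  # 2
```

The comment row and the header row yield no documents of their own, so four input lines produce two documents.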
Tip: When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
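The difference between the two forms can be seen by serializing a value to JSON. The sketch below assumes the API accepts a JSON request body; the `config` dictionary is a hypothetical fragment, not a complete stage definition.

```python
import json

# The value as typed in the UI: the two characters backslash and "t".
ui_value = r"\t"

# Hypothetical configuration fragment for illustration only.
config = {"type": "csv", "delimiter": ui_value}

# JSON serialization escapes the backslash, giving the API form.
body = json.dumps(config)
print(body)  # {"type": "csv", "delimiter": "\\t"}
```

Decoding the JSON body restores the same two-character value that would be entered in the UI.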
CSV stage-specific properties
Property | Label | Description |
---|---|---|
autoDetect | Auto-detect CSV Format | Attempt to guess the delimiter, quote, quote escape, and comment characters. |
charset | Character Set (required) | Example: "UTF-8". |
columnHandling | Column mismatch handling | What to do when a row has too many or too few columns: throw an error, align the columns, or do nothing special (default). Allowed values: error, align, default. |
comment | Comment character | Character at the start of a row that marks it as a comment. Default is hash (#) if auto-detection is disabled. maxLength: 1. |
commentHandling | Comment Handling | How to handle comments: ignore them, add them as a field to the next document, or add them as a separate document. Default is ignore. Allowed values: ignore, as_field, as_document. |
delimiter | Delimiter | Delimiter character between fields. Any single character, including an escaped character, is valid, e.g. , (comma), \t (tab), or \| (pipe). Default is comma if auto-detection is disabled. minLength: 1. |
emptyValue | Empty string replacement | A string value to replace empty strings with. No default. |
enabled | Enable this Parser Stage | |
errorHandling | Error Handling | Allowed values: ignore, log, fail, mark. |
fillValue | Column fill value | A string value to use when aligning the columns (when Column Mismatch Handling is "align"). |
hasHeaders | Headers in file | Treat the first row as column headers. Default is true. |
headers | Header list | List of column headers. Overrides file headers if present. |
id | Parser ID | |
ignoreBOM | Ignore BOM (required) | Ignore the Byte-Order Mark (BOM) if present and always use the configured character set. When set to false, a valid BOM character set overrides the configured default character set. |
includeRowNumber | Include row number | Include the row number (line number) in the emitted documents. Default is true. |
inheritMediaTypes | Match default media types in this Parser Stage | Each parser stage has a built-in list of media types it handles by default. If this setting is true, that list is used along with any additional types provided in the mediaTypes list. If this setting is false, this stage is only selected for media types in the mediaTypes list, and mediaTypes becomes a mandatory property that must contain at least one valid media type. |
lineSeparator | Line Separator | Line separator character. minLength: 1. |
maxColumnChars | Maximum number of characters per column | Maximum number of characters a single column value can have. Default is 10MB. Range: 0 to 2147483647 (inclusive). |
maxNumColumns | Maximum number of columns | Maximum number of columns to allow for a single row. Default is 1000. Range: 0 to 2147483647 (inclusive). |
maxRowLength | Maximum line length | Maximum number of characters to allow for a single read line. Default is 10MB. Range: 0 to 2147483647 (inclusive). |
mediaTypes | Media Types to match | Documents with a media type on this list are matched by this parser stage. See inheritMediaTypes for more information. |
nullValue | Null value | A string value to replace nulls with. No default. |
outputFieldPrefix | Prefix parsed fields with | Fields extracted by this parser are prefixed with this string. The remainder of the field name is as detected in the stream. maxLength: 20. pattern: ^$\|^[A-Za-z_][A-Za-z0-9_\-\.]+$ |
pathPatterns | File names to parse | Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes ("/") are used to join names of files inside archives with the archive name. Type: object. |
quote | Quote | Quote character. Default is a double quote (") if auto-detection is disabled. maxLength: 1. |
quoteEscape | Quote escape | Quote escape character. Default is a double quote (") if auto-detection is disabled. maxLength: 1. |
skipEmptyLines | Skip empty lines | Skip any empty lines encountered. Default is true. |
trimWhitespace | Trim whitespace | Trim leading and trailing whitespace from columns. Default is true. |
type | (required) | Allowed values: csv. |
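As an illustration of how the constraints in this table fit together, the following Python sketch checks a stage configuration against the allowed values, length limits, numeric ranges, and required properties listed above. The function `validate_csv_stage` and both example configurations are hypothetical; Fusion performs its own validation, and this sketch only mirrors what the table states.

```python
# Constraints copied from the property table; nothing here calls Fusion.
ENUMS = {
    "columnHandling": {"error", "align", "default"},
    "commentHandling": {"ignore", "as_field", "as_document"},
    "errorHandling": {"ignore", "log", "fail", "mark"},
    "type": {"csv"},
}
MAX_ONE_CHAR = ("comment", "quote", "quoteEscape")
INT_RANGE = ("maxColumnChars", "maxNumColumns", "maxRowLength")
REQUIRED = ("type", "charset", "ignoreBOM")


def validate_csv_stage(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, allowed in ENUMS.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} not in {sorted(allowed)}")
    for key in MAX_ONE_CHAR:
        if len(config.get(key, "")) > 1:
            problems.append(f"{key}: longer than 1 character")
    for key in INT_RANGE:
        if key in config and not 0 <= config[key] <= 2147483647:
            problems.append(f"{key}: outside 0..2147483647")
    for key in REQUIRED:
        if key not in config:
            problems.append(f"{key}: required")
    return problems


# Hypothetical configuration that aligns short rows with an empty fill value.
stage = {
    "type": "csv", "charset": "UTF-8", "ignoreBOM": False,
    "autoDetect": False, "delimiter": ",", "comment": "#", "quote": '"',
    "columnHandling": "align", "fillValue": "", "hasHeaders": True,
}
print(validate_csv_stage(stage))  # []

# Hypothetical configuration with four problems.
bad_stage = {"type": "csv", "quote": "''", "columnHandling": "pad"}
print(len(validate_csv_stage(bad_stage)))  # 4
```

The second configuration fails on an unknown columnHandling value, a two-character quote, and the missing charset and ignoreBOM properties.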