CSV Parsing Index Stage

The CSV Parsing Index stage (previously called the CSV Parser stage) parses CSV content from a document field into new documents. It produces as many documents as there are rows in the CSV input, excluding comment and header rows.

The following CSV snippet consists of a row of column headers and 5 rows of data (6 lines total):

Name,Description
Arnold_Schwarzenegger,"Austrian-American bodybuilder, actor, politician"
Anthony_Hopkins,"Actor"
Albert_Brooks,"Actor, voice actor, writer, comedian and director"
Britney_Spears,"American musician, singer, songwriter, actress, author"
Brigitte_Bardot,"French actress and animal welfare activist"

Running this input through a CSV parsing stage that is configured to use the column headers as field names produces 5 documents:

(document)  field "Name"           field "Description"
(1)         Arnold_Schwarzenegger  Austrian-American bodybuilder, actor, politician
(2)         Anthony_Hopkins        Actor
(3)         Albert_Brooks          Actor, voice actor, writer, comedian and director
(4)         Britney_Spears         American musician, singer, songwriter, actress, author
(5)         Brigitte_Bardot        French actress and animal welfare activist
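The row-to-document mapping described above can be illustrated with a minimal sketch. It uses Python's standard csv module, not the stage's own implementation, and the names CSV_TEXT and rows_to_documents are hypothetical names chosen for this example. Plain dictionaries stand in for documents.

```python
import csv
import io

# The sample input from this page: one header row followed by 5 data rows.
CSV_TEXT = '''Name,Description
Arnold_Schwarzenegger,"Austrian-American bodybuilder, actor, politician"
Anthony_Hopkins,"Actor"
Albert_Brooks,"Actor, voice actor, writer, comedian and director"
Britney_Spears,"American musician, singer, songwriter, actress, author"
Brigitte_Bardot,"French actress and animal welfare activist"
'''


def rows_to_documents(csv_text):
    """Return one dict per data row, keyed by the header row's column names.

    Illustration only: dicts stand in for the documents the stage emits.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


documents = rows_to_documents(CSV_TEXT)
for number, doc in enumerate(documents, start=1):
    print(number, doc["Name"], "|", doc["Description"])
```

Quoted values keep their embedded commas, so each data row yields exactly two fields, and the header row itself does not become a document.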

Two configuration properties control how the CSV header columns are handled:

  • The headers configuration property defines the mapping from CSV columns to document fields. It takes a list of field names, which are applied to the columns in the order in which they are specified.

  • The headersHandling configuration property specifies how to handle the first row of CSV data. It takes one of three possible values:

    • parse - Assumes the input document contains headers in the first row. The column headers are used as the field names. This is the default option.

    • dynamic - Assumes the input does not contain headers and treats the first line of the CSV file as data. If field names have been specified, they are used; otherwise, dynamic field names are created for the parsed values.

    • ignore - Assumes the input document contains headers and ignores them. If field names have been specified, they are used; otherwise, dynamic field names are created for the parsed values.
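The three headersHandling values can be compared side by side with a rough sketch. It is written with Python's standard csv module and is not the stage's actual code. The function names parse_csv and field_name are hypothetical, and the generated names of the form field_1, field_2 are an assumption made for illustration, since this page does not say what the dynamic field names look like. How a headers list interacts with parse mode is also not described here, so the sketch simply ignores it in that mode.

```python
import csv
import io


def field_name(names, index):
    """Pick the configured name for a column, or fall back to a generated one.

    The "field_N" pattern is a placeholder assumption, not the stage's
    real dynamic naming scheme.
    """
    if names and index < len(names):
        return names[index]
    return f"field_{index + 1}"


def parse_csv(csv_text, headers=None, headers_handling="parse"):
    """Turn CSV text into a list of dicts, one per data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    if headers_handling == "parse":
        # First row supplies the field names and is not emitted as data.
        names, data = rows[0], rows[1:]
    elif headers_handling == "ignore":
        # First row is a header row, but it is discarded.
        names, data = headers, rows[1:]
    elif headers_handling == "dynamic":
        # No header row: the first row is ordinary data.
        names, data = headers, rows
    else:
        raise ValueError(f"unknown headers_handling: {headers_handling!r}")
    return [
        {field_name(names, i): value for i, value in enumerate(row)}
        for row in data
    ]


sample = 'Name,Description\nAnthony_Hopkins,"Actor"\n'
print(parse_csv(sample))
print(parse_csv(sample, headers=["id", "text"], headers_handling="ignore"))
print(parse_csv(sample, headers_handling="dynamic"))
```

With the two-line sample, parse and ignore each yield one document, while dynamic yields two because the header line is treated as data.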

Configuration