A collection includes one or more datasources. A datasource is a configuration that manages the import and indexing of data into the collection. The Index Workbench provides a development environment for creating, configuring, and testing a datasource configuration. Every datasource configuration includes the following:
  • Connector configuration, specifying the source and format of the incoming data.
  • Parser configuration, describing a series of conditional parsing stages to transform the incoming data into PipelineDocument objects.
  • Index pipeline configuration, consisting of stages that transform PipelineDocument objects into Solr documents to be indexed.
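The three parts listed above can be pictured as one small record. The following is a minimal sketch for illustration only: the class name, field names, and sample values are hypothetical and do not reflect the product's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasourceConfig:
    """Illustrative model of a datasource configuration (hypothetical names)."""

    # Connector configuration: source and format of the incoming data.
    connector: Dict[str, Any]
    # Parser configuration: ordered parsing stages that produce PipelineDocument objects.
    parser_stages: List[Dict[str, Any]] = field(default_factory=list)
    # Index pipeline configuration: ordered stages that produce Solr documents.
    index_pipeline_stages: List[Dict[str, Any]] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that are still empty."""
        parts = {
            "connector": self.connector,
            "parser": self.parser_stages,
            "index_pipeline": self.index_pipeline_stages,
        }
        return [name for name, value in parts.items() if not value]


# Usage: a configuration with only the connector filled in (sample values are made up).
draft = DatasourceConfig(connector={"type": "web", "startLinks": ["https://example.com"]})
print(draft.missing_parts())
```

A check like `missing_parts()` is only a convenience for this sketch; in practice the Index Workbench is where a configuration is tested.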
Collections and datasources can also be managed through the REST API.
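As a rough sketch of what a REST call might look like from a script, the helper below only builds an authenticated GET request and does not send it. The base URL, resource path, and credentials are placeholder assumptions, not values taken from this page; the actual endpoints and authentication scheme depend on the product version and should be taken from its REST API reference.

```python
import base64
import urllib.request


def build_list_request(base_url: str, resource: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request using HTTP Basic authentication."""
    # Encode "user:password" for the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Join base URL and resource path with exactly one slash between them.
    url = f"{base_url.rstrip('/')}/{resource.lstrip('/')}"
    return urllib.request.Request(
        url=url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical host, path, and credentials, shown for illustration only.
request = build_list_request("http://localhost:8764/api", "collections", "admin", "secret")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without a server.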