Solr Connector and Datasource Configuration

A Solr connector pulls documents from an external standalone Solr instance or SolrCloud cluster using Solr’s javabin response type and streaming response parser.

For Solr 4.7 and later, the connector uses cursorMark deep paging. For earlier versions of Solr, it uses standard paging (start+rows).
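The two paging modes can be sketched as follows. This is a minimal illustration, not the connector's implementation: `fetch` is a hypothetical callable that sends one request and returns a dict shaped like Solr's JSON response (the connector itself uses javabin), and `supports_cursor_mark` and `iter_docs` are names invented for this example.

```python
from typing import Callable, Dict, Iterator


def supports_cursor_mark(solr_version: str) -> bool:
    """Return True for Solr 4.7 and later (hypothetical helper)."""
    parts = solr_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (4, 7)


def iter_docs(fetch: Callable[[Dict], Dict], base_params: Dict,
              use_cursor: bool, rows: int = 100) -> Iterator[Dict]:
    """Yield every document, paging with cursorMark or with start+rows."""
    params = dict(base_params, rows=rows)
    if use_cursor:
        cursor = "*"  # Solr's initial cursor value
        while True:
            response = fetch(dict(params, cursorMark=cursor))
            yield from response["response"]["docs"]
            next_cursor = response["nextCursorMark"]
            if next_cursor == cursor:  # unchanged cursor: no more results
                break
            cursor = next_cursor
    else:
        start = 0
        while True:
            response = fetch(dict(params, start=start))
            docs = response["response"]["docs"]
            if not docs:  # empty page: past the last result
                break
            yield from docs
            start += rows
```

With cursorMark, the loop stops when Solr returns the same cursor it was sent. With start+rows, it stops at the first empty page.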

The following Solr components and parameters can be configured:

  • collection/core (also allows default/empty core)

  • query (*:* by default)

  • filter queries

  • query parser

  • request handler (defaults to /select)

  • stored fields to retrieve

Because cursorMark deep paging is used whenever possible, the sort order can also be configured:

  • sort spec (default: id asc)
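As a rough illustration of how the settings above might map onto a Solr request, the sketch below builds a request path and parameter dict. The function name `build_request` and its argument names are hypothetical. The parameter names (`q`, `fq`, `defType`, `fl`, `sort`, `wt`) are standard Solr query parameters, but the connector's actual property names and request construction may differ.

```python
from typing import Dict, Optional, Sequence, Tuple


def build_request(collection: Optional[str],
                  query: str = "*:*",
                  filter_queries: Sequence[str] = (),
                  query_parser: Optional[str] = None,
                  request_handler: str = "/select",
                  fields: Sequence[str] = (),
                  sort: str = "id asc") -> Tuple[str, Dict[str, object]]:
    """Return (path, params) for one Solr query (illustrative only)."""
    # An empty collection/core targets the default core.
    if collection:
        path = f"/solr/{collection}{request_handler}"
    else:
        path = f"/solr{request_handler}"

    params: Dict[str, object] = {"q": query, "sort": sort, "wt": "javabin"}
    if filter_queries:
        params["fq"] = list(filter_queries)  # one fq parameter per filter
    if query_parser:
        params["defType"] = query_parser
    if fields:
        params["fl"] = ",".join(fields)  # stored fields to retrieve
    return path, params
```

For example, `build_request("products", filter_queries=["inStock:true"], fields=["id", "name"])` targets `/solr/products/select` with the default query and sort.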

This connector can be configured to store information about datasources and the data ingested in a ConnectorDB crawldb instance.

Limitations

  1. Cannot perform incremental crawls. (This may become possible in the future by using the _version_ field of the source Solr documents.)

  2. Cannot do manual filtered deep paging.

  3. Does not check that both the sort spec and the field list contain the uniqueKey field.

  4. Cannot handle encrypted connections to Solr.
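Because of limitation 3, it may be worth checking a configuration by hand before running a crawl. The helper below is a hypothetical sketch, not part of the connector. It assumes the sort spec is a comma-separated list of "field direction" clauses, and that an empty field list or one containing * returns all stored fields.

```python
from typing import List, Sequence


def missing_unique_key(unique_key: str, sort_spec: str,
                       field_list: Sequence[str]) -> List[str]:
    """Return the settings that lack the uniqueKey field (empty if none)."""
    problems = []

    # The first token of each sort clause is the field name.
    sort_fields = [clause.split()[0]
                   for clause in sort_spec.split(",") if clause.strip()]
    if unique_key not in sort_fields:
        problems.append("sort spec")

    # An empty list or "*" is assumed to mean "all stored fields".
    if field_list and "*" not in field_list and unique_key not in field_list:
        problems.append("field list")

    return problems
```

An empty result means both settings include the uniqueKey field.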

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
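One way to read this tip, assuming the API accepts JSON request bodies: a value typed in the UI as the two characters \t must have its backslash escaped when it is placed in a JSON string. The sketch below shows this with Python's standard json module. The property name "delimiter" is hypothetical and used only for illustration.

```python
import json

# The value as typed in the UI: two characters, a backslash followed by "t".
ui_value = "\\t"

# Serializing to JSON escapes the backslash, giving \\t in the request body.
body = json.dumps({"delimiter": ui_value})
print(body)  # {"delimiter": "\\t"}

# Decoding the body recovers the original two-character value.
decoded = json.loads(body)["delimiter"]
```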