Solr Index Connector
A Solr connector pulls documents from an external standalone Solr instance or SolrCloud cluster using Solr’s javabin response type and streaming response parser.
For Solr 4.7 and later, the connector uses cursorMark deep paging. For earlier versions of Solr, it falls back to standard paging (start + rows).
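The difference between the two paging modes can be illustrated with a minimal sketch. It is not the connector's actual implementation: `fetch` is a hypothetical callable that sends the given parameters to Solr and returns the parsed response, and the function names are invented for illustration. The `cursorMark`, `nextCursorMark`, `start`, and `rows` names are Solr's own.

```python
from typing import Callable, Dict, Iterator

Params = Dict[str, str]
# Hypothetical transport: takes request parameters, returns the parsed response.
Fetch = Callable[[Params], dict]


def supports_cursor_mark(solr_version: str) -> bool:
    """True for Solr 4.7 and later. Expects 'major.minor' or 'major.minor.patch'."""
    major, minor = (int(part) for part in solr_version.split(".")[:2])
    return (major, minor) >= (4, 7)


def iter_docs(fetch: Fetch, solr_version: str, query: str = "*:*",
              sort: str = "id asc", rows: int = 100) -> Iterator[dict]:
    """Yield every matching document, choosing the paging mode by version."""
    base = {"q": query, "sort": sort, "rows": str(rows)}
    if supports_cursor_mark(solr_version):
        cursor = "*"  # Solr's initial cursor value
        while True:
            response = fetch({**base, "cursorMark": cursor})
            yield from response["response"]["docs"]
            next_cursor = response["nextCursorMark"]
            if next_cursor == cursor:  # an unchanged cursor means no more results
                break
            cursor = next_cursor
    else:
        start = 0
        while True:
            response = fetch({**base, "start": str(start)})
            docs = response["response"]["docs"]
            if not docs:  # an empty page means the result set is exhausted
                break
            yield from docs
            start += rows
```

With cursorMark, Solr requires the sort to include the uniqueKey field so that the ordering is stable across pages, which is why the default sort is on `id`.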
The following Solr components and parameters can be configured:
- collection/core (also allows default/empty core)
- query (*:* by default)
- filter queries
- query parser
- request handler (defaults to /select)
- stored fields to retrieve
Because cursorMark deep paging is used whenever possible, the sort order can also be configured:
- sort spec (default: id asc)
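As a rough guide to how these settings correspond to a Solr request, the sketch below assembles a request URL from them. The function name, its arguments, and the example host are hypothetical, and whether the connector builds its requests this way is an assumption. The parameter names (`q`, `fq`, `defType`, `fl`, `sort`, `rows`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection="", query="*:*", filter_queries=(),
                     query_parser=None, handler="/select", fields=(),
                     sort="id asc", rows=100):
    """Assemble a Solr request URL from connector-style settings (illustrative)."""
    # An empty collection is skipped, which targets the default core.
    segments = [part.strip("/") for part in (base_url, collection, handler)]
    path = "/".join(segment for segment in segments if segment)

    params = [("q", query)]
    params += [("fq", fq) for fq in filter_queries]  # one fq parameter per filter
    if query_parser:
        params.append(("defType", query_parser))
    if fields:
        params.append(("fl", ",".join(fields)))
    params += [("sort", sort), ("rows", str(rows)), ("wt", "javabin")]
    return path + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr",  # hypothetical host
    collection="products",         # hypothetical collection name
    filter_queries=["inStock:true"],
    fields=["id", "name"],
)
print(url)
```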
This connector can be configured to store information about datasources and ingested data in a ConnectorDB crawldb instance.
Limitations
- Cannot do incremental crawls. (This may become possible in the future using the _version_ field of the source Solr documents.)
- Cannot do manual filtered deep paging.
- Does not check that both the sort spec and the field list contain the uniqueKey field.
- Cannot handle encrypted connections to Solr.
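Since the connector does not verify that the sort spec and field list include the uniqueKey field, it can help to check a configuration by hand before running a crawl. The helper below is a minimal sketch of such a check, not part of the connector. The function name is hypothetical, `id` is only assumed as the uniqueKey, and the parsing handles plain field names only (no function queries in the sort).

```python
def missing_unique_key(sort_spec, field_list, unique_key="id"):
    """Return which settings ('sort', 'fl') omit the uniqueKey field."""
    # A sort spec is a comma-separated list of "<field> <direction>" clauses.
    sort_fields = {clause.split()[0]
                   for clause in sort_spec.split(",") if clause.strip()}
    # A field list may be separated by commas or by whitespace.
    fl_fields = set(field_list.replace(",", " ").split())

    problems = []
    if unique_key not in sort_fields:
        problems.append("sort")
    # An empty field list or "*" returns all stored fields, so neither is flagged.
    if fl_fields and "*" not in fl_fields and unique_key not in fl_fields:
        problems.append("fl")
    return problems


print(missing_unique_key("score desc, id asc", "name title"))
```

An empty result means both settings reference the uniqueKey field.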