Integrate Fusion with an Existing Solr Deployment

If you have already implemented Solr as a standalone instance or as a SolrCloud cluster, you can add Fusion to your existing Solr deployment and import your Solr collections into Fusion. Each Fusion collection can import one Solr collection.

  • If your existing Solr instance is running in SolrCloud mode, you can use Fusion’s UI to modify configuration files (such as schema.xml or solrconfig.xml) and create Solr collections.

  • If your existing Solr instance is running in standalone mode, you can still connect it to Fusion. Fusion can send documents to a standalone Solr instance and query it. However, you cannot use Fusion’s UI to create Solr collections (Solr cores) or to modify Solr configuration files.

Prerequisites

  • You have already installed Fusion.

  • You have already installed Solr, which must meet these Solr requirements.

  • You have already installed ZooKeeper, which must meet these ZooKeeper requirements.

    Note
    We recommend that you create a ZooKeeper cluster that is external to both Fusion and SolrCloud.
  • Your Solr deployment must contain one or more collections (cores).

  • In SolrCloud mode, Solr must be configured to use ZooKeeper.

Configure Fusion to use an existing Solr deployment

Use the Fusion UI or the Fusion API to integrate Fusion with an existing Solr deployment.

Use the Fusion UI

  1. Create a Fusion search cluster:

    1. In the Fusion UI, navigate to System > Solr Clusters and click New Solr Cluster.

    2. Enter this information:

      • A cluster ID of your choice

      • Whether SolrCloud is enabled

      • The connect string (to tell Fusion how to connect to the SolrCloud cluster or Solr instance)

        • For SolrCloud, this is the ZooKeeper connect string.

        • For a standalone Solr instance, this is the URL of the Solr instance.

    3. Verify that the connection is working by clicking Cores in the new cluster and inspecting the contents.

  2. Create a Fusion collection that points to your Solr cluster and collection:

    1. In the UI, navigate to Collections and click Add a Collection.

    2. Enter a name for the new collection.

    3. Click Advanced.

    4. Select your SolrCloud cluster or Solr instance from the dropdown.

    5. Enter the name of the Solr collection to import.
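The two connect string forms described in the steps above can be sketched as follows. This is a minimal illustration, not part of Fusion: the helper names `zk_connect_string` and `is_solr_url` and all host names are hypothetical, and the `/solr` chroot is only an example of an optional ZooKeeper chroot path.

```python
from urllib.parse import urlparse


def zk_connect_string(hosts, chroot=""):
    """Join ZooKeeper (host, port) pairs with commas and append an optional chroot."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    if chroot and not chroot.startswith("/"):
        raise ValueError("a chroot must start with '/'")
    joined = ",".join(f"{host}:{port}" for host, port in hosts)
    return joined + chroot


def is_solr_url(value):
    """Return True if the value looks like an HTTP(S) URL for a standalone Solr instance."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical hosts, for illustration only.
cloud_connect = zk_connect_string(
    [("zk1.example.com", 2181), ("zk2.example.com", 2181), ("zk3.example.com", 2181)],
    chroot="/solr",
)
standalone_connect = "http://solr.example.com:8983/solr"
```

For SolrCloud you would enter a value shaped like `cloud_connect`; for a standalone instance, a URL shaped like `standalone_connect`.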

Use the Fusion API

Use the Search Cluster API to create a Solr cluster.

Then use the Collections API to create and configure a collection.
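As a rough sketch of these two calls, the Python below builds (but does not send) one request per API. Everything specific here is an assumption to verify against the API reference for your Fusion version: the base URL and port, the `searchCluster` and `collections` endpoint paths, and the field names `connectString`, `cloud`, `searchClusterId`, and `solrParams`. Authentication is omitted.

```python
import json
import urllib.request

# Assumed base URL; adjust the host, port, and path for your Fusion installation.
FUSION_API = "http://localhost:8764/api/apollo"


def json_post(url, payload):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_cluster_request(cluster_id, connect_string, cloud):
    """Request that registers an existing Solr deployment (assumed field names)."""
    payload = {"id": cluster_id, "connectString": connect_string, "cloud": cloud}
    return json_post(f"{FUSION_API}/searchCluster", payload)


def collection_request(collection_id, cluster_id, solr_collection):
    """Request that creates a Fusion collection pointing at a Solr collection (assumed field names)."""
    payload = {
        "id": collection_id,
        "searchClusterId": cluster_id,
        "solrParams": {"name": solr_collection},
    }
    return json_post(f"{FUSION_API}/collections", payload)
```

To send a request, pass it to `urllib.request.urlopen` after adding whatever credentials your installation requires.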

Sending Documents to Solr through Fusion

You can use the Fusion connectors to crawl documents and index them to your existing Solr deployment.

  1. Follow the steps above to create and configure a search cluster and a collection that points to Solr.

  2. Define an index pipeline that ends with a Solr Indexer stage that sends the documents to Solr.

  3. Use one of these methods to ingest your data:

    • In the collection that points to your Solr collection, define a datasource using the connector of choice.

    • Send prepared documents directly to the index pipeline for processing. See Pushing Documents to a Pipeline.

    • Use an indexing process other than a connector, such as a script that sends documents through the index pipeline.

Fusion sends documents to Solr through a buffering solrServer. Buffering the updates reduces the number of HTTP requests made from Fusion to Solr, which can significantly reduce processing time. When processing simple documents, buffer as many documents as possible to increase throughput. When processing complex documents, use small batch sizes. Turn buffering off only if you are using an older version of Solr and you want Fusion to catch and document indexing errors.
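The effect of buffering on the number of requests can be illustrated with a small, generic sketch. This is not Fusion's implementation: `batched` and `send_buffered` are hypothetical helpers, and `send` stands in for whatever call would issue one HTTP request per batch.

```python
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch


def send_buffered(docs: Iterable[dict], batch_size: int,
                  send: Callable[[List[dict]], None]) -> int:
    """Call send once per batch and return the number of calls made."""
    requests = 0
    for batch in batched(docs, batch_size):
        send(batch)
        requests += 1
    return requests


docs = [{"id": str(i)} for i in range(10)]
sent: List[List[dict]] = []
# 10 documents with a batch size of 4 are sent as 3 batches (4 + 4 + 2).
request_count = send_buffered(docs, batch_size=4, send=sent.append)
```

With a batch size of 1 (no buffering), the same 10 documents would take 10 calls, which is the overhead that larger batches avoid.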

Querying Solr via Fusion requests

Indexed documents are stored in Solr indexes. You can query for these documents by using query pipelines. Query pipelines let you define query parameters, such as how many records to return, which fields to return, and how to structure facets. You can also add JavaScript to the response processing, and define landing pages or specific boost levels that depend on the user’s query. See Query Pipelines.

If you prefer, you can also use the Solr API and SolrAdmin API to query Solr directly.
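For the direct route, a query against Solr's `/select` handler might be built as in this minimal sketch. The parameters `q`, `rows`, `fl`, `wt`, `facet`, and `facet.field` are standard Solr query parameters; the helper name `solr_select_url`, the host, and the collection and field names are hypothetical.

```python
from urllib.parse import urlencode


def solr_select_url(solr_base, collection, query, rows=10, fields=None, facet_fields=None):
    """Build a URL for Solr's /select handler on the given collection."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if fields:
        # fl limits which fields are returned for each document.
        params.append(("fl", ",".join(fields)))
    if facet_fields:
        # Faceting is enabled once; facet.field is repeated per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names.
url = solr_select_url(
    "http://localhost:8983/solr",
    "products",
    "title:solr",
    rows=5,
    fields=["id", "title"],
    facet_fields=["category", "brand"],
)
```

The resulting URL can be fetched with any HTTP client; a query pipeline lets you set the same kinds of parameters within Fusion instead.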