Upgrading Fusion

Once you have a Fusion-based search application up and running, you may at some point need to upgrade to a later version of Fusion. The goal is to transfer all of your data, together with all configurations and customizations needed to support your applications. This section describes the general upgrade procedure. For upgrade instructions for specific versions of Fusion, see the per-version instruction sets.

See the release history to find out what’s new, including which versions of Solr, Spark, and ZooKeeper are bundled with each Fusion release.

The upgrade process leaves the current Fusion deployment in place while a new Fusion deployment is installed and configured. All of the upgrade operations copy information from the current Fusion over to the new Fusion. This provides a rollback option should the upgrade procedure encounter problems.

The current Fusion configurations must remain as-is during the upgrade process. So that indexing job history is captured completely, no indexing jobs should be running during the upgrade. If the new Fusion installation is being installed onto the same server as the current Fusion installation, you must either run only one version at a time or change the Fusion component server ports so that every component of both the current and new versions uses a unique port.
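
When both versions share a server, it can help to check the planned port assignments for collisions before starting the second deployment. The following is a minimal sketch of such a check, not part of Fusion: the function name findPortConflicts, the component names, and the port numbers are all hypothetical placeholders, and the real values must come from each deployment's own configuration.

```javascript
// Hypothetical port assignments; substitute the values from each
// deployment's configuration. These numbers are placeholders.
var currentPorts = { ui: 8764, api: 8765, solr: 8983 };
var newPorts = { ui: 8764, api: 9765, solr: 9983 };

// Returns one entry for every port number that is claimed by more than
// one component across the two deployments.
function findPortConflicts(current, next) {
    var owners = {};
    [['current', current], ['new', next]].forEach(function (pair) {
        Object.keys(pair[1]).forEach(function (component) {
            var port = pair[1][component];
            (owners[port] = owners[port] || []).push(pair[0] + ':' + component);
        });
    });
    return Object.keys(owners)
        .filter(function (port) { return owners[port].length > 1; })
        .map(function (port) { return { port: Number(port), usedBy: owners[port] }; });
}

var conflicts = findPortConflicts(currentPorts, newPorts);
console.log(conflicts);
```

An empty result means that no port is shared between the two sets of assignments. It does not show whether some other process on the server already holds one of the ports.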

Per-version instruction sets

Upgrading to a later version of Fusion from an existing installation requires transferring all configurations and data from the existing installation to the new version. This section contains the following instruction sets:

Upgrade roadmap: migrating data, configurations, and customizations

ZooKeeper

Migration consists of the following steps:

  • Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance

  • Rewrite Fusion datasource and pipeline configurations, working against the FUSION-NEW ZooKeeper instance

Important
Because some Fusion configurations have changed, ZooKeeper data must be rewritten accordingly using scripts available in the public GitHub repository: https://github.com/LucidWorks/fusion-upgrade-scripts.

Solr

Fusion-based search applications store your data in Solr. If your data is stored in an external Solr cluster and you aren’t upgrading that cluster, you don’t need to migrate your Solr data at all; you only need to configure the new Fusion deployment to use the external Solr cluster.

Fusion also uses Solr as a data store for server logs and search logs, as well as for binary components such as jar files and compiled models used by Fusion pipelines. The Fusion distribution includes a complete Solr server and is configured to use this embedded Solr instance by default. If the current Fusion deployment uses the embedded Solr for its system collections, then all of these collections must be copied over to the embedded Solr instance included with the new Fusion distribution.

Connector services data: crawldb, database drivers

The directory fusion/3.0.x/data/connectors/lucid.jdbc contains third party JDBC driver files that have been registered with Fusion in order to run the JDBC connector.

The directory fusion/3.0.x/data/connectors/crawldb is managed by Fusion’s connector service. Fusion datasources that walk over websites, filesystems, or similar repositories use the crawldb to store information about the files visited during a crawl; this allows incremental updates and avoids re-indexing data. In current versions of Fusion, this directory is the default crawldb location.

Important
The crawldb data format has changed. For upgrades from Fusion 1.2.x to the latest Fusion 2.1, the crawldb data must be processed using the reformatting program available in the public upgrade scripts repository: https://github.com/LucidWorks/fusion-upgrade-scripts.

Pipeline services data: models used by pipeline stages

The directory fusion/3.0.x/data/connectors/lucid.jdbc contains third party JDBC driver files that have been registered with Fusion in order to run the JDBC connector.

Customized settings for Fusion run commands and configuration scripts

The scripts used to start, stop, and restart Fusion and its components are found in the top-level "bin" directory of the Fusion distribution. As of Fusion 2.0, the configuration scripts previously found in the "bin" directory were put in their own top-level directory called "conf".

Important
If you have customized the settings in these files and wish to carry them over to the new Fusion deployment, the only way to do so is to edit the command and configuration scripts in the new Fusion deployment by hand. You cannot copy over the old configuration files because they may contain commands or settings that are no longer valid.

Custom Fusion connector plugins and pipeline stages

Custom Java components written for one version of Fusion must be recompiled and installed anew for the latest version of Fusion, as Fusion’s Java API may have changed.

Incompatibilities between Fusion 2.4 and version 2.1

Datasource Configuration

All datasource definitions are stored inside of ZooKeeper, and in Fusion 2.4 the structure of these definitions in ZooKeeper has changed; the FUSION-UPGRADE-SCRIPTS repository contains a script which can rewrite these definitions.

Deprecated Connectors

The following connectors have been deprecated:

  • DropBox

  • Logstash Connector

  • S3H Connector

  • Slack Connector

  • Twitter Search

  • Twitter Stream

JavaScript Index pipeline stages

The JavaScript Index pipeline stage now requires that the function’s parameter list contain the name of every variable used in the function body, except the global 'logger'. This includes references to:

  • object PipelineDocument 'doc'

  • object PipelineContext 'ctx'

  • string collection name 'collection'

  • object BufferingSolrServer 'solrServer'

  • object SolrClientFactory 'solrServerFactory'
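
To find stages that need attention before upgrading, a rough scan of each stage's function source can flag names from the list above that are used but not declared. The following is a minimal sketch, not a Fusion tool: the helper undeclaredStageNames is hypothetical, and its regex-based matching is approximate (it does not skip string literals, comments, or locally redeclared variables).

```javascript
// Names a stage function must declare as parameters if it uses them.
// 'logger' is global and therefore not listed.
var STAGE_NAMES = ['doc', 'ctx', 'collection', 'solrServer', 'solrServerFactory'];

// Returns the names that appear in the function body but are missing
// from its parameter list. Returns [] if no function is found.
function undeclaredStageNames(source) {
    var match = /function\s*\w*\s*\(([^)]*)\)\s*\{([\s\S]*)\}/.exec(source);
    if (!match) { return []; }
    var params = match[1].split(',').map(function (p) { return p.trim(); });
    var body = match[2];
    return STAGE_NAMES.filter(function (name) {
        // Match the bare identifier only: not a property access (".doc")
        // and not a prefix of a longer name ("solrServerFactory").
        var used = new RegExp('(^|[^.\\w$])' + name + '(?![\\w$])').test(body);
        return used && params.indexOf(name) === -1;
    });
}

var stageSource = "function () { doc.addField('example-field-100','some value'); return doc; }";
console.log(undeclaredStageNames(stageSource));
```

Any name this reports should be added to the function's parameter list. Because the scan is approximate, an empty result does not guarantee that the stage is compatible.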

Prior to 2.4, functions which referenced these objects worked even if they weren’t included in the function argument parameters list. Here are two examples of JavaScript functions which will break in Fusion 2.4 but which work in prior versions:

function returns object 'doc', empty parameter list

function () {
    doc.addField('example-field-100','some value');
    return doc;
}

function parameter list specifies 'doc' but not 'solrServerFactory'

function (doc) {
    var imports = new JavaImporter(
        org.apache.solr.client.solrj.SolrQuery,
        org.apache.solr.client.solrj.util.ClientUtils);
    with(imports) {
        var sku = doc.getFirstFieldValue("sku");
        if (!doc.hasField("mentions")) {
            var mentions = "";
            var productsSolr = solrServerFactory.getSolrServer("products");
            if( productsSolr != null ){
                var q = "sku:"+sku;
                var query = new SolrQuery();
                query.setRows(100);
                query.setQuery(q);
                var res = productsSolr.query(query);
                mentions = res.getResults().size();
                doc.addField("mentions",mentions);
            }
        }
    }
    return doc;
}
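
For comparison, here is a sketch of how both functions might be rewritten so that every referenced object is declared. It rests on several assumptions: the parameter order simply follows the list above, the second function is reduced to the part that touches solrServerFactory, and StubDoc and stubFactory are hypothetical stand-ins that exist only so the sketch can run outside Fusion. In a real stage the function would remain anonymous, as in the examples above.

```javascript
// Hypothetical stand-in for PipelineDocument, for running outside Fusion.
function StubDoc() { this.fields = {}; }
StubDoc.prototype.addField = function (name, value) {
    (this.fields[name] = this.fields[name] || []).push(value);
};
StubDoc.prototype.hasField = function (name) {
    return Object.prototype.hasOwnProperty.call(this.fields, name);
};

// First example, fixed: 'doc' is declared in the parameter list.
var addExampleField = function (doc) {
    doc.addField('example-field-100', 'some value');
    return doc;
};

// Second example, reduced and fixed: 'solrServerFactory' is declared.
// The parameter order is an assumption based on the list above.
var checkProducts = function (doc, ctx, collection, solrServer, solrServerFactory) {
    var productsSolr = solrServerFactory.getSolrServer("products");
    if (productsSolr != null && !doc.hasField("mentions")) {
        doc.addField("mentions", 0); // placeholder for the real query result
    }
    return doc;
};

// Hypothetical stand-in for SolrClientFactory.
var stubFactory = { getSolrServer: function (name) { return { name: name }; } };

var result = checkProducts(addExampleField(new StubDoc()), null, "demo", null, stubFactory);
console.log(JSON.stringify(result.fields));
```

Only the parameter lists differ from the failing versions; the function bodies reference the same objects as before.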

Configuration file location changes

Previously, log configuration files were located in subdirectories of their Fusion components. As of version 2.4, the Fusion distribution top-level directory 'conf' contains all log configuration files.

Incompatibilities between Fusion 2.1 and 1.2 releases

Fusion configuration properties

Several configuration properties for Fusion connectors and pipeline stages have changed. These configurations are stored in Fusion’s ZooKeeper. The FUSION-UPGRADE-SCRIPTS repository contains a script which can rewrite these definitions.

Crawldb format changes

The directory "FUSION_HOME/data/connectors/crawldb" is managed by Fusion’s connector service. Fusion datasources that walk over websites, filesystems, or similar repositories use the crawldb to store information about the files visited during a crawl; this allows incremental updates and avoids re-indexing data.

The format of the crawldb changed in Fusion 2.1. To preserve crawldb information, you must run the conversion utility from the FUSION-UPGRADE-SCRIPTS repository, com.lucidworks.fusion-crawldb-migrator-0.1.0.jar.

Fusion distribution directory locations

The organization of Fusion distribution home directory changed in Fusion 2. The following diagram summarizes the essential changes:

Fusion home directory organization