Upgrade Fusion 2.1 or 2.2 to Fusion 2.4

This instruction set is valid for all Fusion 2.1 releases as well as Fusion 2.2.0.


Fusion 2.4 introduces changes to the configuration properties for some Fusion datasources. To update these configurations, we have provided a program which can be downloaded from: https://github.com/LucidWorks/fusion-upgrade-scripts.

Once you have migrated all Fusion configurations from the current Fusion 2.1 ZooKeeper service to the new Fusion 2.4 ZooKeeper service, you must run this script against the new ZooKeeper service; see Migrate ZooKeeper data.

The upgrade process leaves the current Fusion deployment in place while a new Fusion deployment is installed and configured. All of the upgrade operations copy information from the current Fusion over to the new Fusion. This provides a rollback option should the upgrade procedure encounter problems.

The current Fusion configurations must remain as-is during the upgrade process. In order to capture indexing job history, no indexing jobs should be running. If the new Fusion installation is being installed onto the same server that the current Fusion installation is running on, you must either run only one version at a time or else change the Fusion component server ports so that all components are using unique ports for both the current and new versions.
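
If both versions share a server, it can help to confirm which ports are already taken before starting the new installation. The following is a minimal sketch (Python 3, standard library only; it is not part of the upgrade scripts). The function names are hypothetical, and the list of ports to check is an assumption that depends on your own configuration files.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


def busy_ports(host, ports):
    """Return the subset of ports that already have a listener."""
    return [p for p in ports if port_in_use(host, p)]
```

For example, busy_ports("localhost", [9983]) checks the port used by the embedded ZooKeeper in the single-node examples later in these instructions.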


These instructions use the following names to refer to the directories involved in the upgrade procedure:

  • FUSION_HOME: the absolute pathname to the top-level directory of the Fusion distribution.

  • FUSION-CURRENT: name of the FUSION_HOME directory for the current Fusion version, e.g. "/opt/lucidworks/fusion-2.1.2".

  • FUSION-NEW: name of the directory of the upgrade Fusion distribution during the upgrade process, e.g. "/opt/lucidworks/fusion-2.4.1".

  • INSTALL-DIR: the directory where the new Fusion version will be installed, e.g. "/opt/lucidworks". All scripts and commands in the upgrade instruction set are carried out from this directory.

  • FUSION-UPGRADE-SCRIPTS: the full path to the directory which contains the upgrade scripts from https://github.com/LucidWorks/fusion-upgrade-scripts.


  • File-system permissions: the user running the upgrade scripts and commands must have read/write/execute (rwx) permissions on directory INSTALL-DIR.

  • Download but do not unpack a copy of the FUSION-NEW distribution. The compressed Fusion distribution requires approximately 1.7 GB of disk space. All supported versions are available from the Lucidworks Fusion Get Started page.

  • Disk space requirements: the INSTALL-DIR must be on a disk partition which has enough free space for the complete FUSION-NEW installation, that is, there must be at least as much free space as the size of the FUSION-CURRENT directory. On a *nix system, the following commands can be used:

    • du -sh fusion - total size of FUSION-CURRENT.

    • df -kH - amount of free space on all file-systems.

  • Download a copy of the Fusion upgrade scripts from the GitHub repository https://github.com/LucidWorks/fusion-upgrade-scripts. These upgrade scripts run under Python 2.7 and have been tested with version 2.7.10. If this version of Python isn’t available, you should use Python’s virtualenv. If you don’t have permission to install packages system-wide, you can install virtualenv for your own user and then install your own copies of these packages inside the virtualenv environment.

These scripts require the environment variable FUSION_OLD_HOME, which should be set to the location of the current Fusion installation, i.e., the existing 1.2 or 2.1 install.

  • Upgrades from 2.1 to 2.4 use the script src/upgrade-ds-2.1-to-2.4.py. This script requires the Python package kazoo, which is a ZooKeeper client.

  • Upgrades from 1.2 to 2.4 use two scripts: src/upgrade-ds-1.2-to-2.4.py and bin/download_upload_ds.py. These scripts require the Python packages kazoo and requests; requests is an HTTP client library.
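
The disk space requirement above can also be checked programmatically. This is a minimal sketch (Python 3, standard library only; not part of the upgrade scripts) with hypothetical function names. It sums apparent file sizes, so the figure can differ slightly from what du reports.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def has_enough_space(current_dir, install_dir):
    """True if install_dir's partition has at least as much free space
    as the current_dir tree occupies."""
    needed = directory_size(current_dir)
    free = shutil.disk_usage(install_dir).free
    return free >= needed
```

Here current_dir stands for FUSION-CURRENT and install_dir for INSTALL-DIR.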



  • Current working directory must be INSTALL-DIR
    The commands in this section assume that your current working directory is INSTALL-DIR (e.g., "/opt/lucidworks"); cd to this directory before continuing.

  • Avoid directory name conflicts between FUSION-CURRENT and FUSION-NEW
    By default, the Fusion distribution unpacks into a directory named "fusion". If INSTALL-DIR is the directory which contains the FUSION-CURRENT directory and the FUSION-CURRENT directory is named "fusion", then you must create a new directory with a different name into which to unpack the Fusion distribution. For example, if your INSTALL-DIR is "/opt/lucidworks" and your FUSION-CURRENT directory is "/opt/lucidworks/fusion", then you should create a directory named "fusion-new" and unpack the contents of the distribution there:

> mkdir fusion-new
> tar -C fusion-new --strip-components=1 -xf fusion-2.4.1.tar.gz

If you are working on a Windows machine, the zipfile unzips into a folder named "fusion-2.4.1" which contains a folder named "fusion". Rename folder "fusion" to "fusion-new" and move it into folder INSTALL-DIR.

Customize FUSION-NEW configuration files and run scripts

The Fusion run scripts in the FUSION_HOME/bin directory start and stop Fusion and its component services. The Fusion configuration files in FUSION_HOME/conf define environment variables used by the Fusion run scripts. The configuration and run scripts for the FUSION-NEW installation must be edited by hand; you cannot copy over existing scripts from the current installation.

The Fusion configuration scripts may need to be updated if you have changed default settings. These scripts will need to be updated for deployments which:

  • use an external ZooKeeper cluster as Fusion’s ZooKeeper service

  • use an external Solr cluster to manage Fusion’s system collections

  • run on non-standard ports

  • have been configured to run over SSL

To help identify changes made to the current installation, the FUSION-UPGRADE-SCRIPTS repository contains a directory "reference-files" which holds copies of the bin and conf directories for all Fusion releases. To identify changes, use the *nix diff command with the -r flag, e.g. if FUSION-CURRENT is 2.1.1, then the diff commands:

> diff -r FUSION-CURRENT/bin FUSION-UPGRADE-SCRIPTS/reference-files/bin-2.1.1
> diff -r FUSION-CURRENT/conf FUSION-UPGRADE-SCRIPTS/reference-files/conf-2.1.1

will report the set of files changed and the changes that were made.
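
Where diff is not available (for example on Windows), a similar list of changed files can be produced with Python's filecmp module. This is a minimal sketch (Python 3, standard library only; not part of the upgrade scripts), and the function name is hypothetical. It reports which files differ, not the line-level changes.

```python
import filecmp
import os


def changed_files(current_dir, reference_dir):
    """Recursively list relative paths that differ from, or are missing in,
    reference_dir. Uses filecmp's default shallow comparison."""
    changed = []

    def walk(cmp, prefix):
        # diff_files: present on both sides but different.
        # left_only: files or directories present only in current_dir.
        for name in cmp.diff_files + cmp.left_only:
            changed.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(current_dir, reference_dir), "")
    return sorted(changed)
```

For example, changed_files("FUSION-CURRENT/conf", "FUSION-UPGRADE-SCRIPTS/reference-files/conf-2.1.1") lists the configuration files that were customized or added.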

A copy of Fusion is installed on every node in a Fusion deployment. Depending on the role that node plays in the deployment, the configuration settings and run scripts are customized accordingly. Therefore, if you are running a multi-node Fusion deployment, this configuration step must be carried out for each node in the cluster.

Copy local data stores in directory FUSION-CURRENT/data

The directory FUSION_HOME/data contains the on-disk data stores managed directly or indirectly by Fusion services.

  • FUSION_HOME/data/connectors contains data required by Fusion connectors.

    • FUSION_HOME/data/connectors/lucid.jdbc contains third-party JDBC driver files. If your application uses a JDBC connector, you must copy this information over to every server on which this connector will run.

    • FUSION_HOME/data/connectors/crawldb contains information on the files visited during a crawl. (Preserving crawldb history may not be possible if there are multiple different servers running Fusion connectors services.)

  • FUSION_HOME/data/nlp contains data used by Fusion NLP pipeline stages. If you are using Fusion’s NLP components for sentence detection, part-of-speech tagging, and named entity detection, you must copy over the model files stored under this directory.

  • FUSION_HOME/data/solr contains the backing store for Fusion’s embedded Solr (developer deployment only).

  • FUSION_HOME/data/zookeeper contains the backing store for Fusion’s embedded ZooKeeper (developer deployment only).

If FUSION-CURRENT and FUSION-NEW are installed on the same server, you can copy a subset of these directories using the Unix "cp" command, e.g.:

> cp -R FUSION-CURRENT/data/connectors/lucid.jdbc FUSION-NEW/data/connectors
> cp -R FUSION-CURRENT/data/connectors/crawldb FUSION-NEW/data/connectors
> cp -R FUSION-CURRENT/data/nlp FUSION-NEW/data/

If FUSION-CURRENT and FUSION-NEW are on different servers, use the Unix rsync utility.
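
For the same-server case, the copy can also be scripted. This is a minimal sketch (Python 3.8 or later for the dirs_exist_ok argument, standard library only; not part of the upgrade scripts). The function and constant names are hypothetical. Like cp -R, it merges into an existing target directory.

```python
import os
import shutil

# Sub-directories of FUSION_HOME/data that the cp commands above carry over.
DATA_SUBDIRS = [
    os.path.join("connectors", "lucid.jdbc"),
    os.path.join("connectors", "crawldb"),
    "nlp",
]


def copy_data_dirs(current_home, new_home, subdirs=DATA_SUBDIRS):
    """Copy each existing sub-directory of current_home/data into
    new_home/data and return the list of sub-directories copied."""
    copied = []
    for sub in subdirs:
        src = os.path.join(current_home, "data", sub)
        dst = os.path.join(new_home, "data", sub)
        if not os.path.isdir(src):
            continue  # nothing to copy, e.g. no JDBC drivers installed
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(sub)
    return copied
```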

Migrate ZooKeeper and Solr for single-node Fusion deployment

If you are running a single-node Fusion deployment and using both the embedded ZooKeeper and the embedded Solr that ships with this distribution, then you must copy over both the configurations and data.

To copy the ZooKeeper configuration:

> cp -R FUSION-CURRENT/data/zookeeper FUSION-NEW/data

To copy the Solr data:

> cp -R FUSION-CURRENT/data/solr FUSION-NEW/data

If the Solr collections are very large this may take a while.

Migrate Fusion configurations between ZooKeeper instances

Migration consists of the following steps:

  • Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance

    Fusion’s utility script zkImportExport.sh is used to copy ZooKeeper data between ZooKeeper clusters. This script is included with all Fusion distributions in the top-level directory named scripts.

  • Rewrite Fusion datasource configurations

    Fusion 2.4 changed and standardized the configuration properties used by several datasources. The public GitHub repository https://github.com/LucidWorks/fusion-upgrade-scripts contains a python script src/upgrade-ds-2.1-to-2.4.py which rewrites these properties.

Copying ZooKeeper data nodes

This step is not necessary if you are doing an in-place upgrade of a single-node Fusion deployment; the copy command described in the section Migrate ZooKeeper and Solr for single-node Fusion deployment (above) is sufficient.

Fusion configurations are stored in Fusion’s ZooKeeper instance under two top-level znodes:

  • node lucid stores all application-specific configurations, including collection, datasource, pipeline, signals, aggregations, and associated scheduling, jobs, and metrics.

  • node lucid-apollo-admin stores all access control information, including all users, groups, roles, and realms.
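
To confirm that both znodes are present in a ZooKeeper service, a small helper like the following can be used. This is a minimal sketch with a hypothetical function name. It assumes a client object whose exists(path) method returns a falsy value for a missing znode, which is how the kazoo KazooClient behaves.

```python
# The two top-level znodes named above.
FUSION_ZNODES = ["/lucid", "/lucid-apollo-admin"]


def missing_znodes(client, paths=FUSION_ZNODES):
    """Return the paths for which client.exists(path) is falsy."""
    return [p for p in paths if not client.exists(p)]


# Usage with kazoo (requires the kazoo package and a running ZooKeeper):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="localhost:9983")
#   zk.start()
#   print(missing_znodes(zk))
#   zk.stop()
```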

Fusion’s utility script zkImportExport.sh is used to migrate ZooKeeper data between ZooKeeper clusters. Migrating configuration information from one deployment to another requires running this script twice:

  • The first invocation runs the script in "export" mode, in order to get the set of configurations to be migrated as a JSON dump file.

  • The second invocation runs the script in "import" or "update" mode, in order to send this configuration set to the other Fusion deployment.

When running this script against a Fusion deployment, it is advisable to stop all Fusion services except for Fusion’s ZooKeeper service.

Exporting Fusion configurations from FUSION-CURRENT ZooKeeper Service

The ZooKeeper service for FUSION-CURRENT must be running. Either stop all other Fusion services or otherwise ensure that no changes to Fusion configurations take place during this procedure. If you are upgrading from a Fusion 1.2 installation which uses Fusion’s embedded Solr service and the ZooKeeper service included with that Solr installation, then starting just the Solr service will start the ZooKeeper service as well. If you are upgrading from a Fusion 2 installation, you can start just the ZooKeeper service via the script "zookeeper" in the $FUSION_HOME/bin directory.

The zkImportExport.sh script arguments are:

  • -cmd export - this is the command parameter which specifies the mode in which to run this program.

  • -zkhost <FUSION-CURRENT ZK> - the ZooKeeper connect string, which is the comma-separated list of all server:port pairs for the FUSION-CURRENT ZooKeeper cluster. For example, if running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is localhost:9983. If you have an external 3-node ZooKeeper cluster running on servers "zk1.acme.com", "zk2.acme.com", "zk3.acme.com", all listening on port 2181, then the connect string is zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181.

  • -filename <path/to/JSON/dump/file> - the name of the JSON dump file to save to.

  • -path <start znode>

    • To migrate all ZooKeeper data, the path is "/".

    • To migrate only the Fusion services configurations, the path is "/lucid". Migrating just the "lucid" node between the ZooKeeper services used by different Fusion deployments results in deployments which contain the same applications but not the same user databases.

    • To migrate the Fusion users, groups, roles, and realms information, the path is "/lucid-apollo-admin".
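
The connect string described for -zkhost above can be assembled from a list of host names. This is a minimal sketch with a hypothetical function name, assuming that every server in the cluster listens on the same port.

```python
def zk_connect_string(hosts, port):
    """Join host names into a ZooKeeper connect string:
    host1:port,host2:port,..."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return ",".join("{0}:{1}".format(h, port) for h in hosts)
```

For the 3-node example above, zk_connect_string(["zk1.acme.com", "zk2.acme.com", "zk3.acme.com"], 2181) yields the string to pass to -zkhost.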

Example of exporting Fusion configurations for znode "/lucid" from a local single-node ZooKeeper service:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid -filename znode_lucid_dump.json

Importing ZooKeeper data into FUSION-NEW

ZooKeeper service for FUSION-NEW must be running.

To import configurations, run the zkImportExport.sh script, this time with arguments:

  • command, must be import

  • ZooKeeper connect string for the FUSION-NEW ZooKeeper cluster

  • location of JSON dump file.

This command will fail if the "lucid" znode in this Fusion install contains configuration definitions which are in conflict with the exported data.

Example of importing the data exported in the previous step into the FUSION-NEW ZooKeeper service running on test server 'test.acme.com':

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost test.acme.com:9983 -cmd import -filename znode_lucid_dump.json

Note that the above command will fail if the znode structures or contents in the ZooKeeper service conflict with those in the dump file.
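
The export and import invocations can be generated for several znodes at once. This is a minimal sketch with hypothetical function names. It uses only the zkImportExport.sh arguments shown in the examples above and builds the argument lists without running anything. The dump file naming for znodes other than "/lucid" is an assumption.

```python
import os


def zk_export_cmd(fusion_home, zkhost, path, filename):
    """Argument list for an export of one znode subtree to a JSON dump file."""
    script = os.path.join(fusion_home, "scripts", "zkImportExport.sh")
    return [script, "-zkhost", zkhost, "-cmd", "export",
            "-path", path, "-filename", filename]


def zk_import_cmd(fusion_home, zkhost, filename):
    """Argument list for an import of a JSON dump file."""
    script = os.path.join(fusion_home, "scripts", "zkImportExport.sh")
    return [script, "-zkhost", zkhost, "-cmd", "import", "-filename", filename]


def migration_plan(fusion_home, old_zk, new_zk,
                   znodes=("/lucid", "/lucid-apollo-admin")):
    """Return export/import argument lists, in order, for each znode."""
    plan = []
    for znode in znodes:
        # "/lucid" becomes "znode_lucid_dump.json", as in the example above.
        dump = "znode_{0}_dump.json".format(znode.strip("/").replace("-", "_"))
        plan.append(zk_export_cmd(fusion_home, old_zk, znode, dump))
        plan.append(zk_import_cmd(fusion_home, new_zk, dump))
    return plan
```

Each list in the returned plan can be reviewed and then passed to subprocess.check_call, which stops at the first command that fails.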

Rewrite datasource configurations for Fusion 2.4

Once all Fusion configurations have been uploaded to the FUSION-NEW ZooKeeper service and while that service is running, you can run the Python programs upgrade-ds-2.1-to-2.4.py or upgrade-ds-1.2-to-2.4.py to update these configurations.


These programs require:

  • The environment variable "FUSION_HOME" must be set to the FUSION-NEW directory.

  • The environment variable "FUSION_OLD_HOME" must be set to the FUSION-CURRENT directory.

  • Python version 2.7, preferably version 2.7.10.

  • Package: kazoo - a ZooKeeper client

The Python virtualenv tool can be used to set up an environment with the correct Python version and the required package.

Set environment variable "FUSION_HOME" to the full path of the FUSION-NEW directory, e.g.:

> export FUSION_HOME=/Users/demo/test_upgrade/fusion_2_4_1

Run this program with arguments: "--datasources all"

If your current Fusion version is 1.2, run:

> python upgrade-ds-1.2-to-2.4.py --datasources all

If your current Fusion is version 2, run:

> python upgrade-ds-2.1-to-2.4.py --datasources all

If a datasource has no valid implementation, the script prints a log message to the console and continues with the next datasource.
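
Before running either script, the requirements listed above can be checked with a short preflight helper. This is a minimal sketch with a hypothetical function name. It is written to run under Python 2.7 as well as Python 3, so that it can be run with the same interpreter that will run the upgrade scripts.

```python
import os
import sys


def preflight_problems(environ=os.environ, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for var in ("FUSION_HOME", "FUSION_OLD_HOME"):
        value = environ.get(var)
        if not value:
            problems.append(var + " is not set")
        elif not os.path.isdir(value):
            problems.append(var + " does not point to a directory: " + value)
    if tuple(version[:2]) != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d"
                        % (version[0], version[1]))
    try:
        import kazoo  # noqa: F401  (imported only to test availability)
    except ImportError:
        problems.append("Python package kazoo is not installed")
    return problems
```

Running print(preflight_problems()) in the shell where the environment variables were exported shows anything that still needs attention.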

Troubleshooting the upgrade

  • Clear your browser cache after starting the UI in the new Fusion instance.

  • The Fusion 2.4 Index Pipeline Simulator can be used to verify that the existing set of datasource configurations work as expected.