Upgrade Fusion 2.1 or 2.2 to Fusion 2.4
- Unpack FUSION-NEW
- Customize FUSION-NEW configuration files and run scripts
- Copy local data stores in directory FUSION-CURRENT/data
- Migrate Fusion configurations between ZooKeeper instances
- Troubleshooting the upgrade
This instruction set is valid for all Fusion 2.1 releases as well as Fusion 2.2.0.
Fusion 2.4 introduces changes to the configuration properties for some Fusion datasources. To update these configurations, we have provided a program which can be downloaded from: https://github.com/LucidWorks/fusion-upgrade-scripts.
Once you have migrated all Fusion configurations from the current Fusion 2.1 ZooKeeper service to the new Fusion 2.4 ZooKeeper service, you must run this script against the new ZooKeeper service; see Migrate ZooKeeper data.
The upgrade process leaves the current Fusion deployment in place while a new Fusion deployment is installed and configured. All of the upgrade operations copy information from the current Fusion over to the new Fusion. This provides a rollback option should the upgrade procedure encounter problems.
The current Fusion configurations must remain as-is during the upgrade process. In order to capture indexing job history, no indexing jobs should be running. If the new Fusion installation is being installed onto the same server that the current Fusion installation is running on, you must either run only one version at a time or else change the Fusion component server ports so that all components are using unique ports for both the current and new versions.
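When both versions share a server, it can help to check which ports are already taken before starting the new version. The following is a minimal Python 3 sketch, assuming the services listen on localhost. The function name is illustrative, and 9983 is simply the embedded ZooKeeper port used in the examples later in this guide.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9983 is the embedded ZooKeeper port used in this guide's examples.
    print("port 9983 in use:", port_in_use(9983))
```

A port that reports as in use while the current Fusion is running has to be changed in the new installation, unless only one version runs at a time.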
These instructions use the following names to refer to the directories involved in the upgrade procedure:
FUSION_HOME: Absolute pathname to the top-level directory of the Fusion distribution
FUSION-CURRENT: Name of the FUSION_HOME directory for the current Fusion version, e.g. "/opt/lucidworks/fusion-2.1.2"
FUSION-NEW: Name of the directory of the upgrade Fusion distribution during the upgrade process, e.g. "/opt/lucidworks/fusion-2.4.1"
INSTALL-DIR: Directory where the new Fusion version will be installed, e.g. "/opt/lucidworks". All scripts and commands in the upgrade instruction set are carried out from this directory.
FUSION-UPGRADE-SCRIPTS: Full path to the directory that contains the upgrade scripts from https://github.com/LucidWorks/fusion-upgrade-scripts.
File-system permissions: the user running the upgrade scripts and commands must have read/write/execute (rwx) permissions on directory INSTALL-DIR.
Download but do not unpack a copy of the FUSION-NEW distribution. The compressed Fusion distribution requires approximately 1.7 GB disk space. All supported versions are available from the Lucidworks Fusion Get Started page.
Disk space requirements: the INSTALL-DIR must be on a disk partition which has enough free space for the complete FUSION-NEW installation, that is, there must be at least as much free space as the size of the FUSION-CURRENT directory. On a Unix system, the following commands can be used:
du -sh fusion - total size of FUSION-CURRENT.
df -kH - amount of free space on all file-systems.
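The same comparison can be scripted. The sketch below is an illustration in Python 3, not part of the Fusion distribution. The function names are hypothetical, and it sums apparent file sizes, which only approximates the block-based figure that du reports.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (roughly `du -s`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks so targets are not counted twice
                total += os.path.getsize(full)
    return total


def has_room_for_upgrade(fusion_current, install_dir):
    """True if install_dir's partition has at least as much free space
    as the FUSION-CURRENT directory occupies."""
    free = shutil.disk_usage(install_dir).free
    return free >= dir_size_bytes(fusion_current)
```

For example, has_room_for_upgrade("/opt/lucidworks/fusion-2.1.2", "/opt/lucidworks") checks the example paths used in this guide.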
Download a copy of the Fusion upgrade scripts from the GitHub repository https://github.com/LucidWorks/fusion-upgrade-scripts. These upgrade scripts run under Python 2.7. They have been tested with version 2.7.10. If this version of Python isn’t available, you should use Python’s virtualenv. If you don’t have permissions to install packages, you can use python to install virtualenv and then, from your virtualenv python environment, you can install your own versions of these packages.
These scripts require the environment variable
Upgrades from 2.1 to 2.4 use the script src/upgrade-ds-2.1-to-2.4.py. This script requires the python package kazoo, which is a ZooKeeper client.
Current working directory must be INSTALL-DIR
The commands in this section assume that your current working directory is INSTALL-DIR (e.g., "/opt/lucidworks"); therefore, cd to this directory before continuing.
Avoid directory name conflicts between FUSION-CURRENT and FUSION-NEW
By default, the Fusion distribution unpacks into a directory named "fusion". If the INSTALL-DIR is the directory which contains the FUSION-CURRENT directory and if the FUSION-CURRENT directory is named "fusion", then you must create a new directory with a different name into which to unpack the Fusion distribution. For example, if your INSTALL-DIR is "/opt/lucidworks" and your FUSION-CURRENT directory is "/opt/lucidworks/fusion", then you should create a directory named "fusion-new" and unpack the contents of the distribution there:
> mkdir fusion-new
> tar -C fusion-new --strip-components=1 -xf fusion-2.4.1.tar.gz
If you are working on a Windows machine, the zipfile unzips into a folder named "fusion-2.4.1" which contains a folder named "fusion". Rename folder "fusion" to "fusion-new" and move it into folder INSTALL-DIR.
Customize FUSION-NEW configuration files and run scripts
The Fusion run scripts in the FUSION_HOME/bin directory start and stop Fusion and its component services.
The Fusion configuration files in the FUSION_HOME/conf directory define environment variables used by the Fusion run scripts.
The configuration and run scripts for the FUSION-NEW installation must be edited by hand; you cannot copy over existing scripts from the current installation.
The Fusion configuration scripts might need to be updated if you have changed default settings. These scripts will need to be updated for deployments that:
Use an external ZooKeeper cluster as Fusion’s ZooKeeper service
Use an external Solr cluster to manage Fusion’s system collections
Run on non-standard ports
Have been configured to run over SSL
To facilitate the task of identifying changes made to the current installation,
the FUSION-UPGRADE-SCRIPTS repository contains a directory "reference-files" which
contains copies of the contents of these directories for all Fusion releases.
To identify changes, use the Unix
diff command with the
-r flag; e.g., if FUSION-CURRENT is 2.1.1,
then these diff commands will report the set of changed files and the changes that were made:
> diff -r FUSION-CURRENT/bin FUSION-UPGRADE-SCRIPTS/reference-files/bin-2.1.1
> diff -r FUSION-CURRENT/conf FUSION-UPGRADE-SCRIPTS/reference-files/conf-2.1.1
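If only the list of changed files is needed, for example to drive a checklist for each node, a recursive comparison can be sketched with Python's standard filecmp module. This is an illustrative Python 3 helper, not part of the upgrade scripts, and the function name is hypothetical. It reports files that differ byte-for-byte or exist on only one side; it does not show the changes themselves, as diff does.

```python
import filecmp
import os


def changed_files(current_dir, reference_dir):
    """Return sorted relative paths that differ between the two trees."""
    changed = []

    def walk(node, prefix):
        # Entries present in only one of the two directories.
        for name in node.left_only + node.right_only:
            changed.append(os.path.join(prefix, name))
        # Compare file contents (shallow=False), not just size and mtime.
        _match, mismatch, errors = filecmp.cmpfiles(
            node.left, node.right, node.common_files, shallow=False
        )
        for name in mismatch + errors:
            changed.append(os.path.join(prefix, name))
        # Recurse into subdirectories that exist on both sides.
        for name, sub in node.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(current_dir, reference_dir), "")
    return sorted(changed)
```

Calling changed_files("FUSION-CURRENT/conf", "FUSION-UPGRADE-SCRIPTS/reference-files/conf-2.1.1"), with the placeholders replaced by real paths, gives the files whose customizations need to be carried over by hand.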
A copy of Fusion is installed on every node in a Fusion deployment. Depending on the role that node plays in the deployment, the configuration settings and run scripts are customized accordingly. Therefore, if you are running a multi-node Fusion deployment this configuration step will be carried out for each node in the cluster.
Copy local data stores in directory
FUSION_HOME/data contains the on-disk data stores
managed directly or indirectly by Fusion services.
FUSION_HOME/data/connectors contains data required by Fusion connectors.
FUSION_HOME/data/connectors/lucid.jdbc contains third-party JDBC driver files. If your application uses a JDBC connector, you must copy this information over to every server on which this connector will run.
FUSION_HOME/data/connectors/crawldb contains information on the files visited during a crawl. (Preserving crawldb history may not be possible if there are multiple different servers running Fusion connectors services.)
FUSION_HOME/data/nlp contains data used by Fusion NLP pipeline stages. If you are using Fusion’s NLP components for sentence detection, part-of-speech tagging, and named entity detection, you must copy over the model files stored under this directory.
FUSION_HOME/data/solr contains the backing store for Fusion’s embedded Solr (developer deployment only).
FUSION_HOME/data/zookeeper contains the backing store for Fusion’s embedded ZooKeeper (developer deployment only).
If FUSION-CURRENT and FUSION-NEW are installed on the same server, you can copy a subset of these directories using the Unix "cp" command, e.g.:
> cp -R FUSION-CURRENT/data/connectors/lucid.jdbc FUSION-NEW/data/connectors
> cp -R FUSION-CURRENT/data/connectors/crawldb FUSION-NEW/data/connectors
> cp -R FUSION-CURRENT/data/nlp FUSION-NEW/data/
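The same-server copy can also be scripted so that missing directories are skipped rather than causing an error. The following is a sketch only, assuming Python 3.8 or later for the dirs_exist_ok option. The function and constant names are hypothetical. Unlike cp -R, which nests the source inside a destination directory that already exists, copytree here merges into it.

```python
import os
import shutil

# Subdirectories named in the cp commands above, relative to the data directory.
DATA_SUBDIRS = [
    os.path.join("connectors", "lucid.jdbc"),
    os.path.join("connectors", "crawldb"),
    "nlp",
]


def copy_data_dirs(fusion_current, fusion_new, subdirs=DATA_SUBDIRS):
    """Copy each existing data subdirectory from the current install to the new one.

    Returns the list of relative paths that were actually copied.
    """
    copied = []
    for rel in subdirs:
        src = os.path.join(fusion_current, "data", rel)
        dst = os.path.join(fusion_new, "data", rel)
        if not os.path.isdir(src):
            continue  # this store is not used in the current deployment
        # dirs_exist_ok (Python 3.8+) merges into an existing destination.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(rel)
    return copied
```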
If FUSION-CURRENT and FUSION-NEW are on different servers, use the Unix
Migrate ZooKeeper and Solr for single-node Fusion deployment
If you are running a single-node Fusion deployment and using both the embedded ZooKeeper and the embedded Solr that ships with this distribution, then you must copy over both the configurations and data.
To copy the ZooKeeper configuration:
> cp -R FUSION-CURRENT/data/zookeeper FUSION-NEW/data
To copy the Solr data:
> cp -R FUSION-CURRENT/data/solr FUSION-NEW/data
If the Solr collections are very large, this may take a while.
Migrate Fusion configurations between ZooKeeper instances
Migration consists of the following steps:
Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance
Fusion’s utility script zkImportExport.sh is used to copy ZooKeeper data between ZooKeeper clusters. This script is included with all Fusion distributions in the top-level directory named
Rewrite Fusion datasource configurations
Fusion 2.4 changed and standardized the configuration properties used by several datasources. The public GitHub repository https://github.com/LucidWorks/fusion-upgrade-scripts contains a python script src/upgrade-ds-2.1-to-2.4.py which rewrites these properties.
Copying ZooKeeper data nodes
Note: This step is not necessary if you are doing an in-place upgrade of a single-node Fusion deployment; the copy command described above in "Migrate ZooKeeper and Solr for single-node Fusion deployment" is sufficient.
Fusion configurations are stored in Fusion’s ZooKeeper instance under two top-level znodes:
lucid stores all application-specific configurations, including collection, datasource, pipeline, signals, aggregations, and associated scheduling, jobs, and metrics.
lucid-apollo-admin stores all access control information, including all users, groups, roles, and realms.
Fusion’s utility script zkImportExport.sh is used to migrate ZooKeeper data between ZooKeeper clusters. Migrating configuration information from one deployment to another requires running this script twice:
The first invocation runs the script in "export" mode, in order to get the set of configurations to be migrated as a JSON dump file.
The second invocation runs the script in "import" or "update" mode, in order to send this configuration set to the other Fusion deployment.
When running this script against a Fusion deployment, it is advisable to stop all Fusion services except for Fusion’s ZooKeeper service.
Exporting Fusion configurations from FUSION-CURRENT ZooKeeper Service
The ZooKeeper service for FUSION-CURRENT must be running. Either stop all other Fusion services
or otherwise ensure that no changes to Fusion configurations take place during this procedure.
If you are upgrading from a Fusion 1.2 installation which uses Fusion’s embedded Solr service and
the ZooKeeper service included with that Solr installation, then starting just the Solr service
will start the ZooKeeper service as well.
If you are upgrading from a Fusion 2 installation, you can start just the ZooKeeper service
via the script "zookeeper" in the
The zkImportExport.sh script arguments are:
-cmd export - This is the command parameter, which specifies the mode in which to run this program.
-zkhost <FUSION_CURRENT ZK> - The ZooKeeper connect string is the comma-separated list of server:port pairs for the FUSION-CURRENT ZooKeeper cluster. For example, if running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is localhost:9983. If you have an external 3-node ZooKeeper cluster running on servers "zk1.acme.com", "zk2.acme.com", "zk3.acme.com", all listening on port 2181, then the connect string is
-filename <path/to/JSON/dump/file> - The name of the JSON dump file to save to.
-path <start znode>
To migrate all ZooKeeper data, the path is "/".
To migrate only the Fusion services configurations, the path is "/lucid". Migrating just the "lucid" node between the ZooKeeper services used by different Fusion deployments results in deployments which contain the same applications but not the same user databases.
To migrate the Fusion users, groups, roles, and realms information, the path is "/lucid-apollo-admin".
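For multi-server clusters, the value passed to -zkhost can be assembled rather than typed by hand. This small Python sketch assumes the standard ZooKeeper convention of comma-separated host:port pairs and the same port on every server. The function name is illustrative.

```python
def zk_connect_string(hosts, port):
    """Join ZooKeeper hosts into a comma-separated host:port connect string."""
    return ",".join("{}:{}".format(host, port) for host in hosts)


# Single-node developer deployment with embedded ZooKeeper:
print(zk_connect_string(["localhost"], 9983))
# → localhost:9983

# External three-node cluster, all listening on port 2181:
print(zk_connect_string(["zk1.acme.com", "zk2.acme.com", "zk3.acme.com"], 2181))
# → zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181
```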
Example of exporting Fusion configurations for znode "/lucid" from a local single-node ZooKeeper service:
> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid -filename znode_lucid_dump.json
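To migrate both the application configurations and the user database, the export is run once per znode. The sketch below only builds the two argument lists, using the flags shown above. It is written for Python 3, the function names are hypothetical, and the second dump filename is an assumption chosen for illustration.

```python
import os


def export_command(fusion_home, zkhost, znode, filename):
    """Build the argument list for one zkImportExport.sh export run."""
    script = os.path.join(fusion_home, "scripts", "zkImportExport.sh")
    return [script, "-zkhost", zkhost, "-cmd", "export",
            "-path", znode, "-filename", filename]


def export_commands(fusion_home, zkhost):
    """One export per top-level Fusion znode."""
    return [
        export_command(fusion_home, zkhost, "/lucid", "znode_lucid_dump.json"),
        # Hypothetical filename for the access control dump.
        export_command(fusion_home, zkhost, "/lucid-apollo-admin",
                       "znode_lucid_apollo_admin_dump.json"),
    ]
```

Each list can be passed to subprocess.run(cmd). Whether the script signals failure through its exit status is not documented here, so check its console output as well.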
Importing ZooKeeper data into FUSION-NEW
The ZooKeeper service for FUSION-NEW must be running.
To import configurations, run the zkImportExport.sh script, this time with arguments:
command; must be
ZooKeeper connect string for the FUSION-NEW ZooKeeper cluster
Location of JSON dump file.
This command will fail if the "lucid" znode in this Fusion installation contains configuration definitions which are in conflict with the exported data.
Example of importing exported data from the previous step into the FUSION-NEW ZooKeeper service running on test server 'test.acme.com':
> $FUSION_HOME/scripts/zkImportExport.sh -zkhost test.acme.com:9983 -cmd import -filename znode_lucid_dump.json
Note that the above command will fail if there is conflict between existing znode structures or contents between the ZooKeeper service and the dump file.
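After an import, it may be worth confirming that the expected top-level znodes exist in the FUSION-NEW ZooKeeper service. The following is a sketch using the kazoo client that the upgrade scripts already depend on. The function names are hypothetical. The check only tests for existence and says nothing about whether the contents are complete.

```python
def missing_znodes(client, paths=("/lucid", "/lucid-apollo-admin")):
    """Return the znode paths that do not exist.

    `client` is a started ZooKeeper client whose exists(path) returns
    None for a missing node, as kazoo's KazooClient does.
    """
    return [path for path in paths if client.exists(path) is None]


def check_new_zookeeper(zkhost):
    """Connect to the ZooKeeper service at zkhost and report missing znodes."""
    # Imported here so missing_znodes can be used without kazoo installed.
    from kazoo.client import KazooClient

    client = KazooClient(hosts=zkhost)
    client.start()
    try:
        return missing_znodes(client)
    finally:
        client.stop()
```

For the example above, check_new_zookeeper("test.acme.com:9983") would be expected to report "/lucid-apollo-admin" as missing until that znode has been imported as well.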
Rewrite datasource configurations for Fusion 2.4
Once all Fusion configurations have been uploaded to the FUSION-NEW ZooKeeper service and while that service is running, you can run the Python programs upgrade-ds-2.1-to-2.4.py or upgrade-ds-1.2-to-2.4.py to update these configurations.
These programs require:
The Python virtualenv tool can be used to install the correct Python version and required package.
Set environment variable "FUSION_HOME" to the full path of the FUSION-NEW directory, e.g.:
> export FUSION_HOME=/Users/demo/test_upgrade/fusion_2_4_1
Run this program with arguments: "--datasources all"
If your current Fusion version is 1.2, run:
> python upgrade-ds-1.2-to-2.4.py --datasources all
If your current Fusion is version 2, run:
> python upgrade-ds-2.1-to-2.4.py --datasources all
If a datasource does not have a valid implementation, the program prints a log message to the console and continues with the next datasource.
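The choice of script and the FUSION_HOME setting can be wrapped in a small launcher. This is a sketch under the assumptions stated in this section (1.2 uses the 1.2 script, any 2.x version uses the 2.1 script). The function names are hypothetical, and the launcher itself is written for Python 3 even though the upgrade scripts run under Python 2.7.

```python
import os


def upgrade_script_for(current_version):
    """Pick the datasource upgrade script for the current Fusion version."""
    if current_version.startswith("1.2"):
        return "upgrade-ds-1.2-to-2.4.py"
    if current_version.startswith("2."):
        return "upgrade-ds-2.1-to-2.4.py"
    raise ValueError("no upgrade script known for Fusion " + current_version)


def upgrade_invocation(current_version, fusion_new, python="python"):
    """Return (argv, env) for running the upgrade with FUSION_HOME set."""
    # Copy the current environment and point FUSION_HOME at FUSION-NEW.
    env = dict(os.environ, FUSION_HOME=fusion_new)
    argv = [python, upgrade_script_for(current_version), "--datasources", "all"]
    return argv, env
```

The result can be run with subprocess.run(argv, env=env) from the directory that contains the script, with python pointing at the Python 2.7 interpreter (for example, the one in your virtualenv).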
Troubleshooting the upgrade
Clear your browser cache after starting the UI in the new Fusion instance
The Fusion 2.4 Index Pipeline Simulator can be used to verify that the existing set of datasource configurations work as expected.