Migrating Fusion Configurations

As part of the application lifecycle, you will need to migrate your application from a Fusion development deployment to a testing/QA deployment, and then to the production deployment.

This topic covers how to migrate application configuration data between Fusion deployments. A separate topic covers how to migrate Fusion objects.

ZooKeeper data migration

Migration consists of the following steps:

  • Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance

  • Rewrite Fusion datasource and pipeline configurations, working against the FUSION-NEW ZooKeeper instance

From ZooKeeper to JSON file

To export configurations from an existing Fusion install, run the script zkImportExport.sh with the following parameters:

  • -cmd export - the command parameter, which specifies the mode in which to run the program.

  • -zkhost <connect string> - the ZooKeeper connect string, i.e., the list of all server:port pairs for the FUSION-CURRENT ZooKeeper cluster. For example, if running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is localhost:9983. If you have an external 3-node ZooKeeper cluster running on servers "zk1.acme.com", "zk2.acme.com", "zk3.acme.com", all listening on port 2181, then the connect string is zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181.

  • -filename <path/to/JSON/dump/file> - the name of the JSON dump file to save to.

  • -path <start znode>

    • To migrate Fusion configurations for all applications, the path is "/lucid". Migrating just the "lucid" node between the ZooKeeper services used by different Fusion deployments results in deployments which contain the same applications but not the same user databases.

    • To migrate the Fusion users, groups, roles, and realms information, the path is "/lucid-apollo-admin".

    • To migrate all ZooKeeper data, the path is "/".
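
Assembling the connect string from a list of hosts can be sketched in shell. The hostnames and port here are the hypothetical ones from the example above; substitute your own cluster's values:

```shell
# Build a ZooKeeper connect string from a list of hosts sharing one port.
# Hostnames are the example ones from above; adjust for your cluster.
ZK_HOSTS="zk1.acme.com zk2.acme.com zk3.acme.com"
ZK_PORT=2181

CONNECT=""
for h in $ZK_HOSTS; do
  # Append "host:port", preceded by a comma if CONNECT is non-empty.
  CONNECT="${CONNECT:+$CONNECT,}$h:$ZK_PORT"
done

echo "$CONNECT"   # zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181
```

The resulting string is passed directly as the -zkhost argument.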

Example: export from local developer deployment to file "znode_lucid_dump.json"

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid -filename znode_lucid_dump.json

The command produces the following terminal output:

2016-06-01T19:48:12,512 - INFO  [main:URLConfigurationSource@125] - URLs to be used as dynamic configuration source: [jar:file:/Users/demo/tmp5/fusion/apps/jetty/api/webapps/api/WEB-INF/lib/lucid-base-spark-2.2.0.jar!/config.properties]
2016-06-01T19:48:12,878 - INFO  [main:DynamicPropertyFactory@281] - DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@5bf22f18
2016-06-01T19:48:12,961 - INFO  [main:CloseableRegistry@45] - Registering a new closeable: org.apache.curator.framework.imps.CuratorFrameworkImpl@32fe9d0a
2016-06-01T19:48:12,961 - INFO  [main:CuratorFrameworkImpl@234] - Starting
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:host.name=10.0.1.16
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_25
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-06-01T19:48:12,975 - INFO  [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
2016-06-01T19:48:12,975 - INFO  [main:Environment@100] - Client environment:java.class.path=./fusion/scripts/..  ... ( rest of path omitted )
2016-06-01T19:48:12,976 - INFO  [main:Environment@100] - Client environment:java.library.path=/Users/demo/Library/Java/Extensions: ... ( rest of path omitted )
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/jq/ms_hc8f9269f4h8k4b691d740000gp/T/
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.name=Mac OS X
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.arch=x86_64
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.version=10.10.5
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:user.name=demo
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:user.home=/Users/demo
2016-06-01T19:48:12,978 - INFO  [main:Environment@100] - Client environment:user.dir=/Users/demo/tmp5
2016-06-01T19:48:12,978 - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:9983 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@138fe6ec
2016-06-01T19:48:18,070 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@975] - Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983. Will not attempt to authenticate using SASL (unknown error)
2016-06-01T19:48:18,111 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@852] - Socket connection established to fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, initiating session
2016-06-01T19:48:18,118 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@1235] - Session establishment complete on server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, sessionid = 0x1550df6b0180017, negotiated timeout = 40000
2016-06-01T19:48:18,121 - INFO  [main-EventThread:ConnectionStateManager@228] - State change: CONNECTED
2016-06-01T19:48:18,367 - INFO  [main:ZKImportExportCli@198] - Data written to file '/Users/demo/tmp5/znode_lucid_dump.json'
2016-06-01T19:48:18,370 - INFO  [main:ZooKeeper@684] - Session: 0x1550df6b0180017 closed
2016-06-01T19:48:18,370 - INFO  [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down

The resulting JSON output file contains the znode hierarchy for znode "lucid", with all node data base64-encoded:

{
  "request" : {
    "timestamp" : "2016-06-01T19:48:13.001-04:00",
    "params" : {
      "zkHost" : "localhost:9983",
      "path" : "/lucid",
      "encodeValues" : "base64",
      "recursive" : true,
      "ephemeral" : false
    }
  },
  "response" : {
    "path" : "/lucid",
    "children" : [ {
      "path" : "/lucid/conf-default",
      "children" : [ {
        "path" : "/lucid/conf-default/fusion.spark.driver.jar.exclusions",
        "data" : "LipvcmcuYXBhY2hlLnNwYXJrLiosLipvcmcuc3BhcmstcHJvamVjdC4qLC4qb3JnLmFwYWNoZS5oYWRvb3AuKiwuKnNwYXJrLWFzc2VtYmx5LiosLipzcGFyay1uZXR3b3JrLiosLipzcGFyay1leGFtcGxlcy4qLC4qXFwvaGFkb29wLS4qLC4qXFwvdGFjaHlvbi4qLC4qXFwvZGF0YW51Y2xldXMuKg=="
      }, {
 ...

The size and number of lines in this file will vary depending on the number and complexity of your applications and the job histories stored in ZooKeeper.
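
Because each znode's data is base64-encoded in the dump, you can inspect a value by decoding it. A minimal sketch, using the fusion.spark.driver.jar.exclusions value shown above and assuming a base64 utility that accepts --decode (GNU coreutils does):

```shell
# Decode a base64-encoded znode value taken from the dump file.
# This is the fusion.spark.driver.jar.exclusions data shown above.
VALUE="LipvcmcuYXBhY2hlLnNwYXJrLiosLipvcmcuc3BhcmstcHJvamVjdC4qLC4qb3JnLmFwYWNoZS5oYWRvb3AuKiwuKnNwYXJrLWFzc2VtYmx5LiosLipzcGFyay1uZXR3b3JrLiosLipzcGFyay1leGFtcGxlcy4qLC4qXFwvaGFkb29wLS4qLC4qXFwvdGFjaHlvbi4qLC4qXFwvZGF0YW51Y2xldXMuKg=="

DECODED=$(printf '%s' "$VALUE" | base64 --decode)
echo "$DECODED"
```

The decoded value is the plain-text regex list stored under /lucid/conf-default/fusion.spark.driver.jar.exclusions.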

From JSON file to ZooKeeper - migration scenarios

The following examples show how to run this script in different situations.

When uploading configurations to Fusion, only the Fusion ZooKeeper service should be running.

New application, new Fusion deployment

When migrating data to a fresh install of Fusion, the exported configurations are uploaded using the script command argument -cmd import.

import command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd import -path /lucid -filename znode_lucid_dump.json

This command will fail if the "lucid" znode in this Fusion install contains configuration definitions that conflict with the exported data.

To verify, start all Fusion services and log in to the new Fusion installation. As this is the initial install, the Fusion UI will display the "set admin password" panel. Once you have set the admin password, verify that this install contains the same set of collections and datasources as the existing installation.

New application, existing Fusion deployment

When migrating a new application to a Fusion deployment which is already configured with other applications, the exported configurations should be uploaded using the script command argument -cmd update.

update command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update -path /lucid -filename znode_lucid_dump.json

To verify, start all Fusion services, log in to the new Fusion installation, and verify that this install contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.

Existing application, existing Fusion deployment

When migrating an existing application to a Fusion deployment which is already running a version of that application, the exported configurations should be uploaded using the script command argument -cmd update --overwrite.

update --overwrite command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update --overwrite -path /lucid -filename znode_lucid_dump.json

To verify, start all Fusion services, log in to the new Fusion installation, and verify that this install contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.
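
The three scenarios above differ only in the command argument passed to the script. A small wrapper function (hypothetical, not part of Fusion) makes the mapping explicit:

```shell
# Hypothetical helper: map a migration scenario to zkImportExport.sh arguments.
#   "fresh"     = new application, new Fusion deployment
#   "merge"     = new application, existing Fusion deployment
#   "overwrite" = existing application, existing Fusion deployment
zk_migrate_args() {
  case "$1" in
    fresh)     echo "-cmd import" ;;
    merge)     echo "-cmd update" ;;
    overwrite) echo "-cmd update --overwrite" ;;
    *)         echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Example invocation (uncomment to run against a live deployment):
# $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 \
#   $(zk_migrate_args merge) -path /lucid -filename znode_lucid_dump.json
zk_migrate_args overwrite
```

Keeping the scenario choice in one place avoids accidentally running a plain import against a deployment that already hosts applications.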

Caveats

  • All datasource configurations are copied over as-is. If the set of repositories used to populate the collections varies across deployment environments, these datasources must be updated accordingly.

  • The import/export script is only guaranteed to work between Fusion deployments running the same Fusion version. It should work across all releases with the same major.minor version; for example, you should be able to migrate between versions 2.4.1 and 2.4.2. If the set of configurations needed for an application has the same structure and properties across two different versions, the script might work, but this is not guaranteed.

Fusion component data migration

The directory FUSION_HOME/data contains the on-disk data stores managed directly or indirectly by Fusion services.

  • FUSION_HOME/data/connectors contains data required by Fusion connectors.

    • FUSION_HOME/data/connectors/lucid.jdbc contains third-party JDBC driver files. If your application uses a JDBC connector, you must copy this information over to every server on which this connector will run.

    • FUSION_HOME/data/connectors/crawldb contains information on the files visited during a crawl. (Preserving crawldb history may not be possible if there are multiple different servers running Fusion connectors services.)

  • FUSION_HOME/data/nlp contains data used by Fusion NLP pipeline stages. If you are using Fusion’s NLP components for sentence detection, part-of-speech tagging, and named entity detection, you must copy over the model files stored under this directory.

  • FUSION_HOME/data/solr contains the backing store for Fusion’s embedded Solr (developer deployment only).

  • FUSION_HOME/data/zookeeper contains the backing store for Fusion’s embedded ZooKeeper (developer deployment only).

When migrating these directories, no Fusion services that might change their contents should be running. The choice of which directories to migrate, and the utilities used to do the migration, depends entirely on the platform, environment, and deployment configuration.
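
With all services stopped, one straightforward approach is to archive the relevant directories and unpack the archive on the target server. A minimal sketch, assuming tar is available; for illustration it creates a scratch directory layout, whereas on a real server you would point FUSION_HOME at the actual install and skip the mkdir step:

```shell
# Sketch: archive selected Fusion data directories for migration.
# Scratch layout stands in for a real install; on a real server, set
# FUSION_HOME to the actual install path and remove the mkdir lines.
FUSION_HOME=$(mktemp -d)
mkdir -p "$FUSION_HOME/data/connectors/lucid.jdbc" \
         "$FUSION_HOME/data/connectors/crawldb" \
         "$FUSION_HOME/data/nlp"

ARCHIVE=$(mktemp -u).tar.gz

# -C keeps paths in the archive relative to FUSION_HOME, so the archive
# can be unpacked under a differently located FUSION_HOME on the target.
tar -czf "$ARCHIVE" -C "$FUSION_HOME" \
  data/connectors/lucid.jdbc data/connectors/crawldb data/nlp

ls -l "$ARCHIVE"
# On the target server (with its services stopped):
#   tar -xzf <archive> -C "$FUSION_HOME"
```

Because the paths are stored relative to FUSION_HOME, the same archive works even when the source and target installs live at different filesystem locations.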