Migrating Fusion Data

The instructions in this topic can be used to migrate Fusion data from development environments into testing and production environments, or to back up data and restore it after an incident of data loss.

  • Collections and related configurations can be migrated using the Objects API and the Fusion UI (import only). Fusion objects include all your searchable data, plus pipelines, aggregations, and other configurations on which your collections depend.

  • Application configuration data includes ZooKeeper configuration data and the on-disk data stores used by Fusion components. This data can be migrated using the zkImportExport.sh script and standard file-copy utilities, as described later in this topic.

Fusion allows you to export objects from one Fusion instance and import them into another. The data that you can migrate includes collections and all collection-related configurations.

Exporting can only be performed using the Objects API. Importing can be performed using the API or the UI.

Object export and import

Collections and encrypted values are treated specially; details are provided below. During import, conflicts are resolved according to the specified import policy.

For objects other than collections, no implicit filtering is performed; all objects are included by default. However, on export you can filter by type and ID.

Supported objects

Fusion lets you export and import these types of objects:

  • collections (with their dependent objects)

    See below for collection-specific details.

  • index pipelines

  • query pipelines

  • search clusters

  • schedules

  • aggregations

  • datasources

  • dashboards (banana)

  • parsing configurations

Exporting and importing collections

Collections are processed with these dependent objects:

  • features

  • index profiles

  • query profiles

Datasources, parser configurations, and pipeline configurations are not included when collections are exported or imported. These must be exported and imported explicitly.
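For example, to export a collection together with every datasource and pipeline on the source system, you can combine the collection.ids parameter with the type parameter described later in this topic (the collection name mycollection is a placeholder):

http://localhost:8764/api/apollo/objects/export?collection.ids=mycollection&type=datasource,index-pipeline,query-pipeline

Note that the type parameter matches all objects of those types, not just the ones associated with the named collection; to narrow the selection, list specific IDs with parameters such as datasource.ids.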

Only user-created collections are included by default. Certain types of collections are excluded:

  • the "default" collection

  • collections whose type is not DATA

  • collections whose names start with "system_"

  • "Secondary" collections, that is, collections created by features

    Instead, create the same features on the target system; this automatically creates the corresponding secondary collections.

You can override these exclusions by specifying a collection, like this:

http://localhost:8764/api/apollo/objects/export?collection.ids=default

Encrypted passwords

Some objects, such as datasources and pipelines, include encrypted passwords for accessing remote data.

  • On export, these encrypted values are replaced with ${secret.n.nameOfProperty}.

  • On import, the original, plaintext passwords must be provided in a JSON map:

    {"secret.1.bindPassword" : "abc", "secret.2.bindPassword" : "def"}

    The file must be supplied as multipart form data.

Note
Variables that do not start with secret. are ignored.
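When importing, the map of secret values is supplied as a second form part alongside the export file. This mirrors the import example later in this topic (credentials and file paths are placeholders):

curl -u admin:password123 -H "Content-Type:multipart/form-data" -X POST -F 'importData=@export.json' -F 'variableValues=@password_file.json' http://localhost:8764/api/apollo/objects/import?importPolicy=merge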

Import policies

On import, the importPolicy parameter is required. It specifies what to do if any object in the import list already exists on the target system:

abort

If there are conflicts, then import nothing.

merge

If there are conflicts, then skip the conflicting objects.

overwrite

If there are conflicts, then overwrite or delete/create the conflicting objects on the target system.
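For example, to import from a file and overwrite any conflicting objects on the target system, pass importPolicy=overwrite on the request; a sketch that follows the same curl pattern as the import examples later in this topic (credentials and the file path are placeholders):

curl -u admin:password123 -H "Content-Type:multipart/form-data" -X POST -F 'importData=@export.json' http://localhost:8764/api/apollo/objects/import?importPolicy=overwrite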

Filtering on export

On export, there are two ways to specify the objects to include:

  • by type

    You can specify a list of object types to export all objects of those types. Valid values:

    • collection

    • index-pipeline

    • query-pipeline

    • search-cluster

    • schedule

    • aggregation

    • datasource

    • banana

    • parser

  • by type and ID

    For each object type, a type.ids parameter (for example, collection.ids or parser.ids) lets you list the specific IDs to export for that type.

The type and type.ids parameters can be combined as needed.
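For example, to export two specific collections together with all query pipelines on the source system (the collection names are placeholders):

http://localhost:8764/api/apollo/objects/export?collection.ids=products,reviews&type=query-pipeline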

Validation

Objects are validated before import. If any objects fail validation, the whole import request is rejected. A separate endpoint is available for validating objects without importing them.

Validation includes checking whether an object already exists on the target system and whether the user is authorized to create or modify the object.

For collection objects, the following special validation is performed:

  • We check the searchClusterId of each collection and verify that a cluster with this ID exists on the target system or in the import file (error).

  • We check that features, index profiles, and query profiles belong only to the collections specified in the import file (error).

  • We check that a feature exists on the target system for each feature in the import file (error).

  • We check for index profiles or query profiles that do not exist on the target system or in the import file (warning).
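The exact path of the validation endpoint is documented in the Objects API reference. Assuming it sits alongside the import endpoint and accepts the same multipart form data, a validation-only request might look like this (credentials and the file path are placeholders):

curl -u admin:password123 -H "Content-Type:multipart/form-data" -X POST -F 'importData=@export.json' http://localhost:8764/api/apollo/objects/validate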

Status messages

Validation completed with no errors

The validation method was called and no errors were found, though there may be warnings.

Validation found errors

The validation method was called and errors were found. Validation does not stop at the first error, so the complete list of errors is reported.

Validation was not completed because of system error

The validation was interrupted by a system error.

Import was not performed because validation errors exist

The import method was called, but the import did not start because of validation errors.

Import was not performed because of input data error

The import method was called, but the import did not start because Fusion could not find a substitution for one of the secret values in the imported objects.

Import was not completed because of system error

The validation found no errors and the import started, but it was interrupted by a system error.

Import was completed

Validation found no errors and import finished successfully.

How to export Fusion objects

Exporting can only be performed using the Objects API.

You can select all objects, or limit the operation to specific object types or IDs. In addition to export endpoints, a validation endpoint is provided for troubleshooting.

Note
By default, system-created collections are not exported.

Some example requests are shown below. For complete reference information about object export endpoints, see the Objects API.

Export all objects
http://localhost:8764/api/apollo/objects/export
Export all schedules
http://localhost:8764/api/apollo/objects/export?type=schedule
Export all datasources and pipelines, plus two specific parsing configurations
http://localhost:8764/api/apollo/objects/export?type=datasource,index-pipeline,query-pipeline&parser.ids=cinema_parser,metafiles_parser
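In practice, export requests include credentials and the response is saved to a file for later import, for example (assuming the default admin account used in the import examples below):

curl -u admin:password123 http://localhost:8764/api/apollo/objects/export -o export.json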

How to import Fusion objects

Objects can be imported using the REST API or the Fusion UI.

Importing objects with the REST API

Some example requests are shown below. For complete reference information about object import endpoints, see the Objects API.

Import objects from a file and stop if there are conflicts
curl -u admin:password123 -H "Content-Type:multipart/form-data" -X POST -F 'importData=@/Users/admin/Fusion/export.json' http://localhost:8764/api/apollo/objects/import?importPolicy=abort
Import objects, substitute the password variables, and merge any conflicts
curl -u admin:password123 -H "Content-Type:multipart/form-data" -X POST -F 'importData=@/Users/admin/Fusion/export.json' -F 'variableValues=@password_file.json' http://localhost:8764/api/apollo/objects/import?importPolicy=merge
Note
password_file.json must contain plaintext passwords.

Importing objects with the Fusion UI

How to import objects using the UI
  1. In the upper left, click the Launcher button and select Devops.

  2. In the Home panel, click Import Fusion Objects.

    The Import Fusion Objects window opens.

  3. Select the data file from your local filesystem.

    If you are importing passwords, also select the JSON file that maps variables to plaintext passwords.


  4. Click Import.

    If there are conflicts, Fusion prompts you to specify an import policy:


    • Click Overwrite to overwrite the objects on the target system with the ones in the import file.

    • Click Merge to skip all conflicting objects and import only the non-conflicting objects.

    • Click Start Over to abort the import.

    Fusion confirms that the import was successful.


  5. Click Close to close the Import Fusion Objects window.

Migrating application configuration data

ZooKeeper configuration data is used to coordinate a distributed Fusion deployment. Additionally, certain Fusion components have configuration data that can be migrated between Fusion instances.

Migrating ZooKeeper data

Migration consists of the following steps:

  • Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance

  • Rewrite Fusion datasource and pipeline configurations, working against the FUSION-NEW ZooKeeper instance

From ZooKeeper to JSON file

To export configurations from an existing Fusion install, the zkImportExport.sh script requires the following parameters:

  • -cmd export - this is the command parameter which specifies the mode in which to run this program.

  • -zkhost <connect string> - the ZooKeeper connect string, a comma-separated list of server:port pairs for the FUSION-CURRENT ZooKeeper cluster. For example, if you are running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is localhost:9983. If you have an external 3-node ZooKeeper cluster running on servers "zk1.acme.com", "zk2.acme.com", and "zk3.acme.com", all listening on port 2181, then the connect string is zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181.

  • -filename <path/to/JSON/dump/file> - the name of the JSON dump file to save to.

  • -path <start znode>

    • To migrate Fusion configurations for all applications, the path is "/lucid". Migrating just the "lucid" node between the ZooKeeper services used by different Fusion deployments results in deployments which contain the same applications but not the same user databases.

    • To migrate the Fusion users, groups, roles, and realms information, the path is "/lucid-apollo-admin".

    • To migrate all ZooKeeper data, the path is "/".
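In addition to the /lucid export shown in the example below, migrating users, groups, roles, and realms requires a second export against the /lucid-apollo-admin path, written to its own dump file; a sketch, assuming the same local developer deployment (the output file name is a placeholder):

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid-apollo-admin -filename znode_admin_dump.json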

Example: export from local developer deployment to file "znode_lucid_dump.json"

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid -filename znode_lucid_dump.json

The command produces the following terminal output:

2016-06-01T19:48:12,512 - INFO  [main:URLConfigurationSource@125] - URLs to be used as dynamic configuration source: [jar:file:/Users/demo/tmp5/fusion/apps/jetty/api/webapps/api/WEB-INF/lib/lucid-base-spark-2.2.0.jar!/config.properties]
2016-06-01T19:48:12,878 - INFO  [main:DynamicPropertyFactory@281] - DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@5bf22f18
2016-06-01T19:48:12,961 - INFO  [main:CloseableRegistry@45] - Registering a new closeable: org.apache.curator.framework.imps.CuratorFrameworkImpl@32fe9d0a
2016-06-01T19:48:12,961 - INFO  [main:CuratorFrameworkImpl@234] - Starting
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:host.name=10.0.1.16
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_25
2016-06-01T19:48:12,974 - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-06-01T19:48:12,975 - INFO  [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
2016-06-01T19:48:12,975 - INFO  [main:Environment@100] - Client environment:java.class.path=./fusion/scripts/..  ... ( rest of path omitted )
2016-06-01T19:48:12,976 - INFO  [main:Environment@100] - Client environment:java.library.path=/Users/demo/Library/Java/Extensions: ... ( rest of path omitted )
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/jq/ms_hc8f9269f4h8k4b691d740000gp/T/
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.name=Mac OS X
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.arch=x86_64
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:os.version=10.10.5
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:user.name=demo
2016-06-01T19:48:12,977 - INFO  [main:Environment@100] - Client environment:user.home=/Users/demo
2016-06-01T19:48:12,978 - INFO  [main:Environment@100] - Client environment:user.dir=/Users/demo/tmp5
2016-06-01T19:48:12,978 - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:9983 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@138fe6ec
2016-06-01T19:48:18,070 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@975] - Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983. Will not attempt to authenticate using SASL (unknown error)
2016-06-01T19:48:18,111 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@852] - Socket connection established to fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, initiating session
2016-06-01T19:48:18,118 - INFO  [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@1235] - Session establishment complete on server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, sessionid = 0x1550df6b0180017, negotiated timeout = 40000
2016-06-01T19:48:18,121 - INFO  [main-EventThread:ConnectionStateManager@228] - State change: CONNECTED
2016-06-01T19:48:18,367 - INFO  [main:ZKImportExportCli@198] - Data written to file '/Users/demo/tmp5/znode_lucid_dump.json'
2016-06-01T19:48:18,370 - INFO  [main:ZooKeeper@684] - Session: 0x1550df6b0180017 closed
2016-06-01T19:48:18,370 - INFO  [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down

The resulting JSON output file contains the znode hierarchy for the "lucid" znode, with ZooKeeper binary data encoded as base64:

{
  "request" : {
    "timestamp" : "2016-06-01T19:48:13.001-04:00",
    "params" : {
      "zkHost" : "localhost:9983",
      "path" : "/lucid",
      "encodeValues" : "base64",
      "recursive" : true,
      "ephemeral" : false
    }
  },
  "response" : {
    "path" : "/lucid",
    "children" : [ {
      "path" : "/lucid/conf-default",
      "children" : [ {
        "path" : "/lucid/conf-default/fusion.spark.driver.jar.exclusions",
        "data" : "LipvcmcuYXBhY2hlLnNwYXJrLiosLipvcmcuc3BhcmstcHJvamVjdC4qLC4qb3JnLmFwYWNoZS5oYWRvb3AuKiwuKnNwYXJrLWFzc2VtYmx5LiosLipzcGFyay1uZXR3b3JrLiosLipzcGFyay1leGFtcGxlcy4qLC4qXFwvaGFkb29wLS4qLC4qXFwvdGFjaHlvbi4qLC4qXFwvZGF0YW51Y2xldXMuKg=="
      }, {
 ...

The size and number of lines in this file will vary depending on the number and complexity of the configurations and job histories stored in ZooKeeper.
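Because the data values are base64-encoded (encodeValues is set to base64 in the request parameters), you can inspect an individual value by decoding it; for example, on Linux, decoding the fusion.spark.driver.jar.exclusions value shown above:

echo 'LipvcmcuYXBhY2hlLnNwYXJrLiosLipvcmcuc3BhcmstcHJvamVjdC4qLC4qb3JnLmFwYWNoZS5oYWRvb3AuKiwuKnNwYXJrLWFzc2VtYmx5LiosLipzcGFyay1uZXR3b3JrLiosLipzcGFyay1leGFtcGxlcy4qLC4qXFwvaGFkb29wLS4qLC4qXFwvdGFjaHlvbi4qLC4qXFwvZGF0YW51Y2xldXMuKg==' | base64 --decode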

From JSON file to ZooKeeper - migration scenarios

The following examples show how to run this script in different situations.

When uploading configurations to Fusion, only the Fusion ZooKeeper service should be running.

New application, new Fusion deployment

When migrating data to a fresh install of Fusion, the exported configurations are uploaded using the script command argument -cmd import.

import command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd import -path /lucid -filename znode_lucid_dump.json

This command will fail if the "lucid" znode in this Fusion install contains configuration definitions which are in conflict with the exported data.

To verify, start all Fusion services and log in to the new Fusion installation. As this is the initial install, the Fusion UI will display the "set admin password" panel. Once you have set the admin password, verify that this install contains the same set of collections and datasources as the existing installation.

New application, existing Fusion deployment

When migrating a new application to a Fusion deployment which is already configured with other applications, the exported configurations should be uploaded using the script command argument -cmd update.

update command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update -path /lucid -filename znode_lucid_dump.json

To verify, start all Fusion services, log in to the new Fusion installation, and verify that it contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.

Existing application, existing Fusion deployment

When migrating an existing application to a Fusion deployment which is already running a version of that application, the exported configurations should be uploaded using the script command argument -cmd update --overwrite.

update --overwrite command example:

> $FUSION_HOME/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update --overwrite -path /lucid -filename znode_lucid_dump.json

To verify, start all Fusion services, log in to the new Fusion installation, and verify that it contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.

Caveats

  • All datasource configurations are copied over as-is. If the set of repositories used to populate the collections differs between deployment environments, these datasources will need to be updated accordingly.

  • The import/export script is only guaranteed to work between Fusion deployments running the same Fusion version. It should work across all releases with the same major.minor version of Fusion, e.g. you should be able to migrate between versions 2.4.1 and 2.4.2. If the set of configurations needed for an application has the same structure and properties across two different versions, the script might work between them.

Migrating Fusion component configuration data

The directory FUSION_HOME/data contains the on-disk data stores managed directly or indirectly by Fusion services.

  • FUSION_HOME/data/connectors contains data required by Fusion connectors.

    • FUSION_HOME/data/connectors/lucid.jdbc contains third-party JDBC driver files. If your application uses a JDBC connector, you must copy this information over to every server on which this connector will run.

    • FUSION_HOME/data/connectors/crawldb contains information on the files visited during a crawl. (Preserving crawldb history may not be possible if multiple servers run Fusion connector services.)

  • FUSION_HOME/data/nlp contains data used by Fusion NLP pipeline stages. If you are using Fusion’s NLP components for sentence detection, part-of-speech tagging, and named entity detection, you must copy over the model files stored under this directory.

  • FUSION_HOME/data/solr contains the backing store for Fusion’s embedded Solr (developer deployment only).

  • FUSION_HOME/data/zookeeper contains the backing store for Fusion’s embedded ZooKeeper (developer deployment only).

When migrating these directories, make sure that no Fusion services that might change their contents are running. The choice of which directories to migrate, and the utilities used to do the migration, depends entirely on the platform, environment, and deployment configuration.
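For example, on Linux hosts you might stop Fusion on both machines and then copy the relevant directories with rsync (the host name and installation path are placeholders; adjust the directory list to your deployment):

rsync -av /opt/fusion/data/connectors/ fusion-new.example.com:/opt/fusion/data/connectors/
rsync -av /opt/fusion/data/nlp/ fusion-new.example.com:/opt/fusion/data/nlp/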