Migrating Fusion Data
- Migrating collections and related configurations
- Migrating application configuration data
The instructions in this topic can be used to migrate Fusion data from development environments into testing and production environments, or to back up data and restore it after an incident of data loss.
- Collections and related configurations can be migrated using the Objects API and the Fusion UI (import only). Fusion objects include all your searchable data, plus pipelines, aggregations, and other configurations on which your collections depend.
- Application configuration data includes ZooKeeper configuration data and the on-disk data stores used by individual Fusion components.
Migrating collections and related configurations
Fusion allows you to export objects from one Fusion instance and import them into another. The data that you can migrate includes collections and all collection-related configurations.
Exporting can only be performed using the Objects API. Importing can be performed using the API or the UI.
Object export and import
Collections and encrypted values are treated specially; details are provided below. During import, conflicts are resolved according to the specified import policy.
For objects other than collections, no implicit filtering is performed; all objects are included by default. However, on export you can filter by type and ID.
Supported objects
Fusion lets you export and import these types of objects:
- collection
- index-pipeline
- query-pipeline
- search-cluster
- datasource
- banana
- parser
- group
- link
- task
- job
- spark
Exporting and importing collections
Collections are processed with these dependent objects:
- features
- index profiles
- query profiles
Datasources, parser configurations, and pipeline configurations are not included when collections are exported or imported. These must be exported and imported explicitly.
Only user-created collections are included by default. Certain types of collections are excluded:
- the "default" collection
- collections whose type is not DATA
- collections whose names start with "system_"
- "secondary" collections, that is, collections created by features; instead of importing these, create the same features on the target system, which automatically creates the corresponding secondary collections
You can override these exclusions by specifying a collection, like this:
http://localhost:8764/api/apollo/objects/export?collection.ids=default
Encrypted passwords
Some objects, such as datasources and pipelines, include encrypted passwords for accessing remote data.
- On export, these encrypted values are replaced with ${secret.n.nameOfProperty}.
- On import, the original plaintext passwords must be provided in a JSON map, supplied as multipart form data. For example:
{"secret.1.bindPassword" : "abc", "secret.2.bindPassword" : "def"}
Note: Variables that do not start with secret. are ignored.
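As an illustrative sketch (the property name and password value below are hypothetical, not taken from a real export), the first line shows how a replaced password appears in an exported object, and the second shows the matching secrets file you would supply on import:
"password" : "${secret.1.bindPassword}"
{"secret.1.bindPassword" : "myRealBindPassword"}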
Import policies
On import, the importPolicy parameter is required. It specifies what to do if any object in the import list already exists on the target system:
- abort - If there are conflicts, then import nothing.
- merge - If there are conflicts, then skip the conflicting objects.
- overwrite - If there are conflicts, then overwrite or delete/create the conflicting objects on the target system.
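For example, a request that overwrites conflicting objects might look like this (the export file path is illustrative; the overwrite policy name matches the Overwrite option in the Fusion UI described below):
curl -u user:pass -H "Content-Type:multipart/form-data" -X POST -F 'importData=@export.json' http://localhost:8764/api/apollo/objects/import?importPolicy=overwrite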
Filtering on export
On export, there are two ways to specify the objects to include:
- by type
  You can specify a list of object types to export all objects of those types. Valid values:
  - collection
  - index-pipeline
  - query-pipeline
  - search-cluster
  - datasource
  - banana
  - parser
  - group
  - link
  - task
  - job
  - spark
- by type and ID
  The type.ids parameter lets you list the IDs to match for the specified object type.
The type and type.ids parameters can be combined as needed.
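For example, the following request combines the two parameters to export all query pipelines plus two specific datasources (the datasource IDs here are illustrative):
curl -u user:pass "http://localhost:8764/api/apollo/objects/export?type=query-pipeline&datasource.ids=ds_products,ds_reviews"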
Exporting linked objects
Related Fusion objects are linked. You can view linked objects using the Links API or the Object Explorer.
When exporting a specific Fusion object, you can also export its linked objects without specifying each one individually. To export all objects linked to the specified object, include the deep=true query parameter in your request. See the example below. When deep is true, Fusion follows these link types:
- DependsOn
- HasPart
- RelatesTo
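For example, to export a single datasource together with everything it links to (the datasource ID here is illustrative):
curl -u user:pass "http://localhost:8764/api/apollo/objects/export?datasource.ids=my_datasource&deep=true"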
Validation
Objects are validated before import. If any objects fail validation, the whole import request is rejected. A separate endpoint is available for validating objects without importing them.
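Assuming the validation endpoint follows the same pattern as the import endpoint (for instance, a path like /api/apollo/objects/validate; check the Objects API reference for the exact path), a validation-only request might look like this:
curl -u user:pass -H "Content-Type:multipart/form-data" -X POST -F 'importData=@export.json' http://localhost:8764/api/apollo/objects/validate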
Validation includes checking whether an object already exists on the target system and whether the user is authorized to create or modify the object.
For collection objects, the following special validation is performed:
- We check the searchClusterId of each collection and verify that a cluster with this ID exists on the target system or in the import file (error).
- We check that features, index profiles, and query profiles belong only to the collections specified in the import file (error).
- We check that a feature exists on the target system for each feature in the import file (error).
- We check for index profiles or query profiles that do not exist on the target system or in the import file (warning).
For job objects, which contain schedule configurations, Fusion only imports them if their associated task, datasource, or spark objects are also present, either on the target host or in the import file.
Status messages
- The validation method was called and no errors were found, though there may be warnings.
- The validation method was called and errors were found. Validation does not stop on the first error, so the complete list of errors is reported.
- The validation was interrupted by a system error.
- The import method was called, but the import did not start because of validation errors.
- The import method was called, but the import did not start because Fusion could not find a substitution for one of the secret values in the imported objects.
- The validation found no errors and the import started, but it was interrupted by a system error.
- Validation found no errors and the import finished successfully.
How to export Fusion objects
Exporting can only be performed using the Objects API.
You can select all objects, or limit the operation to specific object types or IDs. In addition to export endpoints, a validation endpoint is provided for troubleshooting.
Note: By default, system-created collections are not exported.
Some example requests are shown below. For complete reference information about object export endpoints, see the Objects API.
curl -u user:pass http://localhost:8764/api/apollo/objects/export
curl -u user:pass http://localhost:8764/api/apollo/objects/export?type=datasource
curl -u user:pass "http://localhost:8764/api/apollo/objects/export?datasource.ids=movies_csv-ml-movies&deep=true"
curl -u user:pass "http://localhost:8764/api/apollo/objects/export?type=datasource,index-pipeline,query-pipeline&parser.ids=cinema_parser,metafiles_parser"
How to import Fusion objects
Objects can be imported using the REST API or the Fusion UI.
Importing objects with the REST API
Some example requests are shown below. For complete reference information about object import endpoints, see the Objects API.
curl -u user:pass -H "Content-Type:multipart/form-data" -X POST -F 'importData=@/Users/admin/Fusion/export.json' http://localhost:8764/api/apollo/objects/import?importPolicy=abort
curl -u user:pass -H "Content-Type:multipart/form-data" -X POST -F 'importData=@/Users/admin/Fusion/export.json' -F 'variableValues=@password_file.json' http://localhost:8764/api/apollo/objects/import?importPolicy=merge
Note: password_file.json must contain plaintext passwords.
Importing objects with the Fusion UI
- In the upper left, click the Launcher button and select Devops.
- In the Home panel, click Import Fusion Objects.
  The Import Fusion Objects window opens.
- Select the data file from your local filesystem.
  If you are importing passwords, also select the JSON file that maps variables to plaintext passwords.
- Click Import.
  If there are conflicts, Fusion prompts you to specify an import policy:
  - Click Overwrite to overwrite the objects on the target system with the ones in the import file.
  - Click Merge to skip all conflicting objects and import only the non-conflicting objects.
  - Click Start Over to abort the import.
  Fusion confirms that the import was successful.
- Click Close to close the Import Fusion Objects window.
Migrating application configuration data
ZooKeeper configuration data is used to coordinate a distributed Fusion deployment. Additionally, certain Fusion components have configuration data that can be migrated between Fusion instances.
Migrating ZooKeeper data
Migration consists of the following steps:
- Copy the ZooKeeper data nodes which contain Fusion configuration information from the FUSION-CURRENT ZooKeeper instance to the FUSION-NEW ZooKeeper instance.
- Rewrite Fusion datasource and pipeline configurations, working against the FUSION-NEW ZooKeeper instance.
From ZooKeeper to JSON file
To export configurations from an existing Fusion deployment, the zkImportExport.sh script requires these parameters:
- -cmd export - the command parameter, which specifies the mode in which to run this program.
- -zkhost <connect string> - the ZooKeeper connect string, that is, the list of all servers and ports for the FUSION-CURRENT ZooKeeper cluster. For example, if running a single-node Fusion developer deployment with embedded ZooKeeper, the connect string is localhost:9983. If you have an external 3-node ZooKeeper cluster running on servers "zk1.acme.com", "zk2.acme.com", "zk3.acme.com", all listening on port 2181, then the connect string is zk1.acme.com:2181,zk2.acme.com:2181,zk3.acme.com:2181.
- -filename <path/to/JSON/dump/file> - the name of the JSON dump file to save to.
- -path <start znode> - the znode at which to start:
  - To migrate Fusion configurations for all applications, the path is "/lucid". Migrating just the "lucid" node between the ZooKeeper services used by different Fusion deployments results in deployments which contain the same applications but not the same user databases.
  - To migrate the Fusion users, groups, roles, and realms information, the path is "/lucid-apollo-admin".
  - To migrate all ZooKeeper data, the path is "/".
Example: export from a local developer deployment to the file "znode_lucid_dump.json":
> {fusion_path}/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd export -path /lucid -filename znode_lucid_dump.json
The command produces the following terminal output:
2016-06-01T19:48:12,512 - INFO [main:URLConfigurationSource@125] - URLs to be used as dynamic configuration source: [jar:file:/Users/demo/tmp5/fusion/apps/jetty/api/webapps/api/WEB-INF/lib/lucid-base-spark-2.2.0.jar!/config.properties]
2016-06-01T19:48:12,878 - INFO [main:DynamicPropertyFactory@281] - DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@5bf22f18
2016-06-01T19:48:12,961 - INFO [main:CloseableRegistry@45] - Registering a new closeable: org.apache.curator.framework.imps.CuratorFrameworkImpl@32fe9d0a
2016-06-01T19:48:12,961 - INFO [main:CuratorFrameworkImpl@234] - Starting
2016-06-01T19:48:12,974 - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-06-01T19:48:12,974 - INFO [main:Environment@100] - Client environment:host.name=10.0.1.16
2016-06-01T19:48:12,974 - INFO [main:Environment@100] - Client environment:java.version=1.8.0_25
2016-06-01T19:48:12,974 - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-06-01T19:48:12,975 - INFO [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
2016-06-01T19:48:12,975 - INFO [main:Environment@100] - Client environment:java.class.path=./fusion/scripts/.. ... ( rest of path omitted )
2016-06-01T19:48:12,976 - INFO [main:Environment@100] - Client environment:java.library.path=/Users/demo/Library/Java/Extensions: ... ( rest of path omitted )
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/jq/ms_hc8f9269f4h8k4b691d740000gp/T/
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:os.name=Mac OS X
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:os.arch=x86_64
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:os.version=10.10.5
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:user.name=demo
2016-06-01T19:48:12,977 - INFO [main:Environment@100] - Client environment:user.home=/Users/demo
2016-06-01T19:48:12,978 - INFO [main:Environment@100] - Client environment:user.dir=/Users/demo/tmp5
2016-06-01T19:48:12,978 - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:9983 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@138fe6ec
2016-06-01T19:48:18,070 - INFO [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@975] - Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983. Will not attempt to authenticate using SASL (unknown error)
2016-06-01T19:48:18,111 - INFO [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@852] - Socket connection established to fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, initiating session
2016-06-01T19:48:18,118 - INFO [main-SendThread(fe80:0:0:0:0:0:0:1%1:9983):ClientCnxn$SendThread@1235] - Session establishment complete on server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:9983, sessionid = 0x1550df6b0180017, negotiated timeout = 40000
2016-06-01T19:48:18,121 - INFO [main-EventThread:ConnectionStateManager@228] - State change: CONNECTED
2016-06-01T19:48:18,367 - INFO [main:ZKImportExportCli@198] - Data written to file '/Users/demo/tmp5/znode_lucid_dump.json'
2016-06-01T19:48:18,370 - INFO [main:ZooKeeper@684] - Session: 0x1550df6b0180017 closed
2016-06-01T19:48:18,370 - INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down
The resulting JSON output file contains the znode hierarchy for znode "lucid", with ZooKeeper binary data encoded in base64:
{ "request" : { "timestamp" : "2016-06-01T19:48:13.001-04:00", "params" : { "zkHost" : "localhost:9983", "path" : "/lucid", "encodeValues" : "base64", "recursive" : true, "ephemeral" : false } }, "response" : { "path" : "/lucid", "children" : [ { "path" : "/lucid/conf-default", "children" : [ { "path" : "/lucid/conf-default/fusion.spark.driver.jar.exclusions", "data" : "LipvcmcuYXBhY2hlLnNwYXJrLiosLipvcmcuc3BhcmstcHJvamVjdC4qLC4qb3JnLmFwYWNoZS5oYWRvb3AuKiwuKnNwYXJrLWFzc2VtYmx5LiosLipzcGFyay1uZXR3b3JrLiosLipzcGFyay1leGFtcGxlcy4qLC4qXFwvaGFkb29wLS4qLC4qXFwvdGFjaHlvbi4qLC4qXFwvZGF0YW51Y2xldXMuKg==" }, { ...
The size and number of lines in this file will vary depending on the number and complexity of your configurations and the job histories stored in ZooKeeper.
From JSON file to ZooKeeper - migration scenarios
The following examples show how to run this script in different situations.
When uploading configurations to Fusion, only the Fusion ZooKeeper service should be running.
New application, new Fusion deployment
When migrating data to a fresh installation of Fusion, the exported configurations are uploaded using the script command argument -cmd import.
import command example:
> {fusion_path}/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd import -path /lucid -filename znode_lucid_dump.json
This command will fail if the "lucid" znode in this Fusion deployment contains configuration definitions that are in conflict with the exported data.
To verify, start all Fusion services and log in to the new Fusion installation. As this is the initial install, the Fusion UI will display the "set admin password" panel. Once you have set the admin password, verify that this installation contains the same set of collections and datasources as the existing installation.
New application, existing Fusion deployment
When migrating a new application to a Fusion deployment which is already configured with other applications, the exported configurations should be uploaded using the script command argument -cmd update.
update command example:
> {fusion_path}/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update -path /lucid -filename znode_lucid_dump.json
To verify, start all Fusion services and log in to the new Fusion installation. Verify that this installation contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.
Existing application, existing Fusion deployment
When migrating an existing application to a Fusion deployment which is already running a version of that application, the exported configurations should be uploaded using the script command argument -cmd update --overwrite.
update --overwrite command example:
> {fusion_path}/scripts/zkImportExport.sh -zkhost localhost:9983 -cmd update --overwrite -path /lucid -filename znode_lucid_dump.json
To verify, start all Fusion services and log in to the new Fusion installation. Verify that this installation contains the same set of collections and datasources as the existing installation, and that all Fusion pipelines and stages match those of the existing Fusion installation.
Caveats
- All datasource configurations are copied over as-is. If the set of repositories used to populate the collections differs between deployment environments, these datasources will need to be updated accordingly; see the sketch after this list.
- The import/export script is only guaranteed to work between Fusion deployments running the same Fusion version. It should work across all releases for the same major.minor version of Fusion; for example, you should be able to migrate between versions 2.4.1 and 2.4.2. If the set of configurations needed for an application has the same structure and properties across two different versions, these scripts might work.
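As a sketch of that first caveat (the datasource ID, the JSON file, and the exact request are assumptions; see the Connector Datasources API for the definitive endpoint), an environment-specific datasource could be updated after import like this:
curl -u user:pass -X PUT -H "Content-Type: application/json" -d @movies_csv_prod.json http://localhost:8764/api/apollo/connectors/datasources/movies_csv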
Migrating Fusion component configuration data
The directory fusion/3.1.x/data contains the on-disk data stores managed directly or indirectly by Fusion services.
- fusion/3.1.x/data/connectors contains data required by Fusion connectors.
  - fusion/3.1.x/data/connectors/lucid.jdbc contains third-party JDBC driver files. If your application uses a JDBC connector, you must copy this information over to every server on which this connector will run.
  - fusion/3.1.x/data/connectors/crawldb contains information on the files visited during a crawl. (Preserving crawldb history may not be possible if there are multiple different servers running Fusion connectors services.)
- fusion/3.1.x/data/nlp contains data used by Fusion NLP pipeline stages. If you are using Fusion’s NLP components for sentence detection, part-of-speech tagging, and named entity detection, you must copy over the model files stored under this directory.
- fusion/3.1.x/data/solr contains the backing store for Fusion’s embedded Solr (developer deployment only).
- fusion/3.1.x/data/zookeeper contains the backing store for Fusion’s embedded ZooKeeper (developer deployment only).
When migrating these directories, no Fusion services which may change the contents should be running. The choice of which directories to migrate and the utilities used to do the migration are entirely dependent upon the platform, environment, and deployment configurations.
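As an illustration only (the installation path, target hostname, and choice of rsync are assumptions, not Fusion requirements), the NLP model files could be copied to another server while Fusion is stopped on both machines:
rsync -av /opt/fusion/3.1.x/data/nlp/ target-host:/opt/fusion/3.1.x/data/nlp/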