Import Solr Collections with the Parallel Bulk Loader
You can use the Parallel Bulk Loader to copy the contents of one Solr collection into another. This is helpful for copying collections from a production environment to a testing or development environment, so that you can develop and test against real data.
Parallel Bulk Loader Job Configuration
In the Fusion UI:
- Navigate to Collections > Jobs.
- Click Add and select Parallel Bulk Loader from the menu.
- Enter a name for your job, and enter solr as the format.
- Set the parameter name and value for the collection you want to import.
  - Parameter name: collection
  - Parameter value: the name of the collection to import from
- If the source collection is in a different Fusion app, add an additional parameter name and value pair.
  - Parameter name: zkHost
  - Parameter value: the ZooKeeper connection string for the source collection
- Enter the output collection, which is where you want to import the data to.
  - Output collection: the name of the output collection
  - Send to index pipeline: _system
- Save your job.
When you’re ready, run the job or schedule the job to run at an interval of your choice.
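The UI settings above can also be written out as a JSON job definition. The following is a minimal sketch, not a definitive configuration: the collection names and the ZooKeeper address are hypothetical, and using zkHost as a readOptions key is an assumption that mirrors the UI parameter of the same name. The other field names follow the sample payload in the JSON configuration section.

```shell
# Assumed values; replace them with your own.
SOURCE_COLLECTION="products"
OUTPUT_COLLECTION="products_copy"
# Hypothetical ZooKeeper connection string for the source collection.
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181"

# Write the job definition to data.json.
# "zkHost" as a readOptions key is an assumption based on the UI parameter name.
cat > data.json <<EOF
{
  "type" : "parallel-bulk-loader",
  "id" : "copy_${SOURCE_COLLECTION}",
  "format" : "solr",
  "readOptions" : [
    { "key" : "collection", "value" : "${SOURCE_COLLECTION}" },
    { "key" : "zkHost", "value" : "${ZK_HOST}" }
  ],
  "outputCollection" : "${OUTPUT_COLLECTION}"
}
EOF
```

Omit the zkHost entry when the source collection is in the same Fusion app.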
Parallel Bulk Loader JSON Configuration
Use this sample JSON payload to define the Parallel Bulk Loader job through the Spark Jobs API:
{
"type" : "parallel-bulk-loader",
"id" : "copy_sigs_with_timestamp",
"format" : "solr",
"readOptions" : [ {
"key" : "collection",
"value" : "SOURCE_COLLECTION"
}, {
"key" : "query",
"value" : "timestamp_tdt:{$lastTimestamp TO *]"
} ],
"outputCollection" : "OUTPUT_COLLECTION",
"timestampFieldName" : "timestamp_tdt",
"defineFieldsUsingInputSchema" : true
}
Replace SOURCE_COLLECTION with the name of the collection being exported. Replace OUTPUT_COLLECTION with the name of the destination collection.
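If you keep the sample payload as a template, the two placeholders can be filled in from the command line. This is a small sketch: fill_placeholders is a hypothetical helper, and template.json, products_prod, and products_dev are assumed names used only for illustration.

```shell
# Hypothetical helper: reads a payload template on stdin and writes the
# payload with both placeholders replaced on stdout.
#   $1 = source collection name, $2 = destination collection name
# Assumes the names contain no "/" or "&", which sed would treat specially.
fill_placeholders() {
  sed -e "s/SOURCE_COLLECTION/$1/g" \
      -e "s/OUTPUT_COLLECTION/$2/g"
}

# Example usage (template.json is an assumed file holding the sample payload):
#   fill_placeholders products_prod products_dev < template.json > data.json
```

Reading from stdin and writing to stdout leaves the template untouched, so the same file can be reused for several source and destination pairs.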
See Parallel Bulk Loader for full configuration options.
Use the following Spark Jobs API call to upload the job configuration. Replace data.json with the path to your JSON payload file.
curl -X POST \
-u USERNAME:PASSWORD \
'https://FUSION_HOST:6764/api/apps/Documentation/spark/configurations' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d @data.json
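The request above can be wrapped in a small script that checks the payload file before sending it and reports the HTTP status afterwards. This is a sketch under stated assumptions: check_payload and post_config are hypothetical helper names, FUSION_USER and FUSION_PASS are assumed environment variables, and the URL pattern is taken from the example above.

```shell
# Hypothetical helper: fail early if the payload file is missing or empty.
check_payload() {
  if [ ! -s "$1" ]; then
    echo "payload file '$1' is missing or empty" >&2
    return 1
  fi
}

# Hypothetical helper: post the payload to the configurations endpoint.
#   $1 = payload file, $2 = Fusion host, $3 = Fusion app name
# FUSION_USER and FUSION_PASS are assumed to be set in the environment.
post_config() {
  check_payload "$1" || return 1
  # The "@" prefix tells curl to read the request body from the file.
  curl -sS -X POST \
    -u "${FUSION_USER}:${FUSION_PASS}" \
    -H 'Content-Type: application/json' \
    --data-binary "@$1" \
    -w '\nHTTP status: %{http_code}\n' \
    "https://$2:6764/api/apps/$3/spark/configurations"
}

# Example usage:
#   post_config data.json FUSION_HOST Documentation
```

Printing the status code makes it easier to tell a rejected payload from a successful upload when the response body is short.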