Import Signals
Normally, signals are indexed as streaming data during users' natural activity. This topic describes how to load historical signals data in batches, from Parquet files, using the Spark shell.
Fusion's performance may be affected during this resource-intensive operation. Be sure to allocate sufficient memory for the Spark, Solr, and connectors services.
1. Customize the code below by replacing the following strings:

   - `path_of_folder`. The absolute path to the folder containing your Parquet files.
   - `collection_name_signals`. The name of the signals collection where you want to load these signals.
   - `localhost:9983/lwfusion/4.2.1/solr`. The ZooKeeper connection string for your Fusion deployment. You can verify the correct path by going to the Solr console at http://fusion_host:8983/solr/#/ and looking for the value of zkHost.

   ```scala
   // Folder containing the Parquet files to import
   val parquetFilePath = "path_of_folder"

   // Read the historical signals into a DataFrame
   val signals = spark.read.parquet(parquetFilePath)

   // Target signals collection and ZooKeeper connection string
   val collectionName = "collection_name_signals"
   val zkhostName = "localhost:9983/lwfusion/4.2.1/solr"

   // Options for the spark-solr connector
   val connectionMap = Map(
     "collection" -> collectionName,
     "zkhost" -> zkhostName,
     "commit_within" -> "5000",
     "batch_size" -> "10000")

   // Write the signals to Solr
   signals.write.format("solr").options(connectionMap).save()
   ```
   For information about `commit_within` and `batch_size`, see https://github.com/lucidworks/spark-solr#commit_within.
2. Launch the Spark shell:

   ```
   $FUSION_HOME/bin/spark-shell
   ```
3. At the `scala>` prompt, enter paste mode:

   ```
   :paste
   ```
4. Paste your modified code from step 1.
5. Exit paste mode by pressing CTRL-D.
6. When the operation is finished, navigate to Collections > Collections Manager to verify that the number of documents in the specified signals collection has increased as expected. You can also check the counts from the Spark shell, as in the sketch below.
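If you prefer to verify the import without leaving the Spark shell, a minimal sketch like the following compares the source row count with the indexed document count. It reuses the placeholder values from step 1 (`path_of_folder`, `collection_name_signals`, and the ZooKeeper connection string) and assumes the spark-solr connector is available in the shell, as it is for the write step above.

```scala
// A minimal verification sketch, assuming the same placeholder values as step 1.

// Count the rows in the source Parquet files.
val parquetCount = spark.read.parquet("path_of_folder").count()

// Read the signals collection back through the spark-solr connector and count it.
val readOptions = Map(
  "collection" -> "collection_name_signals",
  "zkhost" -> "localhost:9983/lwfusion/4.2.1/solr")
val indexedCount = spark.read.format("solr").options(readOptions).load().count()

println(s"Parquet rows: $parquetCount, documents in collection: $indexedCount")
```

If the collection already contained documents before the import, compare `indexedCount` against the pre-import count instead; the difference should equal `parquetCount`.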