Other Ingestion Methods

Usually, the simplest way to get data into Fusion is through its connectors. However, in some cases it makes sense to use other methods:

  • Use a Parallel Bulk Loader job

    Fusion Parallel Bulk Loader jobs enable bulk ingestion of structured and semi-structured data from big data systems, NoSQL databases, and common file formats like Parquet and Avro. (A job configuration sketch appears below.)

  • Import with the REST API

    You can use the REST API to bypass the connectors and parsers and push documents directly to an index profile or index pipeline. (A curl sketch appears below.)

  • Import via Pig

    You can use Pig to import data into Fusion via the {packageUser}-pig-functions-{connectorVersion}.jar file, found in $FUSION_HOME/apps/connectors/resources/lucid.hadoop/jobs. (A Pig sketch appears below.)

  • Import via Hive

    Fusion ships with a Serializer/Deserializer (SerDe) for Hive, included in the distribution as {packageUser}-hive-serde-{connectorVersion}.jar in $FUSION_HOME/apps/connectors/resources/lucid.hadoop/jobs. (A Hive sketch appears below.)

Note
The preferred method of importing data with Hive is to use the Parallel Bulk Loader.
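
The sketches below illustrate each method. They are illustrative rather than version-exact; hosts, paths, collection names, and credentials are placeholders to adapt to your deployment.

A Parallel Bulk Loader job is defined as a JSON job configuration. This sketch assumes a Parquet source on HDFS; the job ID, path, and output collection are hypothetical, and the exact set of supported configuration keys depends on your Fusion version:

    {
      "type": "parallel-bulk-loader",
      "id": "load_parquet_events",
      "format": "parquet",
      "path": "hdfs://namenode:8020/data/events/",
      "outputCollection": "events"
    }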
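To push documents directly with the REST API, POST JSON to an index pipeline (or index profile) endpoint. A minimal curl sketch, assuming a Fusion 4.x-style endpoint on the default API port; the credentials, pipeline name, collection name, and field names are placeholders:

    curl -u admin:password -X POST \
      -H 'Content-Type: application/json' \
      'http://localhost:8764/api/index-pipelines/my-pipeline/collections/my-collection/index' \
      -d '[{"id": "doc-1", "title_t": "An example document"}]'

Because this route bypasses the parsers, the payload should already be structured the way the pipeline expects.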
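A Pig script writes tuples to Fusion's Solr backend through the store function shipped in the functions jar. A minimal sketch, assuming a CSV source and the com.lucidworks.hadoop.pig.SolrStoreFunc class; the ZooKeeper connect string, collection, and paths are placeholders, and the class name and solr.* properties should be verified against the documentation for your jar version:

    -- Jar from $FUSION_HOME/apps/connectors/resources/lucid.hadoop/jobs.
    REGISTER '{packageUser}-pig-functions-{connectorVersion}.jar';

    -- Tell the store function where Solr lives and which collection to write to.
    set solr.zkhost 'zk1:2181,zk2:2181,zk3:2181';
    set solr.collection 'my-collection';

    -- Load source records; field aliases become document field names.
    A = LOAD '/data/input.csv' USING PigStorage(',') AS (id:chararray, title_t:chararray);

    -- Write each tuple as a document; the INTO location is nominal here, since
    -- the connection details come from the solr.* properties above.
    STORE A INTO 'SOLR' USING com.lucidworks.hadoop.pig.SolrStoreFunc();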
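With the Hive SerDe, you create an external table backed by a Fusion collection and load it with ordinary INSERT statements. A minimal sketch, assuming the com.lucidworks.hadoop.hive.LWStorageHandler class from the SerDe jar; the ZooKeeper string, table, and column names are placeholders:

    -- Jar from $FUSION_HOME/apps/connectors/resources/lucid.hadoop/jobs.
    ADD JAR {packageUser}-hive-serde-{connectorVersion}.jar;

    -- External table whose rows are stored as documents in the collection.
    CREATE EXTERNAL TABLE solr_events (id string, title string)
      STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
      LOCATION '/tmp/solr'
      TBLPROPERTIES (
        'solr.zkhost' = 'zk1:2181,zk2:2181,zk3:2181',
        'solr.collection' = 'my-collection'
      );

    -- Each inserted row becomes an indexed document.
    INSERT INTO TABLE solr_events SELECT id, title FROM source_events;

As the note above says, prefer a Parallel Bulk Loader job for Hive data where possible.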