Apache Spark

Apache Spark is a fast, general-purpose execution engine for large-scale data processing. It runs jobs that can be decomposed into stepwise tasks, which are distributed across a cluster of networked computers. Spark provides faster processing and better fault tolerance than earlier MapReduce implementations. The following schematic shows the Spark components available from Fusion:

Spark Processes in Fusion
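
The idea of decomposing a job into stepwise tasks can be illustrated without a cluster. The following is a minimal sketch in plain Python, not Spark code and not part of Fusion: a thread pool stands in for the cluster's worker nodes, and the function names (`partition`, `count_by_key`, `merge_counts`, `run_job`) and the sample `clicks` data are hypothetical, chosen only for illustration. It counts events per document in two stages, roughly the shape of a simple aggregation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_partitions):
    """Split the input into chunks, one per (simulated) worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_by_key(chunk):
    """Stage 1, run once per partition: count each key within one chunk."""
    counts = {}
    for key in chunk:
        counts[key] = counts.get(key, 0) + 1
    return counts


def merge_counts(left, right):
    """Stage 2: combine two partial results into one."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def run_job(records, num_partitions=3):
    """Run stage 1 on every partition in parallel, then merge the partials."""
    chunks = partition(records, num_partitions)
    # Threads stand in for cluster nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_by_key, chunks))
    return reduce(merge_counts, partials, {})


# Hypothetical click events, keyed by document ID.
clicks = ["doc1", "doc2", "doc1", "doc3", "doc1", "doc2"]
totals = run_job(clicks)
```

Because each partition is processed independently in the first stage, a failed task can be rerun on its own chunk without repeating the whole job, which is the property a cluster engine such as Spark builds on.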

Fusion 2.1 introduced the use of a Spark cluster for all signal aggregation processes. This use of Spark is managed entirely by Fusion and is hidden from Fusion application developers and end users.

As of Fusion 2.4, the Spark Jobs API provides a set of REST API endpoints for configuring and running Spark jobs via Fusion.

The section Spark and Machine Learning covers how to use Fusion’s Spark cluster to run your own Spark jobs for custom aggregations and machine learning tasks.