Spark Administration

Apache Spark is an open-source cluster-computing framework that serves as a fast, general-purpose execution engine for large-scale data processing. Jobs are decomposed into stepwise tasks, and the tasks are distributed across a cluster of networked computers.

Spark improves on previous MapReduce implementations by using resilient distributed datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.
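
To make the RDD idea concrete, the sketch below models it in plain Python. It is a toy illustration, not Spark code and not part of Fusion: the `ToyRDD` class and the `parallelize` helper are hypothetical names invented for this example. It assumes only the general RDD design, in which each dataset remembers how it was derived (its lineage), so a lost in-memory partition can be recomputed instead of being restored from a replica.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyRDD:
    """Toy stand-in for an RDD: a partition count plus the lineage to rebuild each partition."""
    num_partitions: int
    compute: Callable[[int], List[int]]          # rebuilds partition i from its lineage
    cache: Optional[Dict[int, List[int]]] = None  # in-memory partitions, if persisted

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Lazy: nothing is computed here, only the recipe is recorded.
        parent = self
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in parent.partition(i)])

    def persist(self) -> "ToyRDD":
        # Ask for computed partitions to be kept in memory.
        if self.cache is None:
            self.cache = {}
        return self

    def partition(self, i: int) -> List[int]:
        if self.cache is None:
            return self.compute(i)
        if i not in self.cache:
            # Partition is missing (never computed, or lost): replay the lineage.
            self.cache[i] = self.compute(i)
        return self.cache[i]

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


def parallelize(data: List[int], num_partitions: int) -> ToyRDD:
    """Split a local list into partitions (hypothetical helper for this sketch)."""
    chunks = [data[i::num_partitions] for i in range(num_partitions)]
    return ToyRDD(num_partitions, lambda i: list(chunks[i]))


squares = parallelize(list(range(10)), 3).map(lambda x: x * x).persist()
first = squares.collect()    # computes and caches all three partitions
del squares.cache[1]         # simulate losing one in-memory partition
second = squares.collect()   # partition 1 is rebuilt from lineage
print(first == second)
```

In this model, only the missing partition is recomputed on the second `collect()`; the other two are served from memory. That is the trade-off RDDs are built around: keep working data in memory for speed, and rely on lineage rather than replication for recovery.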

These topics provide information about Spark administration in Fusion Server:

Additionally, you can configure and run Spark jobs in Fusion using the Spark Jobs API or the Fusion UI.

Spark with Fusion AI

With a Fusion AI license, you can also use the Spark cluster to train and compile machine learning models and to run experiments, using either the Fusion UI or the Spark Jobs API.