Spark Jobs

Apache Spark can power a wide variety of data analysis jobs. In Fusion, Spark jobs are especially useful for generating recommendations.

Spark job subtypes

For the Spark job type, the available subtypes are listed below.

ALS Recommender

Train a collaborative filtering matrix decomposition recommender using SparkML’s Alternating Least Squares (ALS) to batch-compute user recommendations and item similarities.

Aggregation

Define an aggregation job to be executed by Fusion Spark.

Co-occurrence Similarity

Compute a mutual-information item similarity model.

Random Forest Classifier Training

Train a random forest classifier for text classification.

Script

Run a custom Scala script as a Fusion Job.

Matrix Decomposition-Based Query-Query Similarity Job

Train a collaborative filtering matrix decomposition recommender using SparkML’s Alternating Least Squares (ALS) to batch-compute query-query similarities.

Bisecting KMeans Clustering Job

Train a bisecting KMeans clustering model.

Logistic Regression Classifier Training Job

Train a regularized logistic regression model for text classification.

Item Similarity Recommender

Compute user recommendations based on a pre-computed item similarity model.

Spark job configuration

Spark jobs can be created and modified using the Fusion UI or the Spark Jobs API. They can be scheduled using the Fusion UI or the Jobs API.
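For example, a new job configuration can be created by POSTing its JSON definition to the Spark Jobs API. The sketch below is illustrative, not definitive: the /spark/configurations path is an assumption based on the API pattern shown elsewhere on this page, the body shows a minimal Aggregation job, and user:pass, localhost:8764, and the field values are placeholders. The parameters each subtype accepts are listed in the schema described below.

curl -u user:pass -X POST -H 'Content-Type: application/json' -d '{"id": "myAggregation", "type": "aggregation"}' http://localhost:8764/api/apollo/spark/configurations

A defined job can then be started on demand; this again assumes a /spark/jobs/&lt;id&gt; endpoint, which may vary by Fusion version:

curl -u user:pass -X POST http://localhost:8764/api/apollo/spark/jobs/myAggregation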

To see the complete list of configuration parameters for all Spark job subtypes, use the /spark/schema endpoint:

curl -u user:pass http://localhost:8764/api/apollo/spark/schema
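The response is a JSON document describing the configuration parameters for every subtype. If jq is available, the subtype names can be listed directly; this assumes the schema object is keyed by subtype name, which may differ between Fusion versions:

curl -u user:pass http://localhost:8764/api/apollo/spark/schema | jq 'keys'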