Version 5.2



These reference topics provide complete information about the configuration properties of jobs for which the subtype is "task" or "spark".

Jobs with the subtype "datasource" have configuration schemas that depend on the connector type; see Connectors Configuration Reference.

For conceptual information and instructions for configuring and scheduling jobs, see Jobs and Schedules.


  • Download Blob

  • Log Cleanup


  • REST Call

    A versatile job type that runs an arbitrary REST/HTTP/Solr command.

Spark jobs

  • Aggregation


  • Custom Python Job

    The Custom Python job lets users run Python code via Fusion. This job supports Python 3.6+ code.

  • Script

    Run a custom Scala script as a Fusion job.

  • ALS Recommender

    Use this job when you want to compute user recommendations or item similarities using a collaborative filtering recommender. You can also implement a user-to-item recommender in the advanced section of this job’s configuration UI. This job uses SparkML’s Alternating Least Squares (ALS).

    Use the BPR Recommender instead.

  • Cluster Labeling

    Use this job when you already have clusters or well-defined document categories, and you want to discover and attach keywords to see representative words within those existing clusters. (If you want to create new clusters, use the Document Clustering job.)

  • Create Seldon Core Model Deployment Job

    Use this job to deploy a Seldon Core Model into the Fusion cluster.

  • Delete Seldon Core Model Deployment Job

    Use this job to remove a Seldon Core deployment from the Fusion cluster.

  • Document Clustering

    The Document Clustering job uses an unsupervised machine learning algorithm to group documents into clusters based on similarities in their content. You can enable more efficient document exploration by using these clusters as facets, high-level summaries or themes, or to recommend other documents from the same cluster. The job can automatically group similar documents in all kinds of content, such as clinical trials, legal documents, book reviews, blogs, scientific papers, and products.

  • Evaluate QnA Pipeline

    Evaluate the performance of a Smart Answers pipeline.

  • Ground Truth

    Estimate ground truth queries using click signals and query signals, with document relevance per query determined using a click/skip formula.

  • Head/Tail Analysis

    Perform head/tail analysis of queries from collections of raw or aggregated signals to identify underperforming queries and the reasons they underperform. This information is valuable for tuning Solr configurations, auto-suggest, product catalogs, and SEO/SEM strategies in order to improve conversion rates.

  • Logistic Regression Classifier Training

    Train a regularized logistic regression model for text classification.

    The Classification job, introduced in Fusion 5.2.0, provides more options and better logging.

  • Ranking Metrics

    Calculate relevance metrics (such as nDCG) by replaying ground truth queries against catalog data using variants from an experiment.

  • SQL Aggregation

    A Spark SQL aggregation job where user-defined parameters are injected into a built-in SQL template at runtime.

  • Synonym Detection Jobs

    Use this job to generate pairs of synonyms and pairs of similar queries. Two words are considered potential synonyms when they are used in a similar context in similar queries.

  • Token and Phrase Spell Correction

    Detect misspellings in queries or documents based on how often words and phrases occur.

  • Word2Vec Model Training

    Train a shallow neural model, and project each document onto this vector embedding space.

    The Word2Vec Model Training job is deprecated as of Fusion 5.2.0.
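
For readers unfamiliar with nDCG, the relevance metric named in the Ranking Metrics entry above, the following is a minimal sketch of the textbook calculation. It is not taken from Fusion and may differ from the variant the job actually computes. The function names `dcg` and `ndcg` and the graded relevance lists are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain).

    Assumption: relevances[i] is the graded relevance of the result at
    rank i + 1, as it might be derived from ground truth data.
    """
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top result is discounted by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades for one query, in the order results were returned.
    returned_order = [1, 3, 0, 2]
    print(f"nDCG@4 = {ndcg(returned_order, 4):.3f}")
```

A ranking that already lists results from most to least relevant scores 1.0, and lower scores indicate that relevant results appear further down the list.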