Jobs Configuration Reference

These reference topics provide complete information about configuration properties for the Spark jobs that are enabled with a Fusion AI license.

For conceptual information and instructions for configuring and scheduling jobs, see Jobs and Schedules.

Additional jobs are available as part of the basic Fusion Server feature set.

  • ALS Recommender

    Use this job when you want to compute user recommendations or item similarities using a collaborative filtering recommender. You can also implement a user-to-item recommender in the advanced section of this job’s configuration UI. This job uses SparkML’s Alternating Least Squares (ALS).

    Note
    This job is deprecated as of Fusion 5.2.0. Use the BPR Recommender instead.

  • BPR Recommender

    Use this job when you want to compute user recommendations or item similarities using a Bayesian Personalized Ranking (BPR) recommender algorithm.

  • Classification

    This job analyzes how your existing documents or signals are categorized and produces a classification model that can be used to predict the categories of new documents or queries.

  • Cluster Labeling

    Use this job when you already have clusters or well-defined document categories and want to discover and attach keywords that show the representative words within those existing clusters. (If you want to create new clusters, use the Document Clustering job.)

  • Content-Based Recommender

    Use this job when you want to compute item similarities based on their content, such as product descriptions.

  • Create Seldon Core Model Deployment Job

    Use this job to deploy a Seldon Core model into the Fusion cluster.

  • Delete Seldon Core Model Deployment Job

    Use this job to remove a Seldon Core deployment from the Fusion cluster.

  • Document Clustering

    Cluster a set of documents and attach cluster labels.

  • Evaluate QnA Pipeline

    Evaluate the performance of a Smart Answers pipeline.

  • Ground Truth

    Estimate ground truth queries using click signals and query signals, with document relevance per query determined using a click/skip formula.

  • Head/Tail Analysis

    Perform head/tail analysis of queries from collections of raw or aggregated signals to identify underperforming queries and the reasons they underperform. This information is useful for tuning Solr configurations, auto-suggest, product catalogs, and SEO/SEM strategies, all of which can improve overall conversion rates.

  • Legacy Item Recommender

    Compute user recommendations based on a pre-computed item similarity model.

  • Legacy Item Similarity

    Use this job when you only want to compute item-to-item similarities. This method is more lightweight than the generic Recommendations job.

  • Logistic Regression Classifier Training

    Train a regularized logistic regression model for text classification.

  • Outlier Detection

    Use this job when you want to find outliers in a set of documents and attach labels for each outlier group.

  • Parallel Bulk Loader

    The Parallel Bulk Loader (PBL) job enables bulk ingestion of structured and semi-structured data from big data systems, NoSQL databases, and common file formats like Parquet and Avro.

  • Phrase Extraction

    Identify multi-word phrases in signals.

  • QnA Supervised Training

    Train a Smart Answers model on a supervised basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.

  • QnA Coldstart Training

    Train a Smart Answers model on a cold start (unsupervised) basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.

  • Query-to-Query Session-Based Similarity

    Train a collaborative filtering matrix decomposition recommender using SparkML’s Alternating Least Squares (ALS) to batch-compute query-query similarities. This can be used for items-for-query recommendations as well as queries-for-query recommendations.

  • Random Forest Classifier Training

    Train a random forest classifier for text classification.

    Note
    This job is deprecated as of Fusion 5.2.0.

  • Ranking Metrics

    Calculate relevance metrics, such as nDCG, by replaying ground truth queries against catalog data using variants from an experiment.

  • SQL Aggregation

    A Spark SQL aggregation job where user-defined parameters are injected into a built-in SQL template at runtime.

  • Synonym Detection Jobs

    Use this job to generate pairs of synonyms and pairs of similar queries. Two words are considered potential synonyms when they are used in a similar context in similar queries.

  • Token and Phrase Spell Correction

    Detect misspellings in queries or documents based on how often words and phrases occur.

  • Word2Vec Model Training

    Train a shallow neural model and project each document onto the resulting vector embedding space.

    Note
    This job is deprecated as of Fusion 5.2.0.
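The Ranking Metrics job reports relevance metrics such as nDCG. As a rough illustration of what that metric measures, the following Python sketch computes nDCG@k for each ground truth query and averages it across queries for one experiment variant. It assumes the common linear-gain, log2-discount form of DCG. The exact gain and discount that Fusion uses are not stated here, and the function names and data shapes (`dcg`, `ndcg_at_k`, `mean_ndcg`, the judgment dictionaries) are hypothetical and not part of any Fusion API.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), with ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, float], k: int) -> float:
    """nDCG@k for one query: DCG of the returned ranking divided by DCG of the ideal ranking.

    ranked_doc_ids: document IDs in the order a query variant returned them.
    relevance: graded relevance per document ID for this query (unjudged documents count as 0).
    """
    actual = [relevance.get(doc_id, 0.0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # A query with no relevant documents has no meaningful ideal ranking, so report 0.
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(results: Dict[str, List[str]],
              judgments: Dict[str, Dict[str, float]], k: int) -> float:
    """Average nDCG@k over all ground truth queries for one experiment variant."""
    scores = [ndcg_at_k(results.get(query, []), rels, k) for query, rels in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical graded judgments for one query, e.g. derived from click signals.
    judgments = {"ipad case": {"d1": 3.0, "d2": 1.0, "d3": 2.0}}
    variant_a = {"ipad case": ["d1", "d3", "d2"]}  # returns documents in the ideal order
    variant_b = {"ipad case": ["d2", "d3", "d1"]}  # returns the least relevant document first
    print(mean_ndcg(variant_a, judgments, k=3))  # 1.0
    print(mean_ndcg(variant_b, judgments, k=3))  # below 1.0 because relevant documents rank lower
```

Normalizing by the ideal ranking keeps scores between 0 and 1 for any non-negative relevance grades, so variants can be compared across queries that have different numbers of relevant documents.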