Jobs
These reference topics describe the configuration properties of jobs whose subtype is "task" or "spark".
Jobs with the subtype "datasource" have configuration schemas that depend on the connector type; see Connectors Configuration Reference.
For conceptual information and instructions for configuring and scheduling jobs, see Jobs and Schedules.
Tasks
- Delete old log messages from the system logs collection.
- A versatile job type that runs an arbitrary REST/HTTP/Solr command.
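The REST call task above issues an arbitrary HTTP or Solr command on a schedule. As a rough illustration only (this is not Fusion's job configuration schema; the host, collection name, and parameters are made up), the following sketch assembles the kind of standard Solr request URL such a job might call:

```python
# Illustrative sketch -- NOT Fusion's REST call job configuration.
# Builds a standard Solr request URL of the kind a REST call task could issue.
from urllib.parse import urlencode

def build_solr_command(base_url: str, collection: str, handler: str,
                       params: dict) -> str:
    """Assemble the URL for a Solr request (e.g. a commit on a collection)."""
    query = urlencode(params)
    return f"{base_url}/solr/{collection}/{handler}?{query}"

# Hypothetical host and collection, shown for illustration:
url = build_solr_command("http://localhost:8983", "products", "update",
                         {"commit": "true"})
print(url)  # http://localhost:8983/solr/products/update?commit=true
```

The actual command, headers, and scheduling are configured in the job's properties; see the REST call job reference topic for the real schema.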
Spark jobs
- Define an aggregation job.
- The Custom Python job lets users run Python code via Fusion. This job supports Python 3.6+ code.
- Run a custom Scala script as a Fusion job.
- ALS Recommender. Deprecated: use the BPR Recommender instead.
- Use this job when you already have clusters or well-defined document categories and want to discover and attach keywords that represent those existing clusters. (To create new clusters, use the Document Clustering job.)
- Create Seldon Core Model Deployment Job: use this job to deploy a Seldon Core Model into the Fusion cluster.
- Delete Seldon Core Model Deployment Job: use this job to remove a Seldon Core deployment from the Fusion cluster.
- Cluster a set of documents and attach cluster labels.
- Evaluate the performance of a Smart Answers pipeline.
- Estimate ground truth queries using click signals and query signals, with document relevance per query determined using a click/skip formula.
- Perform head/tail analysis of queries from collections of raw or aggregated signals to identify underperforming queries and the reasons behind them. This information is valuable for improving conversion rates, Solr configurations, auto-suggest, product catalogs, and SEO/SEM strategies.
- Logistic Regression Classifier Training: train a regularized logistic regression model for text classification. The Classification job, introduced in Fusion 5.2.0, provides more options and better logging.
- Use this job to find outliers in a set of documents and attach labels to each outlier group.
- The Parallel Bulk Loader (PBL) job enables bulk ingestion of structured and semi-structured data from big data systems, NoSQL databases, and common file formats like Parquet and Avro.
- Identify multi-word phrases in signals.
- Train a Smart Answers model on a supervised basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.
- Train a Smart Answers model on a cold start (unsupervised) basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.
- Query-to-Query Session-Based Similarity: train a collaborative filtering matrix decomposition recommender using SparkML’s Alternating Least Squares (ALS) to batch-compute query-query similarities. This can be used for items-for-query recommendations as well as queries-for-query recommendations.
- Random Forest Classifier Training: train a random forest classifier for text classification. The Classification job, introduced in Fusion 5.2.0, provides more options and better logging.
- Calculate relevance metrics (such as nDCG) by replaying ground truth queries against catalog data using variants from an experiment.
- A Spark SQL aggregation job where user-defined parameters are injected into a built-in SQL template at runtime.
- Use this job to generate pairs of synonyms and pairs of similar queries. Two words are considered potential synonyms when they are used in a similar context in similar queries.
- Token and Phrase Spell Correction: detect misspellings in queries or documents using the occurrence counts of words and phrases.
- Train a shallow neural model and project each document onto this vector embedding space. The Word2Vec Model Training job is deprecated as of Fusion 5.2.0.
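Several of the jobs above, the Ground Truth job in particular, estimate per-query document relevance from click and skip signals. The exact formula each job uses is defined in its reference topic; purely as an illustration of the click/skip idea, a simple ratio looks like this:

```python
# Illustrative only -- NOT the formula Fusion's Ground Truth job actually uses.
# Estimate a document's relevance for a query from click and skip counts.
def click_skip_relevance(clicks: int, skips: int) -> float:
    """Fraction of impressions (clicks + skips) that ended in a click."""
    total = clicks + skips
    return clicks / total if total else 0.0

print(click_skip_relevance(30, 10))  # 0.75
```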