      Machine Learning Jobs

      Fusion provides these job types to perform machine learning tasks.

      Signals analysis

      These jobs analyze a collection of signals in order to perform query rewriting, signals aggregation, or experiment analysis.

      • Ground Truth

        Estimate ground truth queries using click signals and query signals, with document relevance per query determined using a click/skip formula.

      Query rewriting

      These jobs produce data that can be used for query rewriting or to inform updates to the synonyms.txt file.

      • Head/Tail Analysis

        Perform head/tail analysis of queries from collections of raw or aggregated signals to identify underperforming queries and the reasons they underperform. This information is valuable for tuning Solr configurations, auto-suggest, product catalogs, and SEO/SEM strategies in order to improve conversion rates.

      • Phrase Extraction

        Identify multi-word phrases in signals.

      • Synonym Detection Jobs

        Use this job to generate pairs of synonyms and pairs of similar queries. Two words are considered potential synonyms when they are used in a similar context in similar queries.

      • Token and Phrase Spell Correction

        Detect misspellings in queries or documents using the numbers of occurrences of words and phrases.
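
      As an illustration of count-based misspelling detection, the sketch below flags a rare token as a likely misspelling when a much more frequent token lies within a small edit distance. It is not Fusion's implementation: the function names, the `max_distance` and `min_ratio` thresholds, and the sample queries are all assumptions made for the example.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_corrections(queries, max_distance=1, min_ratio=5):
    """Map rare tokens to a much more frequent token that is a near match.

    max_distance and min_ratio are hypothetical tuning knobs, not Fusion parameters.
    """
    counts = Counter(tok for q in queries for tok in q.lower().split())
    corrections = {}
    for token, count in counts.items():
        best = None
        for candidate, cand_count in counts.items():
            # Only consider candidates that occur far more often than the token.
            if candidate == token or cand_count < count * min_ratio:
                continue
            if edit_distance(token, candidate) <= max_distance:
                if best is None or cand_count > counts[best]:
                    best = candidate
        if best is not None:
            corrections[token] = best
    return corrections


# Made-up query log for illustration only.
queries = ["laptop bag"] * 10 + ["lapto bag"] + ["laptop case"] * 5
corrections = suggest_corrections(queries)
```

      With this sample log, only "lapto" is mapped to a correction ("laptop"), because no other token has a near match that is at least five times more frequent.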

      Signals aggregation

      • SQL Aggregation

        A Spark SQL aggregation job where user-defined parameters are injected into a built-in SQL template at runtime.

      Experiment analysis

      • Ranking Metrics

        Calculate relevance metrics, such as nDCG, by replaying ground truth queries against catalog data using variants from an experiment.
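
      To make the nDCG metric concrete, here is a minimal sketch that scores the result lists returned by two experiment variants against ground truth relevance for a single query. It uses the common log2-discounted gain formulation; the exact formula and cutoff the job uses are not stated here, and the function names, document IDs, and relevance values are assumptions made for the example.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1), positions starting at 1."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, ground_truth, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order a variant returned them.
    ground_truth:   dict of doc ID -> relevance; unlisted documents count as 0.
    """
    gains = [ground_truth.get(doc_id, 0.0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(ground_truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical ground truth and variant rankings for a single query.
ground_truth = {"doc1": 3.0, "doc2": 2.0, "doc3": 1.0}
variant_a = ["doc1", "doc2", "doc3"]
variant_b = ["doc3", "doc9", "doc1"]

score_a = ndcg_at_k(variant_a, ground_truth)
score_b = ndcg_at_k(variant_b, ground_truth)
```

      Here variant A returns the ideal ordering and scores higher than variant B, which ranks the most relevant document third and includes an unjudged document.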

      Collaborative recommenders

      These jobs analyze signals and generate matrices used to provide collaborative recommendations.

      Use the Query-to-Query Session-Based Similarity Jobs for better performance and query coverage.

      Content-based recommenders

      Content-based recommenders create matrices of similar items based on their content.

      • Content-Based Recommender

        Use this job when you want to compute item similarities based on their content, such as product descriptions.
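
      As a rough illustration of content-based item similarity, the sketch below compares product descriptions using cosine similarity over raw word counts. The job's actual vectorization and similarity measure are not described here, so the bag-of-words approach, the function names, and the sample descriptions are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(descriptions, top_n=2):
    """For each item, return up to top_n (other_item, similarity) pairs, most similar first."""
    vectors = {item: Counter(text.lower().split())
               for item, text in descriptions.items()}
    result = {}
    for item, vec in vectors.items():
        scored = [(other, cosine(vec, vectors[other]))
                  for other in vectors if other != item]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[item] = scored[:top_n]
    return result


# Hypothetical catalog entries for illustration only.
descriptions = {
    "sku1": "red cotton t-shirt",
    "sku2": "blue cotton t-shirt",
    "sku3": "stainless steel water bottle",
}
similar = similar_items(descriptions, top_n=1)
```

      In this sample, the two t-shirts are each other's closest match because they share two of three terms, while the water bottle shares no terms with either.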

      Content analysis

      The Classification job, introduced in Fusion 5.2.0, provides more options and better logging.
      • Word2Vec Model Training (Deprecated)

        Train a shallow neural model, and project each document onto this vector embedding space.

      The Word2Vec Model Training job is deprecated as of Fusion 5.2.0.

      Data ingest

      • Parallel Bulk Loader

        The Parallel Bulk Loader (PBL) job enables bulk ingestion of structured and semi-structured data from big data systems, NoSQL databases, and common file formats like Parquet and Avro.