
Fusion 5.12

    Ground Truth Jobs

    Ground truth (or gold standard) datasets define the specific set of documents considered relevant for each query. They are used by ground truth jobs and by query relevance metrics.

    Ground truth jobs estimate ground truth queries using click signals and query signals, with document relevance per query determined using a click/skip formula.
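    This page does not define the click/skip formula. The following is therefore only a rough Python illustration of the general idea under a hypothetical rule, not Fusion's actual formula. A document shown above the lowest clicked result but not clicked counts as a skip. A clicked document counts as a click. Relevance for each (query, document) pair is clicks / (clicks + skips). The session format and the function name are assumptions made for this sketch.

```python
from collections import defaultdict


def click_skip_relevance(sessions):
    """Estimate per-(query, doc) relevance from search sessions.

    HYPOTHETICAL rule, not Fusion's documented formula:
    each session is (query, ranked_doc_ids, clicked_doc_ids); documents
    ranked at or above the lowest click are counted as clicks (if clicked)
    or skips (if not), and relevance = clicks / (clicks + skips).
    """
    clicks = defaultdict(int)
    skips = defaultdict(int)
    for query, ranked, clicked in sessions:
        clicked = set(clicked)
        positions = [i for i, doc in enumerate(ranked) if doc in clicked]
        if not positions:
            continue  # no clicks: this sketch treats the session as no evidence
        last_click = max(positions)
        for doc in ranked[: last_click + 1]:
            if doc in clicked:
                clicks[(query, doc)] += 1
            else:
                skips[(query, doc)] += 1
    scores = {}
    for key in set(clicks) | set(skips):
        c, s = clicks[key], skips[key]
        scores[key] = c / (c + s)
    return scores
```

    Documents ranked below the last click are ignored here, because the user may never have seen them.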

    Use this job along with the Ranking Metrics job to calculate relevance metrics, such as Normalized Discounted Cumulative Gain (nDCG).
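    For reference, here is a minimal Python sketch of nDCG@k for a single query. The ground truth is given as a mapping from document ID to graded relevance. The sketch uses the linear-gain variant, gain / log2(rank + 1). This is an assumption: the Ranking Metrics job may use a different gain or discount, and the function names are illustrative only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: sum of gain / log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, ground_truth, k):
    """nDCG@k for one query.

    ranked_doc_ids: documents in the order the search system returned them.
    ground_truth: dict of doc id -> graded relevance (missing docs count as 0).
    """
    gains = [ground_truth.get(doc, 0.0) for doc in ranked_doc_ids[:k]]
    # Ideal ordering: the k highest relevance grades, best first.
    ideal = sorted(ground_truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

    A perfect ranking scores 1.0. Swapping the top two documents of a ground truth {"d1": 3, "d2": 2, "d3": 0} yields about 0.913.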

    To create a ground truth job, sign in to Fusion and click Collections > Jobs. Then click Add+ and, in the Experiment Evaluation Jobs section, select Ground Truth. You can enter basic and advanced parameters to configure the job. If a field has a default value, that value is prefilled when you add the job.

    Basic parameters

    To enter advanced parameters in the UI, click Advanced. Those parameters are described in the advanced parameters section.
    • Spark job ID. The unique ID for the Spark job that references this job in the API. This is the id field in the configuration file. Required field.

    • Input/Output Parameters. This section includes the Signals collection field, which is the Solr collection that contains click signals and their associated search log identifier. This is the signalsCollection field in the configuration file. Required field.

    Advanced parameters

    If you click the Advanced toggle, the following optional fields are displayed in the UI.

    • Spark Settings. This section lets you enter parameter name:parameter value options to use in this job. This is the sparkConfig field in the configuration file.

    • Additional Options. This section includes the following options:

      • Search logs pipeline. The pipeline ID associated with search log entries. This is the searchLogsPipeline field in the configuration file.

      • Join key (query signals). The common key that joins the query signals in the signals collection. This is the joinKeySearchLogs field in the configuration file.

      • Join key (click signals). The common key that joins the click signals in the signals collection. This is the joinKeySignals field in the configuration file.

      • Search logs add options. This section lets you enter property name:property value options to use when loading the search logs collection. This is the searchLogsAddOpts field in the configuration file.

      • Additional signals options. This section lets you enter property name:property value options when loading the signals collection. This is the signalsAddOpts field in the configuration file.

      • Filter queries. An array of filter queries (array[string]) to apply when selecting top queries from the query signals in the signals collection. This is the filterQueries field in the configuration file.

      • Top queries limit. The total number of queries to select for ground truth calculations when this job is run. This is the topQueriesLimit field in the configuration file.
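    The UI fields above correspond to fields in a JSON job configuration. The following is a minimal Python sketch that builds one such configuration with the basic and advanced fields. The job ID, collection name, pipeline ID, filter query, and Spark setting are hypothetical placeholder values. The sketch does not show how the JSON is submitted, whether through the UI or the Fusion API.

```python
import json

# All values below are HYPOTHETICAL placeholders; only the field names
# come from the configuration reference on this page.
ground_truth_job = {
    "id": "ground-truth-products",
    "type": "ground_truth",
    "signalsCollection": "products_signals",
    "searchLogsPipeline": "products",
    "joinKeySearchLogs": "id",               # join key of query signals (default)
    "joinKeySignals": "fusion_query_id",     # join key of click signals (default)
    "filterQueries": ["type:response"],      # hypothetical filter query
    "topQueriesLimit": 100,
    "sparkConfig": [
        {"key": "spark.executor.memory", "value": "4g"},
    ],
}

print(json.dumps(ground_truth_job, indent=2))
```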

    Use this job to estimate ground truth queries from click and query signals, with document relevance per query determined by a click/skip formula. Pair this job with the Ranking Metrics job to calculate relevance metrics such as nDCG.

    id - string (required)

    The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-), and underscore (_). The ID must start with a letter. Maximum length: 63 characters.

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
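    To check a job ID before you submit it, you can apply the documented pattern and length limit locally. The following Python sketch assumes that the pattern must match the whole ID, so it uses re.fullmatch. The function name is illustrative only.

```python
import re

# Pattern and length limit copied from the id property description.
ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_ID_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id is within the length limit and fully matches the pattern."""
    return len(job_id) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(job_id) is not None
```

    Because the final character class is optional, the pattern as written also accepts IDs that end in a dash or underscore, such as job-.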

    sparkConfig - array[object]

    Spark configuration settings.

    object attributes:
      • key (required). Display name: Parameter Name. Type: string.
      • value. Display name: Parameter Value. Type: string.
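    The sparkConfig field takes an array of key/value objects rather than a plain map. The following Python sketch converts a dictionary of Spark properties into that shape. It converts every value to a string because both attributes are typed as string. The helper name is hypothetical, and the example property names are standard Apache Spark settings.

```python
def to_spark_config(settings: dict) -> list:
    """Convert {name: value} into the sparkConfig array of {"key", "value"} objects.

    Values are stringified because both attributes are typed as string.
    """
    return [{"key": name, "value": str(value)} for name, value in settings.items()]
```

    For example, to_spark_config({"spark.executor.memory": "4g", "spark.executor.instances": 2}) produces two objects, and the second one has the value "2".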

    signalsCollection - string (required)

    Collection containing click signals and the associated search log identifier.

    >= 1 characters

    searchLogsAddOpts - object

    Additional options to use while loading the search logs collection.

    signalsAddOpts - object

    Additional options to use while loading the signals collection.

    searchLogsPipeline - string

    Pipeline ID associated with search log entries.

    >= 1 characters

    joinKeySearchLogs - string

    Join key of query signals in the signals collection.

    Default: id

    joinKeySignals - string

    Join key of click signals in the signals collection.

    Default: fusion_query_id
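    One reading of the two join keys is that a click signal belongs to a query signal when the click's joinKeySignals field (default fusion_query_id) equals the query's joinKeySearchLogs field (default id). This page does not state that rule explicitly, so treat it as an assumption. The following Python sketch performs that join over in-memory lists of signal dictionaries. The function name and the query and doc_id fields in the example are hypothetical.

```python
from collections import defaultdict


def join_clicks_to_queries(query_signals, click_signals,
                           join_key_search_logs="id",
                           join_key_signals="fusion_query_id"):
    """Pair each query signal with the click signals that reference it.

    ASSUMED semantics (an interpretation, not confirmed by the docs):
    click[join_key_signals] == query[join_key_search_logs].
    """
    clicks_by_key = defaultdict(list)
    for click in click_signals:
        key = click.get(join_key_signals)
        if key is not None:
            clicks_by_key[key].append(click)
    # .get() on a defaultdict does not insert missing keys; unclicked queries get [].
    return [
        (query, clicks_by_key.get(query.get(join_key_search_logs), []))
        for query in query_signals
    ]
```

    A Spark job would perform the equivalent join on DataFrames. The dictionaries here only make the key relationship concrete.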

    filterQueries - array[string]

    Filter queries to apply while choosing top queries from query signals in the signals collection.

    topQueriesLimit - integer

    Total number of queries to pick for ground truth calculations.

    Default: 100
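    The following Python sketch shows how topQueriesLimit bounds the query set. It assumes that "top" means the most frequent query strings in the query signals, which this page does not state. The keep predicate is a hypothetical stand-in for filterQueries, because this sketch does not evaluate real Solr filter queries. The query field name is also an assumption.

```python
from collections import Counter


def top_queries(query_signals, top_queries_limit=100, keep=None):
    """Return up to top_queries_limit query strings, most frequent first.

    keep: optional predicate standing in for filterQueries (HYPOTHETICAL;
    Solr filter queries are not evaluated here).
    """
    counts = Counter(
        signal["query"]
        for signal in query_signals
        if keep is None or keep(signal)
    )
    return [query for query, _ in counts.most_common(top_queries_limit)]
```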

    type - string (required)

    Default: ground_truth

    Allowed values: ground_truth
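    The following Python sketch fills in the documented defaults and rejects a configuration that lacks a required field or has a wrong type value. This is a client-side convenience check under the assumption that the defaults listed above are the complete set. It is not Fusion's own validation, and the names are illustrative.

```python
# Defaults and required fields as listed in the property reference above.
DEFAULTS = {
    "joinKeySearchLogs": "id",
    "joinKeySignals": "fusion_query_id",
    "topQueriesLimit": 100,
    "type": "ground_truth",
}
REQUIRED = ("id", "signalsCollection", "type")


def resolve_config(config: dict) -> dict:
    """Return a copy of config with documented defaults filled in.

    Raises ValueError if a required field is missing or empty,
    or if type is not the only allowed value, ground_truth.
    """
    resolved = {**DEFAULTS, **config}  # explicit values override defaults
    missing = [name for name in REQUIRED if not resolved.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if resolved["type"] != "ground_truth":
        raise ValueError("type must be 'ground_truth'")
    return resolved
```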