Semantic vector search test guidelines

Table of Contents

Golden dataset
- Query collection
- Ground truth definition
Evaluate semantic vector searches
- Related information

To ensure the system retrieves relevant results, Lucidworks recommends you implement comprehensive testing of semantic vector searches.

This feature is currently only available to clients who have contracted with Lucidworks for features related to Neural Hybrid Search and Lucidworks AI.

This feature is available only in Managed Fusion 5.9.6 and later 5.9.x releases.

Golden dataset

A key component of testing is a "gold standard," also referred to as a golden dataset. This dataset is:

- A carefully curated query collection that:
  - Serves as a benchmark for measuring vector search performance
  - Represents frequent queries and their variations to ensure optimal relevancy
  - Contains ground truth that defines the most relevant results for each query
- Most effective when it contains a significant number of diverse queries that are updated periodically to reflect the most pertinent information
- Enhanced by automated testing frameworks, which can extend coverage of search scenarios and provide continuous monitoring of system performance
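One way to picture a golden dataset is as a list of queries, each paired with the IDs of the documents judged most relevant. The following is a minimal Python sketch under that assumption; the `GoldenQuery` class, the `validate` helper, the category labels, and the document IDs are all hypothetical and are not part of Fusion. The validation step reflects the requirement that every query carries ground truth.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenQuery:
    """One entry in a hypothetical golden dataset."""

    query: str  # query text as a user would type it
    category: str  # hypothetical label, e.g. "typical" or "misspelling"
    relevant_ids: set[str] = field(default_factory=set)  # ground truth document IDs


def validate(dataset: list[GoldenQuery]) -> list[str]:
    """Return a list of problems found in the dataset (empty if none)."""
    problems = []
    seen = set()
    for entry in dataset:
        # Every query needs at least one known relevant result.
        if not entry.relevant_ids:
            problems.append(f"no ground truth for query: {entry.query!r}")
        # Duplicate queries would be counted twice when metrics are averaged.
        if entry.query in seen:
            problems.append(f"duplicate query: {entry.query!r}")
        seen.add(entry.query)
    return problems


# A tiny illustrative dataset; the document IDs are made up.
GOLDEN_DATASET = [
    GoldenQuery("waterproof hiking boots", "typical", {"doc-101", "doc-102"}),
    GoldenQuery("waterprof hikng boots", "misspelling", {"doc-101", "doc-102"}),
    GoldenQuery("rain jacket", "synonym", {"doc-205"}),
]
```

Calling `validate(GOLDEN_DATASET)` returns an empty list for this sample. Running the check before each test cycle helps catch entries that were added without ground truth as the dataset is updated.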

Query collection

The dataset needs to include a wide range of queries that reflect your organization’s real-world user interactions. Include the following types of queries:

- Frequent queries that represent typical use cases
- Queries that test how the system handles misspellings, ambiguous terms, synonyms, and phrasing differences
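To confirm that the query collection covers each of the query types listed above, you can tag every query with a type and count them. This is a minimal Python sketch assuming queries are stored as `(query_text, category)` pairs; the `coverage_gaps` function and the category names are hypothetical labels chosen to mirror the list, not a Fusion feature.

```python
from collections import Counter

# Hypothetical labels mirroring the query types listed above.
REQUIRED_CATEGORIES = {"typical", "misspelling", "ambiguous", "synonym", "phrasing"}


def coverage_gaps(queries: list[tuple[str, str]], min_per_category: int = 1) -> dict[str, int]:
    """Map each under-represented category to its current query count.

    `queries` is a list of (query_text, category) pairs.
    """
    counts = Counter(category for _, category in queries)
    # Counter returns 0 for categories that never appear.
    return {
        category: counts[category]
        for category in sorted(REQUIRED_CATEGORIES)
        if counts[category] < min_per_category
    }


queries = [
    ("waterproof hiking boots", "typical"),
    ("waterprof hikng boots", "misspelling"),
    ("rain jacket", "synonym"),
    ("boots for hiking in the rain", "phrasing"),
]
print(coverage_gaps(queries))  # {'ambiguous': 0}
```

Raising `min_per_category` is a simple way to enforce that each query type is represented by more than a token example.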

Ground truth definition

The ground truth definition must specify known, valid results for each query. You can build it from:

- Judgments by knowledgeable users and other experts familiar with the data, who select the most relevant items
- Historical data that exemplifies relevant results for each query
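The two sources above can be combined. The following Python sketch assumes the historical data is a click log of `(query, clicked_doc_id)` pairs and treats a document as relevant once it has been clicked at least `min_clicks` times for a query. The function names, the log format, and the threshold are hypothetical; clicks are only a proxy for relevance, so expert judgments take precedence when both exist for the same query.

```python
from collections import Counter, defaultdict


def normalize(query: str) -> str:
    """Simple normalization so 'Rain Jacket ' and 'rain jacket' match."""
    return " ".join(query.lower().split())


def build_ground_truth(
    click_log: list[tuple[str, str]],
    expert_labels: dict[str, set[str]],
    min_clicks: int = 3,
) -> dict[str, set[str]]:
    """Combine expert judgments with click-derived relevance.

    click_log: (query, clicked_doc_id) pairs from historical search sessions.
    expert_labels: query -> set of doc IDs that experts judged relevant.
    """
    clicks: defaultdict[str, Counter] = defaultdict(Counter)
    for query, doc_id in click_log:
        clicks[normalize(query)][doc_id] += 1

    truth: dict[str, set[str]] = {}
    for query, doc_counts in clicks.items():
        # Keep only documents clicked often enough to count as relevant.
        relevant = {doc for doc, n in doc_counts.items() if n >= min_clicks}
        if relevant:
            truth[query] = relevant

    # Expert judgments take precedence over click-derived labels for the same query.
    for query, docs in expert_labels.items():
        truth[normalize(query)] = set(docs)
    return truth
```

Whether expert labels should replace or be merged with click-derived labels is a judgment call; replacing them, as here, keeps a single trusted source per query.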

Evaluate semantic vector searches

To evaluate the system, run the golden dataset queries as vector searches and compare the results to the ground truth. The system calculates metrics from this comparison that help you tune the search to return more relevant results.
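The comparison step can be sketched as a loop over the golden dataset. This minimal Python example assumes the ground truth is a mapping from query text to relevant document IDs and that `search_fn` is a hypothetical callable returning ranked document IDs for a query; in practice it would wrap a request to your search endpoint, which is not shown. Here a fixed lookup table stands in for the vector search.

```python
from typing import Callable


def evaluate(
    golden: dict[str, set[str]],
    search_fn: Callable[[str], list[str]],
    k: int = 10,
) -> dict[str, object]:
    """Run every golden query and compare its top-k results with the ground truth."""
    hits_per_query: dict[str, int] = {}
    for query, relevant in golden.items():
        top_k = search_fn(query)[:k]  # ranked document IDs from the search under test
        hits_per_query[query] = sum(1 for doc_id in top_k if doc_id in relevant)

    # Queries that returned no relevant result in the top k are the first to investigate.
    failing = sorted(q for q, hits in hits_per_query.items() if hits == 0)
    return {"hits_per_query": hits_per_query, "failing_queries": failing}


# Stand-in for a real vector search: a fixed lookup table.
fake_index = {"rain jacket": ["doc-205", "doc-300"], "boots": ["doc-500"]}
golden = {"rain jacket": {"doc-205"}, "boots": {"doc-101"}}
report = evaluate(golden, lambda q: fake_index.get(q, []), k=3)
print(report["failing_queries"])  # ['boots']
```

Keeping `search_fn` as a parameter lets the same harness run against different configurations, so an automated framework can rerun it after each change.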

Performance metrics typically fall into the following categories:

- Precision metrics measure how many of the returned results are relevant according to the ground truth, usually within the top results. For example, precision at 3 is the fraction of the top 3 results that are relevant.
- Recall metrics measure how many of the relevant items defined in the ground truth appear in the retrieved results.
- Ranking analysis metrics evaluate whether results are ordered by relevance, with the most relevant results ranked highest.
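The following Python sketch shows common textbook definitions for each category: precision at k, recall at k, and, for ranking, reciprocal rank and binary-relevance NDCG (normalized discounted cumulative gain). These are illustrative implementations and are not necessarily how Fusion computes its experiment metrics. They assume `ranked` contains unique document IDs in ranked order and `relevant` is the ground truth set for one query.

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are in the ground truth."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the ground truth items that appear in the top k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: rewards relevant results that appear near the top."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal DCG: all relevant items placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]  # results in ranked order
relevant = {"d1", "d2"}  # ground truth for this query
print(precision_at_k(ranked, relevant, 3))  # 0.3333333333333333
print(recall_at_k(ranked, relevant, 3))  # 0.5
print(reciprocal_rank(ranked, relevant))  # 0.5
```

Averaging each value over all golden queries gives dataset-level figures; the mean of `reciprocal_rank`, for example, is the mean reciprocal rank (MRR).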

For more information about the available metrics:

- Self-hosted Fusion clients, see Fusion experiment metrics query relevance
- Managed Fusion clients, see Managed Fusion experiment metrics query relevance

Lucidworks Platform clients can also view metrics for custom-trained models.

Although these metrics describe the associated model rather than the golden dataset, they provide a solid basis for judging whether your model is suitable for deployment with your dataset.
