Semantic vector search test guidelines
To ensure the system retrieves relevant results, Lucidworks recommends you implement comprehensive testing of semantic vector searches.
This feature is currently only available to clients who have contracted with Lucidworks for features related to Neural Hybrid Search and Lucidworks AI.
This feature is only available in Fusion 5.9.5 and later versions of Fusion 5.9.
Golden dataset
A key component is establishing a "gold standard," also referred to as a golden dataset. This dataset is:

- A carefully curated query collection (sketched in the example after this list) that:
  - Is used as a benchmark to determine vector search performance
  - Represents frequent queries and variations to ensure optimal relevancy
  - Contains ground truth that defines the most relevant results for each query
- Most effective when it contains a significant number of diverse queries that are updated periodically to reflect the most pertinent information
- Enhanced when automated testing frameworks are incorporated, which can result in more extensive coverage of search scenarios and provide continuous system performance monitoring
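For illustration, a golden dataset can be kept in a simple structured form. The layout below is a hypothetical sketch, not a Fusion-specific schema; the queries and document IDs are invented:

```python
# Hypothetical golden dataset layout: each entry pairs a benchmark query
# with the document IDs that the ground truth defines as most relevant.
GOLDEN_DATASET = [
    {
        "query": "wireless noise cancelling headphones",
        "relevant_ids": ["SKU-1042", "SKU-2210", "SKU-0877"],
    },
    {
        "query": "waterproof hiking boots",
        "relevant_ids": ["SKU-3301", "SKU-0095"],
    },
]
```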
Query collection
The dataset needs to include a wide range of queries that reflect your organization’s real-world user interactions. The types of queries to include are:

- Typical use cases that comprise frequent queries
- A variety of queries that test system rules and functions, such as misspellings, ambiguous terms, synonyms, and phrasing differences (see the sketch after this list)
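As an illustration, variant queries can be grouped with the frequent query they should behave like, so each variant can share that query's ground truth. The groups below are invented examples, not a required structure:

```python
# Hypothetical variant group exercising misspellings, synonyms,
# ambiguous terms, and phrasing differences for one frequent query.
QUERY_VARIANTS = {
    "running shoes": [
        "runing shoes",          # misspelling
        "jogging sneakers",      # synonym
        "trainers",              # ambiguous, region-dependent term
        "shoes to run in",       # phrasing difference
    ],
}
```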
Ground truth definition
The ground truth definition must specify known, valid results for each query and can be built using:

- Knowledgeable users and other experts familiar with the data, who select the most relevant items
- Previous data that exemplifies relevant results for each query (see the sketch after this list)
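For the second approach, one minimal sketch is to aggregate historical engagement signals per query and treat the most frequently clicked documents as ground truth. The (query, document ID) click-log format and the top_n cutoff here are assumptions, not a Fusion API:

```python
from collections import Counter, defaultdict

def ground_truth_from_clicks(click_log, top_n=5):
    """Derive per-query ground truth from historical (query, doc_id) clicks.

    click_log: iterable of (query, doc_id) pairs from past search sessions.
    Returns a dict mapping each query to the IDs of its top_n most-clicked
    documents, used here as a proxy for the most relevant results.
    """
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[query][doc_id] += 1
    return {
        query: [doc_id for doc_id, _ in counter.most_common(top_n)]
        for query, counter in counts.items()
    }
```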
Evaluate semantic vector searches
Query results from vector searches using the golden dataset are compared to the ground truth data. The system calculates metrics that provide information to help you enhance the search and return more relevant results.
Typically, performance metrics can be categorized as follows (a minimal sketch of each appears after this list):

- Precision metrics focus on results that adhere most closely to the criteria specified in the query, so that the most relevant items appear in the top results. For example, precision@3 measures how many of the top three results are relevant.
- Recall metrics evaluate the overall set of retrieved results and how completely it covers the relevant items defined in the ground truth.
- Ranking analysis metrics measure whether results are returned in order of relevance, with the most relevant results ranked the highest.
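To make the three categories concrete, here is a minimal, dependency-free sketch of one metric from each category for a single query. It assumes you already have the ranked document IDs returned by the vector search and the relevant IDs from the golden dataset's ground truth; it is not Fusion's internal implementation:

```python
def precision_at_k(retrieved_ids, relevant_ids, k=3):
    """Precision@k: fraction of the top-k retrieved results that are relevant."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k

def recall(retrieved_ids, relevant_ids):
    """Recall: fraction of all relevant documents appearing anywhere in the results."""
    if not relevant_ids:
        return 0.0
    retrieved = set(retrieved_ids)
    found = sum(1 for doc_id in relevant_ids if doc_id in retrieved)
    return found / len(relevant_ids)

def reciprocal_rank(retrieved_ids, relevant_ids):
    """Ranking signal: 1/rank of the first relevant result, or 0.0 if none appears."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0
```

Averaging reciprocal_rank over all queries in the golden dataset yields mean reciprocal rank (MRR), a common ranking-analysis metric.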
For more information about metrics available to:

- Self-hosted Fusion clients, see Fusion experiment metrics query relevance
- Managed Fusion clients, see Managed Fusion experiment metrics query relevance
Related information
Lucidworks Platform clients can also view metrics for custom-trained models.
While these metrics describe the associated model rather than the golden dataset, they provide a solid basis for understanding whether your model is suitable for deployment with your dataset.