Neural Hybrid Search
Neural Hybrid Search is a capability that combines lexical and semantic dense vector search to produce more accurate and relevant search results.
This feature is currently only available to clients who have contracted with Lucidworks for features related to Neural Hybrid Search and Lucidworks AI.
This feature is only available in Managed Fusion 5.9.6 and later 5.9.x releases.
Overview
Lexical search works by looking for literal matches of keywords. For example, a query for chips would return potato chips and tortilla chips, but it could also return chocolate chips. Semantic vector search, however, interprets meaning. Semantic search could serve up results for potato chips, as well as other salty snacks like dried seaweed or cheddar crackers. Both methods have their advantages, and often you’ll want one or the other depending on your use case or search query. Neural Hybrid Search lets you use both: it combines the precision of lexical search with the nuance of semantic search.
Hybrid Scoring
The combination of lexical and semantic scores is based on this function:
(vector_weight*vector_score + lexical_weight*scaled(lexical_score))
Because lexical scores can be arbitrarily large due to the use of TF-IDF and BM25, scaled() means that the lexical scores are scaled to values between 0 and 1 so they align with the bounded vector scores. This scaling is achieved by taking the largest lexical score and dividing all lexical scores by that high score.
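As a rough illustration, the scaling and weighted combination described above can be sketched in Python. The weights and raw scores below are made-up examples, not Fusion defaults:

```python
# Illustrative sketch of the hybrid scoring function above.
# Weight values and raw scores are made-up examples, not Fusion defaults.

def scaled(lexical_scores):
    """Scale lexical scores into [0, 1] by dividing by the largest score."""
    top = max(lexical_scores)
    return [s / top for s in lexical_scores]

def hybrid_scores(lexical_scores, vector_scores,
                  lexical_weight=0.3, vector_weight=0.7):
    """vector_weight*vector_score + lexical_weight*scaled(lexical_score)."""
    lex = scaled(lexical_scores)
    return [vector_weight * v + lexical_weight * l
            for l, v in zip(lex, vector_scores)]

# BM25 scores are unbounded; vector similarity scores are already in [0, 1].
print(hybrid_scores([12.4, 6.2, 3.1], [0.91, 0.88, 0.40]))
```

Note that because the scaling divides by the maximum lexical score in the result set, the top lexical document always contributes its full lexical weight.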
- For highly tuned lexical and semantic search, the ratio will be closer to 0.3 lexical weight and 0.7 semantic weight.
- When using the Boost with Signals stage, use bq, not boost, and enable Scale Boosts to control how much the signals can impact the overall hybrid score. Lucidworks recommends keeping the scale boost values low, since semantic vector search scales scores with a maximum of 1.
In Fusion 5.9.5 - 5.9.9, all of the documents within the search collection must have an associated vector field. Otherwise, hybrid search fails on that vector field. This does not apply to Fusion 5.9.10 and later.
For more information, see Semantic vector search test guidelines.
Solr Vector Query Types
Solr supports vector query types for semantic search that compare the similarity between encoded vector representations of content. These query types determine how results are retrieved and ranked based on proximity or similarity within the vector space.
The two vector query types used at Lucidworks are K-Nearest Neighbors (KNN) and Vector Similarity Threshold (VecSim).
The simplest difference between the two is how they return results:
- KNN always returns a fixed number of results (topK), no matter the input. For example, if topK = 10, you’ll always get 10 results.
- VecSim returns a varying number of results based on similarity score (from 0 to 1). Only items above a set threshold are returned, so it’s possible to get zero results if nothing is similar enough.
The following sections describe each query type in more detail.
K-Nearest Neighbors (KNN)
This is a query where a top value (k) is always returned, referred to as topK. Regardless of the input vector, there will always be k vectors returned, because within the vector space of your encoded vectors there is always something in proximity.
Sharding with topK pulls k from each shard, so the final top k on a sharded collection will be topK*Shard_count.
With prefiltering, top-level filters are applied before the KNN search, so the query can still return results that pass those filters. When prefiltering is blocked, the filters are applied after the KNN query, and it is possible to end up with 0 results once the filters remove the retrieved vectors. To mitigate that risk, a larger topK can be used at the cost of performance.
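As a sketch of what a KNN request looks like, the following builds query parameters for Solr's {!knn} query parser. The field name (vector_field), filter query, and vector values are hypothetical:

```python
# Sketch of building KNN query parameters for Solr's {!knn} query parser.
# The field name (vector_field), filter query, and vector are hypothetical.
import json

query_vector = [0.12, -0.43, 0.88, 0.05]  # normally produced by an embedding model
topk = 10

params = {
    # {!knn} returns the topK documents nearest to the query vector
    "q": "{!knn f=vector_field topK=%d}%s" % (topk, json.dumps(query_vector)),
    # with prefiltering enabled, this filter is applied before the KNN search
    "fq": "category:snacks",
    "fl": "id,score",
}
print(params["q"])
```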
KNN Solr Scoring
Solr supports three different similarity score metrics: euclidean, dot_product, or cosine. In Managed Fusion, the default is cosine. It’s important to note that Lucene bounds cosine to 0 to 1, and therefore differs from standard cosine similarity. For more information, refer to the Lucene documentation on the scoring formula and the Solr documentation on Dense Vector Search.
In Managed Fusion 5.9.5 - 5.9.9, Solr Collapse does not work well with Neural Hybrid Search because the computed hybrid score uses the vector score that is based on the head node and not the most relevant vector document within the collapse. This does not apply to Managed Fusion 5.9.10 and later.
Vector Cosine Similarity Cutoff/Threshold (VecSim)
This is a query where a cosine float value between 0 and 1 is given as a threshold. The similarity score of each vector is compared to the input vector: everything at or above the threshold is kept, and everything else is left out. It is possible to get zero results when using a similarity threshold, because there may not be any documents within the given threshold.
This can be slower than KNN because the number of matching vectors is unknown in advance, so the size of the vector result set cannot be controlled. VecSim speeds up when prefiltering is enabled.
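As a sketch, recent Solr versions expose a similarity-threshold query through the {!vectorSimilarity} query parser. The field name, threshold, and vector below are hypothetical, and parser availability depends on your Solr version:

```python
# Sketch of a similarity-threshold (VecSim) query. The field name and
# threshold are hypothetical; check your Solr version for parser support.
import json

query_vector = [0.12, -0.43, 0.88, 0.05]

params = {
    # keep only documents whose similarity to the query vector is >= minReturn
    "q": "{!vectorSimilarity f=vector_field minReturn=0.7}%s"
         % json.dumps(query_vector),
    "fl": "id,score",
}
print(params["q"])
```

Unlike the KNN example, there is no topK here: the result set size is determined entirely by how many documents clear the threshold.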
Replica choice
Lucidworks recommends using PULL and TLOG replicas. These replica types copy the index of the leader replica, which results in the same HNSW graph on every replica. When querying, the HNSW approximation query will be consistent given a static index.
In contrast, NRT replicas have their own index, so they also have their own HNSW graph. HNSW is an Approximate Nearest Neighbor (ANN) algorithm, so it will not return exactly the same results for differently constructed graphs. This means that queries can and will return different results per HNSW graph (one per NRT replica in a shard), which can lead to noticeable result shifts. When using NRT replicas, the shifts can be made less noticeable by increasing the topK parameter. Variation will still occur, but should be lower among the top documents. Another way to mitigate shifts is to use Neural Hybrid Search with a vector similarity cutoff.
For more information, refer to Solr Types of Replicas.
Considerations for multi-sharded collections
- The Managed Fusion UI will show vector floats encapsulated by “ ”. This is expected behavior.
- Sharding with topK pulls k from each shard, so the final top k on a sharded collection will be topK*Shard_count.