A lexical search for chips would result in potato chips and tortilla chips, but it could also result in chocolate chips.
Semantic vector search, however, interprets meaning.
Semantic search could serve up results for potato chips, as well as other salty snacks like dried seaweed or cheddar crackers.
Both methods have their advantages, and often you’ll want one or the other depending on your use case or search query.
Neural Hybrid Search lets you use both: it combines the precision of lexical search with the nuance of semantic search.
To use semantic vector search in Fusion, you need to configure Neural Hybrid Search.
Then you can choose the balance between lexical and semantic vector search that works best for your use case.
For example, you can use a 70/30 split between semantic and lexical search, or a 50/50 split, or any other ratio that works for you.
This topic explains the concepts that you need to understand to configure and use Neural Hybrid Search in Fusion.
For instructions for enabling and configuring it in your pipeline, see Configure Neural Hybrid Search.
Configure Neural Hybrid Search
signals or access_control.

PUT,POST,GET:/LWAI-ACCOUNT-NAME/**

Your Fusion account name must match the name of the account that you selected in the Account Name dropdown.

{Destination Field} is the vector field. {Destination Field}_b is the boolean value that indicates whether the vector has been indexed.

The useCaseConfig parameter that is common to embedding use cases is dataType, but each use case may have other parameters. The value for the query stage is query.

The modelConfig parameters are common to generative AI use cases. For more information about models, see Prediction API.
Find <copyField dest="_text_" source="*"/> and add <copyField dest="text" source="*_t"/> below it. This concatenates and indexes all *_t fields into the text field.
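For example, the relevant portion of the schema would look like this after the change (the surrounding schema is omitted):

```xml
<!-- Existing rule: copy every field into the catch-all _text_ field. -->
<copyField dest="_text_" source="*"/>
<!-- Added rule: concatenate all *_t text fields into the text field. -->
<copyField dest="text" source="*_t"/>
```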
For example, a 1024-dimensional vector field can use a field name ending in _1024v. There is no limitation on supported vector dimensions.

<ctx.vector> evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.

In the JavaScript context (ctx), the preFilterKey object becomes available. The preFilter object adds both the top-level fq and preFilter to the parameters for the vector query. You do not need to manually add the top-level fq in the JavaScript stage. See the example below:
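The original example is not reproduced here; the following is a minimal sketch of a Fusion JavaScript query stage, assuming ctx behaves like a map and using an illustrative filter value (the key name comes from the text above; everything else is an assumption):

```javascript
// Sketch only: the filter string and value shape are assumptions.
function (request, response, ctx) {
  // Expose a pre-filter under the preFilterKey described above. The preFilter
  // object then adds both the top-level fq and the preFilter to the vector
  // query parameters, so no manual top-level fq is needed here.
  ctx.put("preFilterKey", "category_s:snacks");
}
```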
Add the required configuration to solrconfig.xml within the <config> tag.

<ctx.vector> evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.

Use the knn query parser as you would with Solr. Specify the search vector and include it in the query. For example, change the q parameter to a knn query parser string.
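For example, assuming a vector field named body_512_v (a field name used as an example elsewhere on this page) and the <ctx.vector> variable described above:

```
q={!knn f=body_512_v topK=100}<ctx.vector>
```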
You can also preview the results in the Query Workbench. Try a few different queries, and adjust the weights and parameters in the Hybrid Query stage to find the best balance between lexical and semantic vector search for your use case. You can also disable and re-enable the Neural Hybrid Query stage to compare results with and without it.

XDenseVectorField is not supported in Fusion 5.9.5 and above. Instead, use DenseVectorField.

One way to mitigate score shifts is to increase the topK parameter. Variation will still occur, but it should be lower in the returned documents. Another way to mitigate shifts is to use Neural Hybrid Search with a vector similarity cutoff. For more information, refer to Solr Types of Replicas.

In the case of Neural Hybrid Search, the lexical BM25 and TF-IDF score differences that can occur with NRT replicas because of index differences for deleted documents can also affect the combined hybrid score.
If you choose to use NRT replicas, any lexical or semantic vector score variations can be exacerbated.

To check for orphans, query the collection with the filter fq=VECTOR_FIELD_b:true. The ids you see are the orphans. Proceed to Resolving orphans. If no documents are returned, there are likely no orphans. You can try a few varying vectors to be certain.
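A sketch of that request, assuming Fusion's Solr API endpoint and a placeholder query vector (the host, collection, field names, and vector values are illustrative; only the fq line comes from the original):

```bash
# Sketch only: endpoint, collection, field names, and the vector are placeholders.
curl -u USERNAME:PASSWORD -X POST \
  "https://FUSION_HOST/api/solr/COLLECTION/select" \
  --form-string 'q={!knn f=VECTOR_FIELD topK=100}[0.1, 0.2, 0.3]' \
  --form-string 'fq=VECTOR_FIELD_b:true' \
  --form-string 'fl=id'
```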
To resolve orphans, increase hnswBeamWidth and hnswMaxConnections per the suggested values below.

Orphaning rate | hnswBeamWidth | hnswMaxConnections |
---|---|---|
5% or less | 300 | 64 |
5% - 25% | 500 | 100 |
25% or more | 3200 | 512 |
scaled() means that the lexical scores are scaled to between 0 and 1 so they align with the bounded vector scores. The scaling is achieved by taking the largest lexical score and dividing all lexical scores by that high score.
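A minimal sketch of that normalization, with illustrative values:

```python
# Divide every lexical score by the largest lexical score,
# so the scaled scores fall in (0, 1] like the bounded vector scores.
lexical_scores = [7.2, 4.8, 1.1]
top = max(lexical_scores)
scaled = [s / top for s in lexical_scores]  # [1.0, 0.667, 0.153]
```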
Hybrid scoring tips:

- A reasonable starting point is a 0.3 lexical weight and a 0.7 semantic weight.
- Use bq, not boost, and enable Scale Boosts to control how much the signals can impact the overall hybrid score. Lucidworks recommends keeping the scale boost values low, since semantic vector search scales scores with a maximum of 1.
- In a sharded collection, topK pulls K results from each shard, so the candidate pool is topK*Shard_count.
- Prefiltering makes it possible for top-level filters to filter out results while still allowing results that were collected by the KNN query. When prefiltering is blocked, it is possible to have zero results after the filters are applied to the KNN results. To mitigate that risk, use a larger topK at the cost of performance.
The supported similarity functions are euclidean, dot_product, and cosine. In Fusion, the default is cosine. It’s important to note that Lucene bounds cosine to 0 to 1, and therefore differs from standard cosine similarity. For more information, refer to the Lucene documentation on scoring formula and the Solr documentation on Dense Vector Search.
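Concretely, Lucene maps raw cosine similarity from the range [-1, 1] into [0, 1] along these lines:

```python
# Raw cosine similarity lies in [-1, 1]; Lucene rescales it into [0, 1],
# which is why it differs from standard cosine similarity.
def bounded_cosine(raw_cosine: float) -> float:
    return (1.0 + raw_cosine) / 2.0
```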
When results are collapsed, the score comes from the collapse head node and not from the most relevant vector document within the collapse. This does not apply to Fusion 5.9.10 and later.

One way to mitigate score shifts is to increase the topK parameter. Variation will still occur, but it should be lower in the returned documents. Another way to mitigate shifts is to use Neural Hybrid Search with a vector similarity cutoff.

For more information, refer to Solr Types of Replicas.

“ ”. This is expected behavior.

topK pulls K from each shard, for a total of topK*Shard_count.
Configure Ray/Seldon vector search
For example, the model input field is text and the model output field is vector. A source field such as body_t is encoded into a vector destination field such as body_512_v.

Use the knn query parser as you would with Solr. Specify the search vector and include it in the query. For example, change the q parameter to a knn query parser string. The Ray/Seldon Vectorize Query stage will encode user queries using the specified model and modify the q parameter to use the knn query parser, turning the query into a vector search, as sketched below.
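For illustration, the rewritten q parameter might look like the following (the field name and vector values are placeholders; a real vector has as many entries as the model emits):

```
q={!knn f=body_512_v topK=10}[0.014, -0.102, 0.071]
```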
Develop and deploy a machine learning model with Ray

Install Ray Serve with pip install ray[serve]. Start the container with docker run and a specified port, like 9000, which you can then curl to confirm functionality before it is used in Fusion. See the testing example below.
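A sketch of that local test, assuming the image name used later on this page, that the container serves HTTP on Ray Serve's default port 8000, and a JSON request shape matching the deployment sketch below (all assumptions):

```bash
# Map local port 9000 to the container's serving port (8000 is Ray Serve's default).
docker run -p 9000:8000 MY_DOCKERHUB_USER/e5-small-v2-ray:0.1

# In another terminal, send a test query; the JSON shape is an assumption.
curl -X POST http://localhost:9000/ \
  -H "Content-Type: application/json" \
  -d '{"text": "how do I return an order?"}'
```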
Run kubectl edit configmap argo-deploy-ray-model-workflow -n <namespace>, then find the ray-head container in the escaped YAML and change the memory limit. Exercise caution when editing, because it is easy to break the YAML: delete and replace a single character at a time without changing any formatting.

The MODEL_DEPLOYMENT value in the command below can be found with kubectl get svc -n NAMESPACE. It will have the same name as the model name set in the Create Ray Model Deployment job.
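The original command is not shown here; one way to reach the service, as an assumption, is a port-forward followed by the same style of test request used above:

```bash
# Assumption: forward the model service locally, then send a test request.
kubectl port-forward svc/MODEL_DEPLOYMENT 8000:8000 -n NAMESPACE
curl -X POST http://localhost:8000/ \
  -H "Content-Type: application/json" \
  -d '{"text": "test query"}'
```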
This tutorial uses the e5-small-v2 model from Hugging Face, but any pre-trained model from https://huggingface.co will work with this tutorial. If you want to use your own model instead, you can do so, but your model must have been trained and then saved through a function similar to PyTorch's torch.save(model, PATH) function. See Saving and Loading Models in the PyTorch documentation.

The deployment code for the e5-small-v2 model defines the following functions:

- __call__: This function is non-negotiable; Ray Serve routes every request through it.
- init: The init function is where models, tokenizers, vectorizers, and the like should be set to self for invoking. It is recommended that you include your model's trained parameters directly in the Docker container rather than reaching out to external storage inside init.
- encode: The encode function is where the field or query that is passed to the model from Fusion is processed. Alternatively, you can process it all in the __call__ function, but it is cleaner not to. The encode function can handle any text processing needed for the model to accept input, before invoking model.predict() or an equivalent function that gets the expected model result.

In this example, the file name is deployment.py and the class name is Deployment().
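A minimal sketch of deployment.py under these assumptions: the model files are baked into the image at ./e5-small-v2, requests arrive as JSON like {"text": "..."}, and sentence_transformers is used to load the model (the original tutorial file may differ):

```python
# deployment.py -- minimal sketch of the structure described above.
from ray import serve
from starlette.requests import Request
from sentence_transformers import SentenceTransformer  # assumption


@serve.deployment
class Deployment:
    def __init__(self):
        # Load trained parameters shipped inside the Docker container,
        # rather than reaching out to external storage at startup.
        self.model = SentenceTransformer("./e5-small-v2")

    def encode(self, text: str) -> list[float]:
        # Any text processing the model needs happens here, before the
        # model produces the embedding.
        return self.model.encode(text).tolist()

    async def __call__(self, request: Request) -> dict:
        # Required entry point: Ray Serve routes each HTTP request here.
        body = await request.json()
        return {"vector": self.encode(body["text"])}


# Top-level Ray Serve application; with this file, the Ray Deployment
# Import Path in the job configuration would be deployment:app.
app = Deployment.bind()
```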
The requirements.txt file is a list of installs for the Dockerfile to run, ensuring the Docker container has the right resources to run the model. For the e5-small-v2 model, the requirements are as follows:
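The original list is not reproduced here; a plausible set for the deployment sketch above, inferred from its imports:

```
# requirements.txt -- assumed dependencies for the sketch above.
ray[serve]
torch
sentence-transformers
```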
If a library appears in an import statement in your Python file, it should be included in the requirements file. To populate the requirements, use the following command in the terminal, inside the directory that contains your code. Once you have the MODEL_NAME.py, Dockerfile, and requirements.txt files, you need to run a few Docker commands. Run the following commands in order:
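The original commands are not reproduced here; a sketch in which pip freeze stands in for the requirements command and the image name matches the e5-small-v2-ray:0.1 example in the table below (both assumptions):

```bash
# (1) Snapshot the environment into requirements.txt (one common approach).
pip freeze > requirements.txt

# (2) Build the image and push it to your repository; names are placeholders.
docker build -t MY_DOCKERHUB_USER/e5-small-v2-ray:0.1 .
docker push MY_DOCKERHUB_USER/e5-small-v2-ray:0.1
```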
Parameter | Description |
---|---|
Job ID | A string used by the Fusion API to reference the job after its creation. |
Model name | A name for the deployed model. This is used to generate the deployment name in Ray. It is also the name that you reference as a model-id when making predictions with the ML Service. |
Model min replicas | The minimum number of load-balanced replicas of the model to deploy. |
Model max replicas | The maximum number of load-balanced replicas of the model to deploy. Specify multiple replicas for a higher-volume intake. |
Model CPU limit | The number of CPUs to allocate to a single model replica. |
Model memory limit | The maximum amount of memory to allocate to a single model replica. |
Ray Deployment Import Path | The path to your top-level Ray Serve deployment (or the same path passed to serve run). For example, deployment:app. |
Docker Repository | The public or private repository where the Docker image is located. If you’re using Docker Hub, fill in the Docker Hub username here. |
Image name | The name of the image. For example, e5-small-v2-ray:0.1. |
Kubernetes secret | If you’re using a private repository, supply the name of the Kubernetes secret used for access. |
Parameter | Description |
---|---|
Additional parameters. | This section lets you enter parameter name:parameter value options to be injected into the training JSON map at runtime. The values are inserted as they are entered, so you must surround string values with ". This is the sparkConfig field in the configuration file. |
Write Options. | This section lets you enter parameter name:parameter value options to use when writing output to Solr or other sources. This is the writeOptions field in the configuration file. |
Read Options. | This section lets you enter parameter name:parameter value options to use when reading input from Solr or other sources. This is the readOptions field in the configuration file. |
Configure the LWAI Neural Hybrid Search pipeline
qf: query_t
pf: query_t^50
pf: query_t~3^20
pf2: query_t^20
pf2: query_t~3^10
pf3: query_t^10
pf3: query_t~3^5
boost: map(query({!field f=query_s v=$q}),0,0,1,20)
mm: 50%
defType: edismax
sort: score desc, weight_d desc
fq: weight_d:[* TO *]

The map() boost effectively gives a fixed boost of 20 to documents whose query_s field matches the query exactly, leaving other documents unboosted.
This stage does not function correctly if a wildcard (*) or a colon (:) is used.

signals or access_control.

PUT,POST,GET:/LWAI-ACCOUNT-NAME/**
The useCaseConfig parameter that is common to generative AI and embedding use cases is dataType, but each use case may have other parameters. The value for the query stage is query. The modelConfig parameters are common to generative AI use cases. For more information, see Prediction API.
This stage does not function correctly if the incoming query is a wildcard (*) or a colon (:). In addition, this stage does not function correctly if the incoming q parameter is a Solr query parser string, for example field_t:foo rather than a raw user query string. The resulting query is always written to <request.params.q>. Template expressions are supported.

<ctx.vector> evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.

In the JavaScript context (ctx), the preFilterKey object becomes available. The preFilter object adds both the top-level fq and preFilter to the parameters for the vector query. You do not need to manually add the top-level fq in the JavaScript stage.

Use || as the delimiter to parse each facet label mapping in the blob. A Java regular expression is also a valid value. The regex must start with ^ and end with $.

Configure the LWAI Vectorize pipeline
If this option is selected, a phone field is indexed into both the phone_s single-valued field and the phone_ss multi-valued field. If this option is not selected, the phone field is indexed into only the phone_s single-valued field.

If this option is selected, a name text field with a value of John Smith is indexed into both the name_t and name_s fields, allowing relevant search using the name_t field (by matching a Smith query) and also proper faceting and sorting using the name_s field (using John Smith for sorting or faceting). If this option is not selected, the name text field is indexed into only the name_t text field by default.

{Destination Field} is the vector field. {Destination Field}_b is the boolean value that indicates whether the vector has been indexed.

The useCaseConfig parameter that is common to embedding use cases is dataType, but each use case may have other parameters. The value for the query stage is query.

If this option is selected, Fusion adds the _lw_fields_ss multi-valued field to the document, which lists all fields that are being sent to Solr.

If this option is selected, commit=true and optimize=true are allowed to be passed to Solr when specified as request parameters coming into this pipeline. Document commands that specify commit or optimize are still respected even if this checkbox is not selected.