- Create the Milvus collection
- Index pipeline setup
- Query pipeline setup
Typically, you can use the default pipelines included with Fusion AI. These pipelines now utilize Milvus to store encoded vectors and to calculate vector similarity. This topic provides information you can use to customize the Smart Answers pipelines. See also Configure The Smart Answers Pipelines.
"smart-answers" index pipeline
"smart-answers" query pipeline
Create the Milvus collection
Job ID- A unique identifier for the job.
Collection Name- A name for the Milvus collection you are creating. This name is used in both the Smart Answer Index and the Smart Answer Query pipelines.
Dimension- The dimension size of the vectors to store in this Milvus collection. The Dimension should match the size of the vectors returned by the encoding model. For example, if the model was created with either the Smart Answers Coldstart Training job or the Smart Answers Supervised Training job with the Model Base word_en_300d_2M, then the dimension would be 300.
Index file size- Files with more documents than this will cause Milvus to build an index on this collection.
Metric- The type of metric used to calculate vector similarity scores. Inner Product is recommended. It produces values between 0 and 1, where a higher value means higher similarity.
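To illustrate the Dimension and Metric parameters, here is a minimal sketch in plain Python. It is not Fusion or Milvus code: the function name `inner_product` and the sample vectors are illustrative assumptions. It shows that two vectors must share the same dimension before they can be compared, and that a higher inner product indicates higher similarity.

```python
def inner_product(a, b):
    """Return the inner (dot) product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 3-dimensional vectors, kept short for readability. A model
# based on word_en_300d_2M would produce 300-dimensional vectors instead.
query = [1.0, 0.0, 0.0]
close_doc = [0.8, 0.6, 0.0]
far_doc = [0.0, 0.6, 0.8]

close_score = inner_product(query, close_doc)
far_score = inner_product(query, far_doc)
```

In this sketch `close_score` is larger than `far_score`, so `close_doc` would rank first.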
Index pipeline setup
Only one custom index stage needs to be configured in your index pipeline: the Encode into Milvus index stage.
The Encode into Milvus index stage
If you are using a dynamic schema, make sure this stage is added after the Solr Dynamic Field Name Mapping stage.
The Encode into Milvus index stage uses the specified model to encode the Field to Encode and stores the resulting vector in the given Milvus collection.
There are several required parameters:
Model ID- The ID of the model.
Encoder Output Vector- The name of the field that stores the compressed dense vectors output from the model. Default value:
Field to Encode- The text field to encode into a dense vector, such as
Milvus Collection Name- The name of the collection you created via the Create Milvus Collection job, which will store the dense vectors. When creating the collection you specify the type of Metric to use to calculate vector similarity. This stage can be used multiple times to encode additional fields, each into a different Milvus collection. See how to index and retrieve the question and answer together.
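The idea of using the stage more than once, each time writing a different field to a different collection, can be sketched as follows. This is an in-memory stand-in, not the Milvus or Fusion API: `ToyVectorStore`, `toy_encode`, the collection names, and the document fields are all hypothetical, and the toy encoder only mimics a model by producing a fixed-size vector.

```python
class ToyVectorStore:
    """In-memory stand-in for a set of vector collections (not the Milvus API)."""

    def __init__(self):
        self.dimensions = {}  # collection name -> vector dimension
        self.vectors = {}     # collection name -> {document id: vector}

    def create_collection(self, name, dimension):
        self.dimensions[name] = dimension
        self.vectors[name] = {}

    def insert(self, name, doc_id, vector):
        # Reject vectors whose size differs from the collection's Dimension.
        if len(vector) != self.dimensions[name]:
            raise ValueError("vector size does not match collection dimension")
        self.vectors[name][doc_id] = vector


def toy_encode(text, dimension=4):
    """Hypothetical encoder: folds character codes into a fixed-size vector."""
    vector = [0.0] * dimension
    for i, ch in enumerate(text):
        vector[i % dimension] += ord(ch) / 1000.0
    return vector


store = ToyVectorStore()
store.create_collection("question_collection", 4)
store.create_collection("answer_collection", 4)

doc = {"id": "doc1", "question": "What is Milvus?", "answer": "A vector database."}

# One pass per field, each writing to its own collection, mirroring the
# stage being added to the pipeline once per field.
field_to_collection = [
    ("question", "question_collection"),
    ("answer", "answer_collection"),
]
for field, collection in field_to_collection:
    store.insert(collection, doc["id"], toy_encode(doc[field]))
```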
Query pipeline setup
The Query Fields stage
The first stage is Query Fields. For more information see the Query Fields stage.
The Milvus Query stage
The Milvus Query stage encodes the query into a vector using the specified model. It then performs a vector similarity search against the specified Milvus collection and returns a list of the best document matches.
Model ID- The ID of the model used when configuring the model training job.
Encoder Output Vector- The name of the output vector from the specified model, which will contain the query encoded as a vector. Defaults to vector.
Milvus Collection Name- The name of the collection that you used in the Encode into Milvus index stage to store the encoded vectors.
Milvus Results Context Key- The name of the variable used to store the vector distances. It can be changed as needed. The Milvus Ensemble Query stage uses it to calculate the query score for the document.
Number of Results- The number of highest scoring results returned from Milvus.
This stage is typically used the same number of times as the Encode into Milvus index stage, each time with a different Milvus collection and a different Milvus Results Context Key.
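Conceptually, the search step scores every stored vector against the encoded query and keeps the best matches. The sketch below shows that idea in plain Python with a brute-force scan; it is not how Milvus performs the search internally. The function `top_k_by_inner_product`, the sample collection, and the context key name `milvus_distance` are assumptions made for illustration.

```python
import heapq


def top_k_by_inner_product(query_vector, collection, k):
    """Return the k (doc_id, score) pairs with the highest inner product."""
    scored = (
        (doc_id, sum(q * d for q, d in zip(query_vector, vector)))
        for doc_id, vector in collection.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical stored vectors, keyed by document ID.
collection = {
    "doc1": [0.8, 0.6],
    "doc2": [0.0, 1.0],
    "doc3": [1.0, 0.0],
}

# Store the scores under a context key so a later step can read them.
context = {}
context["milvus_distance"] = dict(
    top_k_by_inner_product([1.0, 0.0], collection, k=2)
)
```

Here `k` plays the role of Number of Results, and the dictionary key plays the role of the Milvus Results Context Key.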
The Milvus Ensemble Query stage
The Milvus Ensemble Query stage takes the results of the Milvus Query stage(s) and calculates the ensemble score, which is used to return the best matches.
Ensemble math expression- The mathematical expression used to calculate the ensemble score. It should reference the variable name(s) specified in the Milvus Results Context Key parameter of the Milvus Query stage(s).
Result field name- The name of the field used to store the
ensemble score. It defaults to
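As a rough illustration of how an ensemble expression can combine scores from several Milvus Query stages, the sketch below evaluates one expression per document, binding each context key to that document's score. Everything here is hypothetical: the key names `question_distance` and `answer_distance`, the weights, and the choice to treat a missing score as 0.0 are assumptions, and the expression is written as a Python function rather than in the stage's own expression syntax.

```python
def ensemble_scores(context, expression):
    """Evaluate `expression` once per document using that document's scores."""
    doc_ids = set()
    for scores in context.values():
        doc_ids.update(scores)

    results = {}
    for doc_id in doc_ids:
        # Assumption: a document missing from one result list scores 0.0 there.
        variables = {key: scores.get(doc_id, 0.0) for key, scores in context.items()}
        results[doc_id] = expression(**variables)
    return results


# Hypothetical output of two Milvus Query stages, one per context key.
context = {
    "question_distance": {"doc1": 0.9, "doc2": 0.4},
    "answer_distance": {"doc1": 0.5, "doc3": 0.8},
}


def weighted_sum(question_distance, answer_distance):
    """Hypothetical ensemble expression: a weighted sum of the two scores."""
    return 0.3 * question_distance + 0.7 * answer_distance


scores = ensemble_scores(context, weighted_sum)
best = max(scores, key=scores.get)
```

The parameter names of `weighted_sum` must match the context keys, which mirrors the requirement that the expression reference the names set in the Milvus Query stage(s).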
The Milvus Response Update Query stage
The Milvus Response Update Query stage does not need to be configured and can be skipped if desired. It inserts the Milvus values, including the ensemble_score, into each of the returned documents, which is particularly useful when there is more than one Milvus Query stage. This stage needs to come after the Solr Query stage.
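The effect of this stage on the response can be pictured with the following sketch, which copies per-stage scores and the ensemble score into each returned document. It is a plain Python illustration under stated assumptions, not the stage's implementation: the function `add_scores_to_documents`, the document shape, and the context key names are hypothetical.

```python
def add_scores_to_documents(documents, context, ensemble, result_field):
    """Return copies of the documents with Milvus scores attached."""
    updated = []
    for doc in documents:
        enriched = dict(doc)  # leave the original document untouched
        for key, scores in context.items():
            if doc["id"] in scores:
                enriched[key] = scores[doc["id"]]
        if doc["id"] in ensemble:
            enriched[result_field] = ensemble[doc["id"]]
        updated.append(enriched)
    return updated


# Hypothetical documents as returned by the Solr Query stage.
documents = [{"id": "doc1", "title": "First"}, {"id": "doc2", "title": "Second"}]
context = {
    "question_distance": {"doc1": 0.9, "doc2": 0.4},
    "answer_distance": {"doc1": 0.5},
}
ensemble = {"doc1": 0.62, "doc2": 0.12}

response = add_scores_to_documents(documents, context, ensemble, "ensemble_score")
```

Because the scores are merged into documents that already exist in the response, this step only makes sense once the documents have been retrieved, which is why the stage follows the Solr Query stage.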