Smart Answers Detailed Pipeline Setup

Typically, you can use the default pipelines included with Fusion AI. These pipelines now use Milvus to store encoded vectors and to calculate vector similarity. This topic describes the settings you can use to customize the Smart Answers pipelines. See also Configure The Smart Answers Pipelines.

"smart-answers" index pipeline

smart-answers default index pipeline

"smart-answers" query pipeline

smart-answers default query pipeline

Create the Milvus collection

Before indexing data, use the Create Collections in Milvus job to create the Milvus collection(s) used by the Smart Answers pipelines (see Milvus overview). The job has the following parameters:

  • Job ID - A unique identifier for the job.

  • Collection Name - A name for the Milvus collection you are creating. This name is used in both the smart-answers index pipeline and the smart-answers query pipeline.

  • Dimension - The dimension size of the vectors to store in this Milvus collection. The Dimension should match the size of the vectors returned by the encoding model. For example, if the model was created with either the Smart Answers Coldstart Training job or the Smart Answers Supervised Training job with the Model Base word_en_300d_2M, then the dimension would be 300.

  • Index file size - Files with more documents than this will cause Milvus to build an index on this collection.

  • Metric - The type of metric used to calculate vector similarity scores. Inner Product is recommended. It produces values between 0 and 1, where a higher value means higher similarity.
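To make the Dimension and Metric settings concrete, here is a minimal sketch in plain Python. It is not Fusion or Milvus code. It assumes the model's vectors are normalized to unit length, in which case the inner product equals cosine similarity. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (assumption: model output is normalized this way)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def inner_product(a, b, dimension):
    """Inner product of two vectors; rejects vectors that do not match the collection dimension."""
    if len(a) != dimension or len(b) != dimension:
        raise ValueError(f"expected vectors of dimension {dimension}")
    return sum(x * y for x, y in zip(a, b))

DIMENSION = 3  # toy value; a model based on word_en_300d_2M would use 300

query = normalize([1.0, 2.0, 2.0])
close = normalize([1.0, 2.0, 1.5])  # points in nearly the same direction as the query
far = normalize([2.0, 0.0, 0.1])    # points in a rather different direction

score_close = inner_product(query, close, DIMENSION)
score_far = inner_product(query, far, DIMENSION)
print(score_close > score_far)  # a higher score means higher similarity
```

A vector whose length differs from the collection's Dimension is rejected in this sketch, which mirrors why the Dimension must match the model's output size.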

Index pipeline setup

Stages in the default "smart-answers" index pipeline

Only one custom index stage needs to be configured in your index pipeline: the Encode into Milvus index stage.

The Encode into Milvus index stage

If you are using a dynamic schema, make sure this stage is added after the Solr Dynamic Field Name Mapping stage.

The Encode into Milvus index stage uses the specified model to encode the Field to Encode and store it in Milvus in the given Milvus collection. There are several required parameters:

  • Model ID - The ID of the model.

  • Encoder Output Vector - The name of the field that stores the compressed dense vectors output from the model. Default value: vector.

  • Field to Encode - The text field to encode into a dense vector, such as answer_t or body_t.

  • Milvus Collection Name - The name of the collection you created via the Create Collections in Milvus job, which will store the dense vectors. When creating the collection you specify the type of Metric to use to calculate vector similarity.

This stage can be used multiple times to encode additional fields, each into a different Milvus collection. See how to index and retrieve the question and answer together.
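The flow of the Encode into Milvus index stage can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `toy_encode` is a stand-in for a trained model, the in-memory `collections` dictionary stands in for Milvus, and the `question_t` field and both collection names are hypothetical.

```python
import math

DIMENSION = 8  # toy size; must match the dimension of the target collection

def toy_encode(text, dimension=DIMENSION):
    """Stand-in encoder: buckets letters by code point and normalizes. Not a real model."""
    vec = [0.0] * dimension
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) % dimension] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def encode_into_collection(doc, field_to_encode, collection_name, collections):
    """Encode one field of a document and store the vector under the document ID."""
    vector = toy_encode(doc[field_to_encode])
    collections.setdefault(collection_name, {})[doc["id"]] = vector
    return vector

collections = {}  # in-memory stand-in for Milvus: collection name -> {doc ID: vector}
doc = {
    "id": "faq-1",
    "question_t": "How do I reset my password?",
    "answer_t": "Use the reset link on the login page.",
}

# Using the stage twice: each field goes into its own collection.
encode_into_collection(doc, "question_t", "questions_collection", collections)
encode_into_collection(doc, "answer_t", "answers_collection", collections)
print(sorted(collections))
```

Keeping one collection per encoded field is what later allows each field to be searched separately at query time.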

Query pipeline setup

The Query Fields stage

The first stage is Query Fields. For more information see the Query Fields stage.

The Milvus Query stage

The Milvus Query stage encodes the query into a vector using the specified model. It then performs a vector similarity search against the specified Milvus collection and returns a list of the best document matches.

  • Model ID - The ID of the model used when configuring the model training job.

  • Encoder Output Vector - The name of the output vector from the specified model, which will contain the query encoded as a vector. Defaults to vector.

  • Milvus Collection Name - The name of the collection that you used in the Encode into Milvus index stage to store the encoded vectors.

  • Milvus Results Context Key - The name of the variable used to store the vector distances. It can be changed as needed. The Milvus Ensemble Query stage uses it to calculate the query score for the document.

  • Number of Results - The number of highest scoring results returned from Milvus.

This stage would typically be used the same number of times that the Encode into Milvus index stage is used, each with a different Milvus collection and a different Milvus Results Context Key.
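As a rough illustration of what the Milvus Query stage does once the query has been encoded, the sketch below ranks stored vectors by inner product and keeps the top results under a context key. It is plain Python with an in-memory dictionary standing in for a Milvus collection. The function name, the `question_distance` key, and the toy 2-dimensional vectors are all hypothetical.

```python
def milvus_query_sketch(query_vector, collection, context, results_key, number_of_results):
    """Score every stored vector against the query and keep the best matches in the context."""
    scored = [
        (doc_id, sum(q * v for q, v in zip(query_vector, vector)))
        for doc_id, vector in collection.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest inner product first
    context[results_key] = dict(scored[:number_of_results])
    return context[results_key]

# Toy collection of unit-length vectors keyed by document ID.
collection = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
query_vector = [0.8, 0.6]  # assumed to be the already-encoded query

context = {}
results = milvus_query_sketch(query_vector, collection, context, "question_distance", 2)
print(list(results))
```

Calling the function once per collection, each time with a different `results_key`, corresponds to using the stage several times with different Milvus Results Context Key values.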

The Milvus Ensemble Query stage

The Milvus Ensemble Query stage takes the results of the Milvus Query stage(s) and calculates the ensemble score, which is used to return the best matches.

  • Ensemble math expression - The mathematical expression used to calculate the ensemble score. It should reference the variable name(s) specified in the Milvus Results Context Key parameter of the Milvus Query stage(s).

  • Result field name - The name of the field used to store the ensemble score. It defaults to ensemble_score.
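The following plain-Python sketch shows how an ensemble expression might combine the values stored under two context keys. It is an illustration, not the stage's implementation. The key names, the weights, and the choice to treat a document missing from one result set as scoring 0.0 are assumptions, and `eval` is used only because the inputs here are hard-coded.

```python
def ensemble_scores(expression, context, result_keys):
    """Evaluate the expression once per document, binding each context key to that document's value."""
    doc_ids = set()
    for key in result_keys:
        doc_ids.update(context[key])
    scores = {}
    for doc_id in doc_ids:
        # Assumption: a document absent from one result set contributes 0.0 for that key.
        variables = {key: context[key].get(doc_id, 0.0) for key in result_keys}
        # Toy evaluation with builtins disabled; do not use eval on untrusted input.
        scores[doc_id] = eval(expression, {"__builtins__": {}}, variables)
    return scores

# Hypothetical values left in the context by two Milvus Query stages.
context = {
    "question_distance": {"faq-1": 0.9, "faq-2": 0.4},
    "answer_distance": {"faq-1": 0.5, "faq-2": 0.8},
}
expression = "0.3 * question_distance + 0.7 * answer_distance"
scores = ensemble_scores(expression, context, ["question_distance", "answer_distance"])
best = max(scores, key=scores.get)
print(best)
```

With these weights the answer similarity dominates, so the document with the stronger answer match ranks first even though its question match is weaker.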

The Milvus Response Update Query stage

The Milvus Response Update Query stage does not need to be configured and can be skipped if desired. It inserts the Milvus values, including the ensemble_score, into each of the returned documents, which is particularly useful when there is more than one Milvus Query stage. This stage needs to come after the Solr Query stage.
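Conceptually, this step amounts to merging per-document values into the response. The sketch below shows the idea in plain Python. The function name, the context key names, and the document shape are hypothetical and do not reflect Fusion's internal API.

```python
def update_response(documents, context, result_keys, ensemble, ensemble_field="ensemble_score"):
    """Copy each Milvus value and the ensemble score onto the matching returned document."""
    for doc in documents:
        doc_id = doc["id"]
        for key in result_keys:
            if doc_id in context[key]:
                doc[key] = context[key][doc_id]
        if doc_id in ensemble:
            doc[ensemble_field] = ensemble[doc_id]
    return documents

# Hypothetical inputs: documents returned by the main query, plus earlier stage outputs.
documents = [{"id": "faq-1"}, {"id": "faq-2"}]
context = {
    "question_distance": {"faq-1": 0.9, "faq-2": 0.4},
    "answer_distance": {"faq-1": 0.5},
}
ensemble = {"faq-1": 0.62, "faq-2": 0.12}

updated = update_response(documents, context, ["question_distance", "answer_distance"], ensemble)
print(updated[0])
```

Documents that were not returned by a given Milvus Query stage simply do not receive that stage's field in this sketch.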