Configure The Smart Answers Pipelines (5.1 and 5.2 only)

Before beginning this procedure, train a machine learning model using either the FAQ method or the cold start method.

Note
For instructions for Fusion 5.3 and up, see Configure The Smart Answers Pipelines (5.3 and Up).

Regardless of how you set up your model, the deployment procedure is the same: configure the index pipeline, then configure the query pipeline.

The following default index and query pipelines for Smart Answers are automatically created when you create a Fusion app:

Default index pipelines

question-answering
For encoding one field.

question-answering-dual-fields
For encoding two fields (question and answer pairs, for example).

Default query pipelines

question-answering
Calculates vector distances between an encoded query and one document vector field. Should be used together with the question-answering index pipeline.

question-answering-dual-fields
Calculates vector distances between an encoded query and two document vector fields, then ensembles the scores. Should be used together with the question-answering-dual-fields index pipeline.

1. Configure the index pipeline

question-answering default index pipeline

  1. Open the Index Workbench.

  2. Load or create your datasource using the default question-answering index pipeline.

  3. In the Machine Learning stage, change the value of Model ID to match the model deployment name you chose when you configured the model training job.

  4. Identify the document field to be processed and encoded into dense vectors. This field name becomes the value of the documentFeatureField variable in the Model input transformation script.

  5. In the Model input transformation script field, enter the script below, replacing the documentFeatureField variable value (body_t by default) with the document field name to be processed and encoded into dense vectors.

    /*
    Name of the document field to feed into the encoder.
    */
    var documentFeatureField = "body_t"
    
    /*
    Model input construction.
    */
    var modelInput = new java.util.HashMap()
    modelInput.put("text", doc.getFirstFieldValue(documentFeatureField))
    modelInput.put("pipeline", "index")
    modelInput.put("compress", "true")
    modelInput.put("unidecode", "true")
    modelInput.put("lowercase", "false")
    
    modelInput
  6. Save the datasource.

  7. Index your data.
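The transformation script in step 5 runs inside Fusion, where the `doc` object is supplied by the pipeline. To check the field selection logic outside Fusion, a minimal sketch in plain JavaScript might look like the following. The `makeDoc` helper and the sample field value are hypothetical stand-ins, not part of the Fusion API, and a plain object replaces `java.util.HashMap`.

```javascript
// Hypothetical stand-in for the pipeline document.
// It only mimics the getFirstFieldValue call used by the script in step 5.
function makeDoc(fields) {
  return {
    getFirstFieldValue: function (name) {
      var values = fields[name];
      return values && values.length > 0 ? values[0] : null;
    }
  };
}

// Mirrors the model input construction from step 5,
// with a plain object in place of java.util.HashMap.
function buildModelInput(doc, documentFeatureField) {
  return {
    text: doc.getFirstFieldValue(documentFeatureField),
    pipeline: "index",
    compress: "true",
    unidecode: "true",
    lowercase: "false"
  };
}

// Sample document with an assumed body_t value.
var sampleDoc = makeDoc({ body_t: ["How do I reset my password?"] });
var modelInput = buildModelInput(sampleDoc, "body_t");
console.log(modelInput);
```

If the configured field is missing from a document, this sketch passes `null` as the text, which is a quick way to spot a misspelled documentFeatureField value before indexing.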

2. Configure the query pipeline

question-answering default query pipeline

  1. Open the Query Workbench.

  2. Load one of the default question-answering query pipelines.

  3. In the Query Fields stage, update Return Fields to return additional fields that should be displayed with each answer, such as fields corresponding to title, text, or ID.

    It is recommended that you remove the asterisk (*) field and specify each individual field you want to return, as returning too many fields will affect runtime performance.

    Note
    Do not remove compressed_document_vector_s, document_clusters_ss, and score, as these fields are necessary for later stages.
  4. In the Machine Learning stage, change the Model ID value to match the model deployment name you chose when you configured the model training job.

  5. Save the query pipeline.
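The Return Fields advice in step 3 can be sketched as a small helper that drops the asterisk and always keeps the three fields required by later stages. This is an illustration only: `buildReturnFields` is a hypothetical helper, and `id` and `title_t` are assumed display fields.

```javascript
// Fields that must stay in Return Fields because later stages need them.
const REQUIRED_FIELDS = [
  "compressed_document_vector_s",
  "document_clusters_ss",
  "score"
];

// Builds a comma-separated field list: display fields first, required
// fields always included, duplicates and the "*" wildcard dropped.
function buildReturnFields(displayFields) {
  const seen = new Set();
  const result = [];
  for (const field of [...displayFields, ...REQUIRED_FIELDS]) {
    if (field === "*" || seen.has(field)) continue;
    seen.add(field);
    result.push(field);
  }
  return result.join(",");
}

// id and title_t are hypothetical display fields.
const returnFields = buildReturnFields(["id", "title_t", "*"]);
console.log(returnFields);
```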

Pipeline Setup Examples

Example 1: Index and retrieve the question and answer separately

Depending on the design of your search page, you may want to show the best-matched questions and answers in separate sections, or you may only want to retrieve answers and serve them to a chatbot app. In either case, index questions and answers separately, in different documents.

For example, in the picture below, we construct the input file for the index pipeline such that the text part of the question/answer is stored in answer_t, and we add an additional field type_s whose value is "question" or "answer" to separate the two types.

Pipeline setup example #1
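As a rough illustration of the input layout described above, the sketch below turns FAQ pairs into separate question and answer documents. The `toIndexDocuments` helper, the `id` values, and the sample text are hypothetical; only the answer_t and type_s field names come from this example.

```javascript
// Hypothetical FAQ pairs used as sample input.
const faqPairs = [
  { question: "How do I reset my password?", answer: "Use the reset link on the login page." }
];

// Emits one document per question and one per answer.
// The text always goes into answer_t; type_s records which kind it is.
function toIndexDocuments(pairs) {
  const docs = [];
  pairs.forEach((pair, i) => {
    docs.push({ id: `q-${i}`, answer_t: pair.question, type_s: "question" });
    docs.push({ id: `a-${i}`, answer_t: pair.answer, type_s: "answer" });
  });
  return docs;
}

const indexDocs = toIndexDocuments(faqPairs);
console.log(JSON.stringify(indexDocs, null, 2));
```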

In the Machine Learning stage, we specify documentFeatureField as answer_t in the Model input transformation script so that compressed_document_vector_s is generated based on this field.

Pipeline setup example #1

At search time, we can apply a filter query on the type_s field to return either a question or an answer.
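A minimal sketch of such a filter, assuming the query pipeline is called with Solr-style `q` and `fq` parameters. The `buildQueryParams` helper and the sample query text are hypothetical, and the endpoint that receives these parameters is not shown because it depends on your deployment.

```javascript
// Builds a query string that restricts results to one document type.
// Only "question" and "answer" are valid, matching the type_s values indexed above.
function buildQueryParams(userQuery, type) {
  if (type !== "question" && type !== "answer") {
    throw new Error("type must be \"question\" or \"answer\"");
  }
  const params = new URLSearchParams();
  params.set("q", userQuery);
  params.set("fq", `type_s:${type}`);
  return params.toString();
}

// Return only answers for the user's query.
const queryString = buildQueryParams("reset password", "answer");
console.log(queryString);
```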

You can achieve a similar result by using the default question-answering index and query pipelines.

(For more detail, see Smart Answers Detailed Pipeline Setup.)

Example 2: Index and retrieve the question and answer together

If you prefer to show the question and answer together in one document (that is, treat the question as the title and the answer as the description), you can index them together in the same document. This is similar to the default setup of the question-answering-dual-fields index and query pipelines.

For example, in the picture below, we added two Machine Learning stages and named them Answers Encoding and Questions Encoding respectively.

Pipeline setup example #2

In the Questions Encoding stage, we specify documentFeatureField to be question_t, and change the default values for compressedVectorField, vectorField, clustersField, and distancesField to compressed_question_vector_s, question_vector_ds, question_clusters_is, and question_distances_ds respectively, in the Model output transformation script.

Pipeline setup example #2 - output

In the Answers Encoding stage, we specify documentFeatureField to be answer_t, and change the default values for compressedVectorField, vectorField, clustersField, and distancesField to compressed_answer_vector_s, answer_vector_ds, answer_clusters_ss, and answer_distances_ds respectively.

(For more detail, see Smart Answers Detailed Pipeline Setup.)

The indexed document is shown in the picture below.

Since two dense vectors are generated in the index (compressed_question_vector_s and compressed_answer_vector_s), at query time we need to compute both the query-to-question distance and the query-to-answer distance. This can be set up as shown in the picture below. We added two Vectors distance per Query/Document stages and named them QQ Distance and QA Distance respectively. In the QQ Distance stage, we changed the default values for Document Vector Field and Document Vectors Distance Field to compressed_question_vector_s and qq_distance respectively. In the QA Distance stage, we changed the default values for Document Vector Field and Document Vectors Distance Field to compressed_answer_vector_s and qa_distance respectively.

Pipeline setup example #2 - QQ Distance stage
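To make the two distance computations concrete, here is a minimal sketch that assumes the stage compares vectors with cosine similarity. That metric is an assumption, as this page does not state which one the stage uses. The sketch also works on plain numeric arrays and skips decoding the compressed vector fields. The sample vectors are made up.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero length (norm), to avoid dividing by zero.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors: the query points the same way as the question vector
// and is orthogonal to the answer vector.
const queryVector = [1, 0];
const qqDistance = cosineSimilarity(queryVector, [1, 0]);
const qaDistance = cosineSimilarity(queryVector, [0, 1]);
console.log({ qq_distance: qqDistance, qa_distance: qaDistance });
```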

Now we have two distances (query-to-question distance and query-to-answer distance), and we can ensemble them together with the Solr score to get a final ranking score. This is recommended especially when you have a limited FAQ dataset and want to use both question and answer information. This ensemble can be done in the Compute mathematical expression stage, as shown below.

Pipeline setup example #2 - Compute Mathematical Expression stage
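One way to picture the ensemble is as a weighted sum of the three values. The sketch below is an illustration under stated assumptions, not the expression used by the stage: the weights are hypothetical, the sample values are made up, and it assumes that a higher value is better for score, qq_distance, and qa_distance alike.

```javascript
// Hypothetical weights; in practice they would be tuned for your data.
const weights = { score: 0.2, qq: 0.4, qa: 0.4 };

// Weighted sum of the Solr score and the two distance fields.
function ensembleScore(doc, w) {
  return w.score * doc.score + w.qq * doc.qq_distance + w.qa * doc.qa_distance;
}

// Made-up results carrying the fields produced by the earlier stages.
const results = [
  { id: "A", score: 1.0, qq_distance: 0.5, qa_distance: 0.5 },
  { id: "B", score: 2.0, qq_distance: 0.9, qa_distance: 0.8 }
];

// Rank by ensemble score, highest first, without mutating the input.
const ranked = results
  .map((doc) => ({ ...doc, ensemble: ensembleScore(doc, weights) }))
  .sort((x, y) => y.ensemble - x.ensemble);

console.log(ranked.map((doc) => `${doc.id}: ${doc.ensemble.toFixed(2)}`));
```

Because the Solr score is not bounded the way a similarity value usually is, the relative weights matter a great deal, which is why the weights here are placeholders only.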

Evaluate the query pipeline

The Smart Answers Evaluate Pipeline job (Evaluate QnA Pipeline job in Fusion 5.1 and 5.2) evaluates the rankings of results from any Smart Answers pipeline and finds the best set of weights in the ensemble score. See Evaluate a Smart Answers Pipeline for setup instructions.