FAQ Solution Part 2
Configure the `question-answering` index pipeline so that it uses the answer model, and the `question-answering` query pipeline so that it uses the question model:

- `x_a_fusion_model_bundle.zip`: the answer model.
- `x_q_fusion_model_bundle.zip`: the question model.

The answer model (`x_a_fusion_model_bundle.zip`) is used in the index pipeline to generate dense vectors for the documents to be indexed. The question model (`x_q_fusion_model_bundle.zip`) is used in the query pipeline to generate dense vectors for incoming questions on the fly, then compare those with the indexed dense vectors for answers to find answers, or with the indexed dense vectors for historical questions to find similar questions.
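As an illustration of this comparison step, here is a minimal sketch (not Fusion's internal code; the vectors are made up) of ranking indexed items by cosine distance between an encoded query and stored dense vectors:

```python
# Illustrative sketch: rank indexed items by cosine distance to a query vector.
# In practice, the vectors come from the question and answer models.
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller distance means a closer match.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical encoded query and indexed answer vectors.
query_vector = [0.1, 0.9, 0.2]
indexed = {
    "answer_1": [0.1, 0.8, 0.3],
    "answer_2": [0.9, 0.1, 0.0],
}

# Sort documents by distance, closest first.
ranked = sorted(indexed, key=lambda doc: cosine_distance(query_vector, indexed[doc]))
print(ranked)  # answer_1 ranks first: its vector is closest to the query
```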
| Default index pipelines | Default query pipelines |
| --- | --- |
| `question-answering`: for encoding one field. | `question-answering`: calculates vector distances between an encoded query and one document vector field. Should be used together with the `question-answering` index pipeline. |
| `question-answering-dual-fields`: for encoding two fields (question and answer pairs, for example). | `question-answering-dual-fields`: calculates vector distances between an encoded query and two document vector fields. After that, the scores are ensembled. Should be used together with the `question-answering-dual-fields` index pipeline. |
| See Configure the index pipeline below. | See Configure the query pipeline below. |
## Configure the index pipeline

Specify the `x_a_fusion_model_bundle.zip` model that was uploaded to the blob store. The `_a_` (answer) model allows you to encode longer text.

Remove the wildcard (`*`) field and specify each individual field you want to return, as returning too many fields will affect runtime performance. Be sure to include `compressed_document_vector_s`, `document_clusters_ss`, and `score`, as these fields are necessary for later stages.
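For illustration, assuming this setting maps to the standard Solr `fl` (field list) parameter, the restricted field list might be assembled like this (`question_t` and `answer_t` are hypothetical application fields; the last three are the required ones):

```python
# Sketch of restricting returned fields via Solr's "fl" parameter.
# "question_t" and "answer_t" are hypothetical application fields;
# the required_fields list holds the fields needed by later pipeline stages.
required_fields = ["compressed_document_vector_s", "document_clusters_ss", "score"]
app_fields = ["id", "question_t", "answer_t"]  # hypothetical

params = {
    "q": "how do I reset my password",          # example query text
    "fl": ",".join(app_fields + required_fields),  # explicit list instead of "*"
}
print(params["fl"])
```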
## Configure the query pipeline

Specify the `x_q_fusion_model_bundle.zip` model that was uploaded to the blob store. The `_q_` (question) model is slightly more efficient for short natural-language questions.

In this setup, both questions and answers are stored in the `text_t` field
, and we add an additional field `type_s` whose value is "question" or "answer" to separate the two types.

In the TensorFlow Deep Encoding stage, we specify Document Feature Field as `text_t` so that `compressed_document_vector_s` is generated based on this field. At search time, we can apply a filter query on the `type_s` field to return either a question or an answer.

You can achieve a similar result by using the default `question-answering` index and query pipelines. Alternatively, you can modify the `question-answering-dual-fields`
index and query pipelines' default setup.

For example, in the picture below, we added two TensorFlow Deep Encoding stages and named them Answers Encoding and Questions Encoding respectively. In the Questions Encoding stage, we specify Document Feature Field to be `question_t`, and change the default values for Vector Field, Clusters Field, and Distances Field to `question_vector_ds`, `question_clusters_ss`, and `question_distances_ds` respectively. In the Answers Encoding stage, we specify Document Feature Field to be `answer_t`, and change the default values for Vector Field, Clusters Field, and Distances Field to `answer_vector_ds`, `answer_clusters_ss`, and `answer_distances_ds` respectively. (For detailed information about this field setup, refer to the "Appendix C: Detailed Pipeline Setup" section.)

Since two dense vectors are generated in the index (`compressed_question_vector_s` and `compressed_answer_vector_s`), at query time we need to compute both the query-to-question distance and the query-to-answer distance. This can be set up as shown in the picture below. We added two Vectors Distance per Query/Document stages and named them QQ Distance and QA Distance respectively. In the QQ Distance stage, we change the default values for Document Vector Field and Document Vectors Distance Field to `compressed_question_vector_s` and `qq_distance` respectively. In the QA Distance stage, we change them to `compressed_answer_vector_s` and `qa_distance` respectively. (For detailed information about this field setup, refer to the "Appendix C: Detailed Pipeline Setup" section.)

Now that we have two distances (query-to-question and query-to-answer), we can ensemble them together with the Solr score to get a final ranking score. This is recommended especially when you have a limited FAQ dataset and want to utilize both question and answer information.
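As a sketch of such an ensemble (the weights and the Solr-score scaling below are illustrative assumptions, not the stage's actual defaults), the distances can be converted to similarities and combined with a scaled Solr score:

```python
# Illustrative weighted ensemble of three ranking signals.
# Weights and the Solr-score scaling are hypothetical; distances are
# converted to similarities so that higher always means better.
def ensemble_score(solr_score, qq_distance, qa_distance,
                   w_solr=0.3, w_qq=0.4, w_qa=0.3):
    # Squash the unbounded Solr score into [0, 1) (a simple example scaling).
    scaled_solr = solr_score / (solr_score + 1.0)
    return (w_solr * scaled_solr
            + w_qq * (1.0 - qq_distance)   # query-to-question similarity
            + w_qa * (1.0 - qa_distance))  # query-to-answer similarity

print(round(ensemble_score(solr_score=4.2, qq_distance=0.15, qa_distance=0.25), 3))
# 0.807
```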
This ensemble can be done in the Compute mathematical expression stage as shown below.

The evaluation job is available at `http://<external hostname>:5550/evaluation`. Another function of this job is to help choose weights for the different ranking scores, such as the Solr score, query-to-question distance, and query-to-answer distance, used in the Compute mathematical expression stage. If you are interested in performing this weights selection, set the **Whether perform weights selection** parameter to true, and list the score names in the **List of ranking scores for ensemble** parameter. Since different scaling methods can be used for the Solr score in the stage, choose the scale function you used in the **Solr scale function** parameter. The **Target metric to use for weight selection** parameter lets you specify the metric to optimize during weights selection, for example `recall@3`. Metric values at different positions for the different weight combinations are printed in the log, sorted in descending order by the specified metric.

NOTE: Weights selection can take a while to run for big evaluation datasets. If you are only interested in comparing pipelines, turn this function off by setting the **Whether perform weights selection** parameter to false.

There are a few additional advanced parameters that can be useful but are not required. **Additional query parameters** lets you provide extra query parameters, such as `rowsFromSolrToRerank`, in a dictionary format. **Sampling proportion** and **Sampling seed** make it possible to run the evaluation job on only a sample of the data.
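For illustration, the dictionary format for these settings might look like this (all values are hypothetical examples, not recommendations):

```python
# Hypothetical example of the dictionary format for the
# "Additional query parameters" setting.
additional_query_parameters = {
    "rowsFromSolrToRerank": 100,  # example value only
}

# "Sampling proportion" / "Sampling seed" are simple scalar settings;
# e.g. evaluate on a reproducible 10% sample of the data.
sampling_proportion = 0.1  # hypothetical
sampling_seed = 42         # hypothetical

print(additional_query_parameters)
```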