Cold Start Solution
When no training data exists for the Supervised job, the cold start solution lets you use general pre-trained deep learning models or train custom word embeddings for a specific domain.
Over time, we suggest capturing signals from document clicks, likes, and downloads. These signals can be used to construct a training dataset. After accumulating at least 3000 pairs, you can use that feedback as training data for the Supervised solution.
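The signal-to-training-data step described above can be sketched as follows. This is a minimal illustration, not the product's own implementation: the signal record layout (`type`, `query`, `doc_id`), the function names, and the choice to count unique query/document pairs are all assumptions made for the example. Only the signal types and the 3000-pair threshold come from the text.

```python
from collections import Counter

# Threshold taken from the text; counting *unique* pairs is an assumption.
MIN_PAIRS = 3000

def build_training_pairs(signals, positive_types=("click", "like", "download")):
    """Aggregate raw signals into (query, doc_id) pairs with occurrence counts.

    `signals` is an iterable of dicts with hypothetical keys
    "type", "query", and "doc_id".
    """
    counts = Counter()
    for signal in signals:
        if signal["type"] not in positive_types:
            continue  # ignore signals that do not indicate a useful answer
        query = signal["query"].strip().lower()  # light normalization
        if query:
            counts[(query, signal["doc_id"])] += 1
    return counts

def ready_for_supervised(pairs, min_pairs=MIN_PAIRS):
    """Return True once enough distinct pairs have been collected."""
    return len(pairs) >= min_pairs
```

The counts can later serve as pair weights, or be discarded if only distinct pairs are needed.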
Like the Supervised solution, the cold start solution is set up in stages:
1. Prepare a model
There are two ways to do this:
- Smart Answers comes with two pre-trained cold-start models. These models can serve as a strong baseline if your data does not contain too many domain-specific terms. See Set Up A Pre-Trained Cold Start Model.
- You can train your own cold start model by configuring a job that analyzes your existing content. See Train A Smart Answers Cold Start Model.
2. Create collections in Milvus
To use the {app_name}-smart-answers pipelines, you need to create collections in Milvus. See the Milvus documentation page.
3. Configure the pipelines
The trained model should be used at both index and query time in order to perform dense vector search.
- At index time, we provide the {app_name}-smart-answers index pipeline to help generate a dense vector representation of answers.
- At query time, we provide the {app_name}-smart-answers query pipeline to conduct run-time neural search. This pipeline transforms the incoming query into a dense vector using the trained model, then compares it with the indexed answer dense vectors by computing the cosine distance between them. You can also use a query stage to combine Solr and document vector similarity scores at query time.
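The query-time comparison can be illustrated with a small standalone sketch. It assumes the query and answer vectors have already been produced by the trained model. The function names, the `vector_weight` parameter, and the weighted-sum formula for blending Solr and vector scores are assumptions for illustration; the source does not specify how the query stage combines the two scores.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1 - cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank_answers(query_vec, answer_vecs, solr_scores=None, vector_weight=0.7):
    """Rank answers by a blend of vector similarity and Solr score.

    answer_vecs: dict mapping doc_id -> dense vector
    solr_scores: optional dict mapping doc_id -> Solr relevance score
    The blend is a weighted sum with Solr scores scaled by their maximum;
    this formula is an illustrative assumption.
    """
    solr_scores = solr_scores or {}
    max_solr = max(solr_scores.values(), default=0.0)
    ranked = []
    for doc_id, vec in answer_vecs.items():
        similarity = cosine_similarity(query_vec, vec)
        solr = solr_scores.get(doc_id, 0.0) / max_solr if max_solr > 0 else 0.0
        score = vector_weight * similarity + (1 - vector_weight) * solr
        ranked.append((doc_id, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Raising `vector_weight` favors semantic matches, while lowering it lets keyword relevance from Solr dominate.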
See Configure The Smart Answers Pipelines. Once your pipelines are configured, see Evaluate a Smart Answers Query Pipeline to test its effectiveness so that you can fine-tune the configuration.