How to begin
Use cases for chatbots, virtual assistants, QnA, and FAQ search
In most cases, Lucidworks recommends you begin with coldstart models.
- The large coldstart model supports only English and increases query time, so use it when English-only support and a moderate query time are acceptable.
- For multilingual support or a faster query time, use the multilingual model.
Coldstart models
The coldstart models are:
- Robust out-of-the-box models.
- Easily configured and do not require training data.
- Used to establish a solid baseline solution to compare against when running supervised training.
For setup instructions, see Set Up a Pre-Trained Cold Start Model for Smart Answers.
Lucidworks provides these pre-trained cold start models for Smart Answers:
- qna-coldstart-large: a large model trained on a variety of corpora and tasks.
- qna-coldstart-multilingual: covers 16 languages: Arabic, Chinese (Simplified), Chinese (Traditional), English, French, German, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Spanish, Thai, Turkish, and Russian.
The vector dimension for both models is 512. You might need this information when creating collections in Milvus.
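If you create the Milvus collection yourself rather than through a Managed Fusion job, the dimension must match the model output. The following is a minimal sketch using the pymilvus client, assuming a standalone Milvus 2.x instance reachable at localhost:19530; the collection and field names are placeholders.

```python
from pymilvus import connections, CollectionSchema, FieldSchema, DataType, Collection

# Connect to a standalone Milvus instance (adjust host/port for your deployment).
connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    # The vector dimension must match the model output: 512 for both coldstart models.
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=512),
]
schema = CollectionSchema(fields, description="Smart Answers coldstart vectors")
collection = Collection(name="smart_answers_vectors", schema=schema)
```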
Deploy a pre-trained cold-start model into Managed Fusion
The pre-trained cold-start models are deployed using a Managed Fusion job called Create Seldon Core Model Deployment. This job downloads the selected pre-trained model and installs it in Managed Fusion.
- Navigate to Collections > Jobs.
- Select Add > Create Seldon Core Model Deployment.
- Enter a Job ID, such as deploy-qna-coldstart-multilingual or deploy-qna-coldstart-large.
- Enter the Model Name, one of the following:
qna-coldstart-multilingual
qna-coldstart-large
- In the Docker Repository field, enter lucidworks.
- In the Image Name field, enter one of the following:
qna-coldstart-multilingual:v1.1
qna-coldstart-large:v1.1
- Leave the Kubernetes Secret Name for Model Repo field empty.
- In the Output Column Names for Model field, enter one of the following:
qna-coldstart-multilingual:[vector]
qna-coldstart-large:[vector, compressed_vector]
- Click Save.
- Click Run > Start to start the deployment job.
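Optionally, you can sanity-check the deployed model by sending a request to its Seldon Core prediction endpoint. The sketch below is an assumption-heavy illustration: it assumes you have port-forwarded the deployment's HTTP port to localhost:8000 and that the model accepts a batch of question strings through the standard Seldon v1 ndarray payload; the exact URL and input format for your cluster may differ.

```python
import requests

# Hypothetical endpoint: adjust to your cluster's ingress or port-forward target.
url = "http://localhost:8000/api/v1.0/predictions"

# Standard Seldon Core v1 payload; the model is assumed to accept a batch of strings.
payload = {"data": {"ndarray": ["How do I reset my password?"]}}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()

# The response should contain the encoded question, for example a 512-dimensional vector.
print(response.json())
```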
Next steps
- Evaluate a Smart Answers Query Pipeline
Use cases for never null and eCommerce
For these use cases, the preferred method is to build a training set and use a supervised job. It is usually possible to set up a training dataset. However, if that is not possible, set up an initial coldstart pre-trained model. The quality of the model and results may vary because the models are designed for natural language questions rather than short, less defined queries. To enhance the results:
- Collect signals data (if it does not already exist).
- Build a training set.
Obtain training data
There are multiple ways to obtain training data. The supervised training job uses pairs of questions and answers (or queries and product names) to train deep learning encoders.
Use cases for chatbots, virtual assistants, QnA, and FAQ search
For these use cases:
- Base training data typically consists of existing FAQ pair information.
- After the coldstart solution is deployed, training data can also be derived from sources such as search logs and customer support correspondence. The data can be constructed and expanded manually.
- Expanded training data can be obtained using the augmentation techniques in the Data augmentation job. For natural language scenarios, backtranslation is effective to provide paraphrased versions of the original questions and answers that improve vocabulary and semantic coverage.
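For example, an FAQ export can be turned into the question/answer pair format that the supervised training job consumes. This is a minimal sketch using pandas; the file name and the question/answer column names are placeholders that must match the fields you configure in the training job.

```python
import pandas as pd

# Hypothetical FAQ pairs; paraphrased questions that share an answer go in separate rows.
faq_pairs = [
    {"question": "How do I reset my password?",
     "answer": "Open Account Settings and select Reset Password."},
    {"question": "I forgot my password, what should I do?",
     "answer": "Open Account Settings and select Reset Password."},
    {"question": "Which payment methods do you accept?",
     "answer": "We accept all major credit cards and PayPal."},
]

df = pd.DataFrame(faq_pairs)

# Parquet is a convenient format for indexing training data into Managed Fusion
# because it avoids most CSV quoting and encoding issues.
df.to_parquet("faq_training_data.parquet", index=False)
```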
Use cases for never null and eCommerce
Training data for these use cases is primarily collected signals data. There are different ways to construct training data, which determines how the resulting trained model behaves based on the:
- Available types of signals.
- Quantity of signals.
- Needs of your business.
- Query and product pair click signals:
- Help increase the click-through rate (CTR).
- Are usually high volume, but low quality.
- Add-to-cart and purchase signals:
- Help increase sales.
- Have a very strong relationship between query and products.
- Are high quality, but low volume.
- Produce a trained model that ranks the products most likely to be purchased higher than other products in the query results.
- Null-search signals result from products selected after a zero-result search or an abandoned search.
- Different training datasets can be combined to train a single model, or results from the different models can be combined at query time.
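As a rough illustration of the choices above, the sketch below aggregates raw click, add-to-cart, and purchase signals into weighted query/product training pairs using pandas. The signal types, weights, and threshold are illustrative assumptions, not prescribed values.

```python
import pandas as pd

# Hypothetical raw signals: one row per click / add-to-cart / purchase event.
signals = pd.DataFrame([
    {"query": "wireless mouse", "product_name": "AcmePoint Wireless Mouse", "type": "click"},
    {"query": "wireless mouse", "product_name": "AcmePoint Wireless Mouse", "type": "purchase"},
    {"query": "wireless mouse", "product_name": "Basic USB Mouse",          "type": "click"},
    {"query": "usb c hub",      "product_name": "7-in-1 USB-C Hub",         "type": "add_to_cart"},
])

# Weight stronger intent signals higher than clicks (illustrative values).
weights = {"click": 1, "add_to_cart": 3, "purchase": 5}
signals["weight"] = signals["type"].map(weights)

# Aggregate into query/product training pairs and drop low-confidence pairs
# that are supported by a single click only.
pairs = (
    signals.groupby(["query", "product_name"], as_index=False)["weight"]
    .sum()
    .query("weight >= 2")
)

pairs.to_parquet("ecommerce_training_data.parquet", index=False)
```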
How to evaluate results
The supervised training job incorporates an evaluation mechanism. A hold-out validation set:
- Is constructed automatically from unique questions and queries.
- Can be sized in the job configuration.
- Provides the ability to:
- Evaluate how well models generalize to new, unseen queries.
- Prevent overfitting.
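The job builds this hold-out set for you; the sketch below only illustrates the idea of splitting on unique questions so that no question appears in both the training and validation portions. It assumes pandas and the hypothetical training file from the earlier example.

```python
import pandas as pd

df = pd.read_parquet("faq_training_data.parquet")

# Shuffle the unique questions and hold out 10% of them for validation,
# so the same question never appears in both splits.
unique_questions = df["question"].drop_duplicates().sample(frac=1.0, random_state=42)
holdout_size = max(1, int(len(unique_questions) * 0.1))
holdout = set(unique_questions.iloc[:holdout_size])

validation = df[df["question"].isin(holdout)]
training = df[~df["question"].isin(holdout)]
```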
Evaluate a Smart Answers Query Pipeline
The Smart Answers Evaluate Pipeline job evaluates the rankings of results from any Smart Answers pipeline and finds the best set of weights in the ensemble score. This topic explains how to set up the job.
Before beginning this procedure, prepare a machine learning model using either the Supervised method or the Cold start method, or by selecting one of the pre-trained cold start models, then Configure your pipelines.
The input for this job is a set of test queries and the text or ID of the correct responses. At least 100 entries are needed to obtain useful results. The job compares the test data with Managed Fusion's actual results and computes a variety of ranking metrics to provide insight into how well the pipeline works. It is also useful for comparing different setups or pipelines.
Example: recall@1,3,5 for different weights and distances
In addition to metrics, a results evaluation file is indexed to the specified output evaluation collection. It provides the correct answer position for each test question, as well as the top returned results for each field specified in the Return Fields parameter.
Prepare test data
- Format your test data as query/response pairs, that is, a query and its corresponding answer in each row. You can do this in any format that Managed Fusion supports, but a Parquet file is preferable to reduce possible encoding issues. The response value can be either the document ID of the correct answer in your Managed Fusion index (preferable) or the text of the correct answer. If there are multiple possible answers for a unique question, repeat the question in different rows so that each row has exactly one query and one response. If you use answer text instead of an ID, make sure that the answer text in the evaluation file is formatted identically to the answer text in Managed Fusion.
- If you wish to index test data into Managed Fusion, create a collection for your test data, such as sa_test_input, and index the test data into that collection.
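For example, a small test set might look like the sketch below. It uses pandas, and the query and answer_id column names are placeholders; they must match the Test Question Field and Ground Truth Field you configure in the job.

```python
import pandas as pd

# One query and the ID of its correct answer per row; queries with several
# acceptable answers are repeated, one answer per row.
# (A real test set needs at least 100 entries to produce useful results.)
test_data = pd.DataFrame([
    {"query": "how do i reset my password", "answer_id": "faq_001"},
    {"query": "forgot my password",         "answer_id": "faq_001"},
    {"query": "accepted payment methods",   "answer_id": "faq_017"},
    {"query": "accepted payment methods",   "answer_id": "faq_018"},
])

# Parquet avoids encoding issues; index this file into a collection such as sa_test_input.
test_data.to_parquet("sa_test_input.parquet", index=False)
```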
Configure the evaluation job
- If you wish to save the job output in Managed Fusion, create a collection for your evaluation data, such as sa_test_output.
- Navigate to Collections > Jobs.
- Select New > Smart Answers Evaluate Pipeline.
- Enter a Job ID, such as sa-pipeline-evaluator.
- Enter the name of your test data collection (such as sa_test_input) in the Input Evaluation Collection field.
- Enter the name of your output collection (such as sa_test_output) in the Output Evaluation Collection field.
- Enter the name of the Test Question Field in the input collection.
- Enter the name of the answer field as the Ground Truth Field.
- Enter the App Name of the Managed Fusion app where the main Smart Answers content is indexed.
- In the Main Collection field, enter the name of the Managed Fusion collection that contains your Smart Answers content.
- In the Fusion Query Pipeline field, enter the name of the Smart Answers query pipeline you want to evaluate.
- In the Answer Or ID Field In Fusion field, enter the name of the field that Managed Fusion will return containing the answer text or answer ID.
- Optionally, configure the Return Fields to pass from the Smart Answers collection into the evaluation output. Check the Query Workbench to see which fields are available to be returned.
- Configure the Metrics parameters:
- Solr Scale Function: Specify the function used in the Compute Mathematical Expression stage of the query pipeline, one of the following:
max
log10
pow0.5
- List of Ranking Scores For Ensemble: To find the best weights for different ranking scores, list the names of the ranking score fields, separated by commas. Different ranking scores might include the Solr score, query-to-question distance, or query-to-answer distance from the Compute Mathematical Expression pipeline stage.
- Target Metric To Use For Weight Selection: The target ranking metric to optimize during weight selection. The default is mrr@3.
- Optionally, read about the advanced parameters and consider whether to configure them as well.
- Click Run > Start to run the evaluation job.

Examine the output
The job provides a variety of metrics (controlled by the Metrics list advanced parameter) at different positions (controlled by the Metrics@k list advanced parameter) for the chosen final ranking score (specified in the Ranking score parameter).
Example: Pipeline evaluation metrics
Use these metrics to:
- Ensure the configuration is set correctly.
- Compare different models or pipelines.
- Perform an end-to-end evaluation of the entire query pipeline.
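To make the reported numbers concrete, the sketch below shows how recall@k and mrr@k are typically computed from ranked results. It is an illustration of the metric definitions with hypothetical data, not the job's actual implementation.

```python
def recall_at_k(ranked_ids, correct_id, k):
    """Return 1.0 if the correct answer appears in the top k results, else 0.0."""
    return 1.0 if correct_id in ranked_ids[:k] else 0.0


def mrr_at_k(ranked_ids, correct_id, k):
    """Return the reciprocal rank of the correct answer if it is in the top k, else 0.0."""
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == correct_id:
            return 1.0 / position
    return 0.0


# Hypothetical ranked results for two test queries and their correct answer IDs.
results = [
    (["faq_003", "faq_001", "faq_010"], "faq_001"),  # correct answer at position 2
    (["faq_017", "faq_002", "faq_005"], "faq_017"),  # correct answer at position 1
]

recall_3 = sum(recall_at_k(r, c, 3) for r, c in results) / len(results)
mrr_3 = sum(mrr_at_k(r, c, 3) for r, c in results) / len(results)
print(f"recall@3={recall_3:.2f}, mrr@3={mrr_3:.2f}")  # recall@3=1.00, mrr@3=0.75
```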
How to choose a training model
Select the model that best fits your training needs.
RNN models
RNN models support most use cases, including digital commerce and heavy query traffic domains. The models also provide:
- Easily understood training.
- Extremely fast query time.
For English use cases, use the default word_en_300d_2M embeddings. If the case requires other language support, use bpe_multilingual or one of the specific bpe models for CJK languages.
BERT models
BERT models:
- Are only recommended if you have GPU access. Transformer-based models can only use a single GPU; even if multiple GPUs are available, this type of model does not scale to multiple GPUs.
- Are suitable for natural language queries, chatbots, virtual assistants, and low query traffic domains.
- Provide better semantic quality results, especially if only a small amount of training data is available.
- Are more expensive to use for training.
- Run more slowly at query time than RNN models.
Model vector size
Pre-trained models produce vectors with a dimension of 512. Vector dimensions for models trained by the supervised job are highlighted at the end of the training logs, or can be derived from the model and parameters:
- BERT-based model vectors have a dimension of 768.
- RNN-based model vectors have a dimension of two times the size of the last specified layer.
Hyperparameters to tune the training job
Most of the parameters in the supervised training job have effective default values. Some of the parameters are automatically determined based on the dataset statistics. Specific parameters to review are listed below, but see Advanced Model Training Configuration for Smart Answers for detailed information about tuning the training job.
General encoder parameters
These parameters are common to BERT and RNN models. The most significant parameter in this section is Fine-tune Token Embeddings. The parameter is disabled by default to prevent overfitting. However, enabling it can improve results for queries with many variations or jargon, for example, queries in the eCommerce domain.
RNN encoder parameters
Dropout Ratio: Tune in the [0.1, 0.5] range to improve regularization and prevent overfitting. Customary values are 0.15 and 0.3.
RNN Function (Units) List: These parameters determine the number of layers and the output vector size. One layer is typically sufficient, and Lucidworks does not recommend using more than two layers. Adding a smaller second layer produces a smaller vector size, which reduces the vector index size and might reduce query time. For example, if a one-layer network with 128 units (256 vector size) provides good results, you can add a small second layer with 64 units (128 vector size) to improve QPS and index a very large collection without losing quality. That is, instead of one [128] layer you use two layers, [128, 64]. The additional small layer slightly increases encoding time, but the vectors are two times smaller in memory, which may improve search time.
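The relationship between the RNN Function (Units) List and the resulting vector size can be checked with a small calculation; the helper below is hypothetical and simply applies the rule stated above (two times the size of the last layer).

```python
def rnn_vector_size(units_per_layer):
    # Per the rule above, the output vector dimension is twice the last layer's unit count.
    return 2 * units_per_layer[-1]


print(rnn_vector_size([128]))      # 256: the one-layer example described above
print(rnn_vector_size([128, 64]))  # 128: adding a smaller second layer halves the vector size
```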
Training and evaluation parameters
To reduce training time, especially when training on a CPU, lower the value of the following parameters:
- Number of Epochs
- Monitor Patience
To increase training time, increase the values of these parameters.
For the training batch size, Lucidworks recommends:
- At least 100 batches per training epoch.
- Use bigger batches with larger training datasets
- Values of:
- [32, 64] for small training sets
- [128, 256] for medium training sets
- [512, 1024, 2048] for large training sets
Due to the size of BERT-based models, the training batch size values recommended are [8, 16, 32].
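A quick calculation shows whether a chosen batch size satisfies the at-least-100-batches guideline; the numbers below are hypothetical.

```python
import math

training_examples = 20_000   # hypothetical training set size
batch_size = 128             # a medium-dataset value from the ranges above

batches_per_epoch = math.ceil(training_examples / batch_size)
print(batches_per_epoch)     # 157 batches per epoch, which meets the 100-batch guideline
```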
How to choose a Milvus index
Milvus supports multiple indexes. See the following Milvus documentation:
- Vector index for detailed descriptions of the indexes.
- Performance FAQ for performance information.
Default index
The default (FLAT) index provides:
- The best possible quality (typically 100% of the model quality), but is the least efficient in terms of query time.
- A reasonable query time for low traffic use cases under 250 QPS.
HNSW index
The HNSW index:
- Supports higher QPS.
- Supports collections over a few million documents.
- Needs to be configured with the parameter values detailed in Create Indexes in Milvus Jobs.
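In Managed Fusion, the HNSW parameters are supplied through the Create Indexes in Milvus job mentioned above. For reference, the sketch below shows the equivalent operation with the pymilvus client against a standalone Milvus 2.x instance; the collection name, metric type, and parameter values are illustrative assumptions.

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("smart_answers_vectors")

# Build an HNSW index on the vector field; tune M and efConstruction per the Milvus docs.
collection.create_index(
    field_name="vector",
    index_params={
        "index_type": "HNSW",
        "metric_type": "IP",  # inner product; must match how your vectors are compared
        "params": {"M": 16, "efConstruction": 200},
    },
)
```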