Advanced model training configuration
- Input data pre-processing parameters
- Word2vec parameters
- Model tuning parameters (for advanced users not using auto-tune)
- Evaluation parameters
- Clustering parameters
This topic provides tips for customizing and fine-tuning your pipeline configurations.
Input data pre-processing parameters
The Maximum vocabulary size, Lower case all words, Min Doc Support, and Max Doc Support parameters impact the vocabulary size. Default values should work in most cases, given enough RAM and time to train.
If you see an out-of-memory error, try reducing the vocabulary size and/or the training batch size. The Minimum number of words and Maximum number of words parameters can help trim problematic documents.
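As a rough illustration of how these parameters interact, the sketch below filters a vocabulary by document support and then caps its size. The function name `build_vocabulary`, its argument names, and the units (a document count for the minimum, a fraction of documents for the maximum) are assumptions made for this example, not the product's actual implementation.

```python
from collections import Counter


def build_vocabulary(docs, max_vocab_size=100000, lowercase=True,
                     min_doc_support=1, max_doc_support=1.0):
    """Return the words kept after document-support filtering.

    min_doc_support: minimum number of documents a word must appear in
                     (assumed unit).
    max_doc_support: maximum fraction of documents a word may appear in
                     (assumed unit).
    """
    term_freq = Counter()  # total occurrences of each word
    doc_freq = Counter()   # number of documents containing each word
    for doc in docs:
        tokens = doc.lower().split() if lowercase else doc.split()
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    kept = [w for w, df in doc_freq.items()
            if df >= min_doc_support and df / n_docs <= max_doc_support]
    # Keep the most frequent words first; break ties alphabetically.
    kept.sort(key=lambda w: (-term_freq[w], w))
    return kept[:max_vocab_size]


docs = ["The cat sat", "the dog sat", "A bird flew"]
full_vocab = build_vocabulary(docs)
trimmed_vocab = build_vocabulary(docs, min_doc_support=2)
```

Lowercasing merges "The" and "the" into one entry, and raising the minimum document support drops rare words, so both settings shrink the vocabulary and the memory it needs.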
Word2vec parameters
As described in Training scenarios, use the Generate customized embeddings parameter to specify whether you want to train Word2vec using your own data.
If you want to train Word2vec on a different dataset instead of the FAQ input, specify the Content documents file name and Field which contains the content documents parameters.
Commonly used Word2vec training parameters include Word2Vec Dimension, Word2Vec Window Size, and Word2Vec Training Iterations. Default values should work in most cases.
Smaller Word2Vec dimensions shorten the search time during implementation. However, dimensions smaller than 100 may impact search quality.
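To give a feel for what Word2Vec Window Size controls, here is a minimal sketch that lists the (center word, context word) pairs a fixed window produces. The helper `context_pairs` is hypothetical. Real Word2vec implementations typically also sample the effective window size at random, which this sketch omits.

```python
def context_pairs(tokens, window_size):
    """List (center, context) pairs within window_size words of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["how", "do", "i", "reset"]
narrow = context_pairs(tokens, window_size=1)
wide = context_pairs(tokens, window_size=2)
```

A larger window yields more training pairs per sentence, which tends to increase training time.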
Model tuning parameters (for advanced users not using auto-tune)
We use an RNN-based deep learning architecture to train the FAQ model, with the flexibility to choose between LSTM and GRU layers and to stack more than one layer. We don’t recommend using more than three layers. The layers and layer sizes are controlled by the RNN function list and RNN function units list parameters.
If you leave the following tuning parameters blank in the UI, the program automatically calculates reasonable values based on data size:
- Number of epochs
- Training batch size
- Inference batch size
- Minimum learning rate
- Maximum learning rate
- RNN function list
- RNN function units list
- question length
- answer length
For cold start, since there are no questions in the dataset, the program automatically calculates answer length, and uses 25 as the default max question length.
The program won’t automatically calculate Dropout rate (default 0.15) or Weight decay (default 0.0001).
The above parameters are not exposed in the UI for auto-tune mode. In auto-tune mode, our module will try different parameter combinations to find the best model.
You can set Number of epochs to a small number if you just want to test the whole workflow.
Evaluation parameters
Several evaluation metrics are provided to monitor the training process and measure the quality of the final model. You can choose among them in the Metrics list parameter. All five metrics are used by default.
You can also specify the ranking positions at which each metric is measured. For example, if you specify Metrics@k list as [1,3] with Metrics list ["map","mrr","roc_auc"], then each metric is reported at each k (for example, roc_auc@3) in the log for each training epoch and for the final model.
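The Metrics@k expansion can be sketched as a cross-product of metric names and k values. The helper names below are hypothetical, the exact log format is an assumption, and `mrr_at_k` assumes "mrr" means mean reciprocal rank with a cutoff at k.

```python
from itertools import product


def expand_metric_names(metrics, ks):
    """Pair every metric with every k, for example 'roc_auc@3'."""
    return [f"{m}@{k}" for m, k in product(metrics, ks)]


def mrr_at_k(ranked_relevance, k):
    """Mean reciprocal rank, counting only the top k results per query.

    ranked_relevance: one list of booleans per query, ordered by rank,
    where True marks a relevant result.
    """
    total = 0.0
    for rels in ranked_relevance:
        for rank, is_relevant in enumerate(rels[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)


names = expand_metric_names(["map", "mrr", "roc_auc"], [1, 3])
# Query 1 has its first relevant result at rank 2, query 2 at rank 1.
results = [[False, True, False], [True, False, False]]
mrr_1 = mrr_at_k(results, 1)
mrr_3 = mrr_at_k(results, 3)
```

With a cutoff of 1, the first query contributes nothing because its relevant result sits at rank 2, so the score at k=1 is lower than at k=3.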
You can choose a particular metric at a particular k (controlled by the Monitoring metric parameter) to help decide when to stop training. Specifically, when there is no increase in the Monitoring metric value for x number of epochs (controlled by the Patience during monitoring parameter), then training stops.
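The stopping rule can be sketched as follows. This is a minimal illustration of patience-based early stopping under the description above, not the product's code. The function `stopping_epoch` is hypothetical, and treating a tie with the best value as "no increase" is an assumption.

```python
def stopping_epoch(metric_history, patience):
    """Return the 1-based epoch at which training would stop, or None.

    metric_history: monitored metric value after each epoch.
    patience: number of consecutive epochs without an increase to tolerate.
    """
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, value in enumerate(metric_history, start=1):
        if value > best:
            best = value
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return None  # the metric kept improving often enough


history = [0.50, 0.55, 0.54, 0.55, 0.53]
stop_with_patience_2 = stopping_epoch(history, patience=2)
stop_with_patience_3 = stopping_epoch(history, patience=3)
```

A larger patience value tolerates longer plateaus before stopping, at the cost of extra training epochs.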
For FAQ training (with or without auto-tune), we automatically evaluate the result using just the Word2vec dense vectors without deep learning training as a baseline model. Look for the Cold-start encoder validation evaluation section of the log for the evaluation metrics.
Clustering parameters
To reduce dense vector retrieval time during implementation, we cluster the dense vectors as the last step.
During training, we find cluster centers based on the training data. At query time, after questions are converted to dense vectors, the model finds the best-matching cluster centers and then finds the closest answer or question dense vectors within those clusters.
The number of clusters is automatically calculated based on input data. You can also change the Number of clusters parameter to the desired level of granularity. The model can find multiple best-matching cluster centers (controlled by the Rerank top k clusters parameter) and search inside those clusters for best matching vectors.
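The two-stage lookup described above can be sketched as follows. This is an illustration only: the function names, the data layout, and the use of cosine similarity are assumptions, since this topic does not state which similarity measure the model uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clustered_search(query, centers, clusters, top_k_clusters=1):
    """Find the best match by searching only the closest clusters.

    centers:  one center vector per cluster.
    clusters: for each cluster, a list of (item_id, vector) pairs.
    """
    # Stage 1: rank clusters by how close their centers are to the query.
    ranked = sorted(range(len(centers)),
                    key=lambda c: cosine(query, centers[c]),
                    reverse=True)
    # Stage 2: compare the query only with vectors in the top clusters.
    candidates = [item for c in ranked[:top_k_clusters]
                  for item in clusters[c]]
    if not candidates:
        return None
    return max(candidates, key=lambda item: cosine(query, item[1]))[0]


centers = [[1.0, 0.0], [0.0, 1.0]]
clusters = [
    [("a", [1.0, 0.1]), ("b", [0.9, 0.3])],
    [("c", [0.1, 1.0]), ("d", [0.6, 0.8])],
]
query = [0.8, 0.6]
best_in_one_cluster = clustered_search(query, centers, clusters, 1)
best_in_two_clusters = clustered_search(query, centers, clusters, 2)
```

In this toy data the closest vector overall sits in the second-best cluster, so searching a single cluster misses it. This is the trade-off behind Rerank top k clusters: searching more clusters costs more comparisons but can recover matches that lie near a cluster boundary.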