Creates a custom model and starts a training job. The custom model cannot be modified after it is created.

Example request:

import requests
url = "https://api.lucidworks.com/customers/{CUSTOMER_ID}/ai/models"
payload = {
"name": "<string>",
"modelType": "<string>",
"region": "<string>",
"trainingData": {
"catalog": "<string>",
"signals": "<string>"
},
"config": {
"dataset_config": "mlp_ecommerce",
"trainer_config": "mlp_ecommerce",
"trainer_config/text_processor_config": "word_en",
"trainer_config.encoder_config.rnn_names_list": ["gru"],
"trainer_config.encoder_config.rnn_units_list": [128],
"trainer_config.trn_batch_size": 123,
"trainer_config.num_epochs": 1,
"trainer_config.monitor_patience": 8,
"trainer_config.encoder_config.emb_spdp": 0.3,
"trainer_config.encoder_config.emb_trainable": True
},
"trainingDataCredentials": { "serviceAccountKey": "<string>" }
}
headers = {"Content-Type": "application/json"}
response = requests.post(url, json=payload, headers=headers)
print(response.text)

Example response:

{
"id": "<string>",
"name": "<string>",
"modelType": "<string>",
"category": "<string>",
"description": "<string>",
"region": "<string>",
"trainingData": {
"catalog": "<string>",
"signals": "<string>"
},
"config": {
"dataset_config": "mlp_ecommerce",
"trainer_config": "mlp_ecommerce",
"trainer_config/text_processor_config": "word_en",
"trainer_config.encoder_config.rnn_names_list": [
"gru"
],
"trainer_config.encoder_config.rnn_units_list": [
128
],
"trainer_config.trn_batch_size": 123,
"trainer_config.num_epochs": 1,
"trainer_config.monitor_patience": 8,
"trainer_config.encoder_config.emb_spdp": 0.3,
"trainer_config.encoder_config.emb_trainable": true
},
"state": "<string>",
"trainingStarted": "<string>",
"trainingCompleted": "<string>",
"createdBy": "<string>"
}
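If the call succeeds, the JSON body shown above can be read directly from the response object. The following is a minimal sketch, not part of the official documentation, showing how a few of the documented fields might be inspected; authentication is omitted because it is not shown in the sample above.

# Continues from the request sample; `response` is the object returned by requests.post(...).
response.raise_for_status()        # raise an exception on a 4xx/5xx status
model = response.json()            # parse the response body shown above
print(model["id"])                 # UUID that identifies the new model
print(model["state"])              # TRAINING for a newly created model
print(model["trainingStarted"])    # set when the training job begins

The path parameter and request body fields used in the sample are described below.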
CUSTOMER_ID - Unique identifier derived from confidential client information.
name - The user-friendly name of the model.
modelType - The name of the custom model.
region - The geographic region specified when the custom model is being trained.
trainingData - The location of the training data in Google Cloud Storage (GCS).
trainingData.catalog - The location of the catalog of the training data in Google Cloud Storage (GCS).
The catalog file contains documents (products) that will be searched. The file must have a pkid (product key ID) column which contains the document ID or product ID. The pkid is a unique value for each document, so entries with a duplicate pkid are filtered out. However, since not every pkid entry is associated with a query, there may be entries in the catalog index file that are not associated with a signals query entry.
The index file content format differs based on the model type to be trained: a general model or an eCommerce model.
General model fields:
pkid - The unique product key ID. Required field. This must match an entry in the signals query file.
text - A freeform text field.
eCommerce model fields:
pkid - The unique product key ID. Required field. This must match an entry in the signals query file.
name - The freeform text field that contains the product name.
trainingData.signals - The location of signals in the training data in Google Cloud Storage (GCS).
The signals file must have a pkid (product key ID) column which refers to the relevant document or product ID. The file may contain multiple duplicates of any pkid because each document could be associated with several relevant queries.
NOTE: For evaluation purposes, 10% of unique queries (50 minimum and 5000 maximum) are automatically sampled into a validation set from the training query file.
General model fields:
pkid - The unique product key ID. Required field. This must match an entry in the catalog index file.
query - A freeform text field.
eCommerce model fields:
pkid - The unique product key ID. Required field. This must match an entry in the catalog index file.
query - A freeform text field.
aggr_count - The number of documents that match the query criteria. In most cases, this value is used as a weight and must be greater than zero (0). If you do not use weights or there is no value, set this value to 1. The weight is used for training pairs sampling and to compute normalized discounted cumulative gain (NDCG) metrics. If all values are 1.0, binary NDCG is computed.
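As an illustration only, the following sketch builds a minimal eCommerce catalog file and signals file with pandas. The CSV format, file names, and gs:// locations are assumptions for this example; the documentation above specifies only the required columns, not the file format.

import pandas as pd

# Catalog (index) file: one row per document; pkid must be unique.
catalog = pd.DataFrame([
    {"pkid": "sku-001", "name": "Red running shoes"},
    {"pkid": "sku-002", "name": "Blue trail sneakers"},
])
catalog.to_csv("catalog.csv", index=False)

# Signals file: one row per (query, document) pair; a pkid may repeat,
# and every pkid must also appear in the catalog file.
signals = pd.DataFrame([
    {"pkid": "sku-001", "query": "red shoes", "aggr_count": 12},
    {"pkid": "sku-002", "query": "trail sneakers", "aggr_count": 1},  # use 1 when no weight is available
])
signals.to_csv("signals.csv", index=False)

# The trainingData block of the request then points at the uploaded files, for example:
# "trainingData": {"catalog": "gs://your-bucket/catalog.csv", "signals": "gs://your-bucket/signals.csv"}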
config - The configuration parameters passed to the model training job.
dataset_config - The dataset format used for training. The options are:
mlp_general - This is used for the general recurrent neural networks (RNN) model type.
mlp_ecommerce - This is used for an eCommerce RNN model type.
Example: "mlp_ecommerce"
trainer_config - The trainer type used for training. The options are:
mlp_general - This is used for the general recurrent neural networks (RNN) model type.
mlp_ecommerce - This is used for an eCommerce RNN model type.
Example: "mlp_ecommerce"
trainer_config/text_processor_config - This determines which type of tokenization and embedding is used as the base for the recurrent neural network (RNN) model, for example, word or byte-pair encoding (BPE). The word text processor defaults to English, and uses word-based tokenization and English pre-trained word embeddings. The maximum word vocabulary size is 100000. The BPE versions use the same tokenization, but different vocabulary sizes.
The options for text processors are:
word_en (default)
bpe_en_small
bpe_en_large
all_minilm_l6
e5_small_v2
e5_base_v2
e5_large_v2
gte_small
gte_base
gte_large
snowflake_arctic_embed_xs
bpe_multi
multilingual_e5_small
multilingual_e5_base
multilingual_e5_large
bpe_bg_small
bpe_bg_large
bpe_de_small
bpe_de_large
bpe_es_small
bpe_es_large
bpe_fr_small
bpe_fr_large
bpe_it_small
bpe_it_large
bpe_ja_small
bpe_ja_large
bpe_ko_small
bpe_ko_large
bpe_nl_small
bpe_nl_large
bpe_ro_small
bpe_ro_large
bpe_zh_small
bpe_zh_large
word_custom
bpe_custom
Example: "word_en"
trainer_config.encoder_config.rnn_names_list - This determines which bi-directional recurrent neural network (RNN) layers are used. Options include gru and lstm.
trainer_config.encoder_config.rnn_units_list - The number of units for each recurrent neural network (RNN) layer.
IMPORTANT: trainer_config.encoder_config.rnn_units_list must contain the same number of entries as trainer_config.encoder_config.rnn_names_list, so that each named RNN layer has a corresponding unit count.
Because this is a bi-directional RNN, the encoder's output vector size is two times the number of units in the last layer. For example, if the last layer has 128 units, the output vector size is 256.
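A small sketch of the constraint and the doubling described above; the helper below is illustrative only and is not part of the Lucidworks API.

def encoder_output_size(rnn_names_list, rnn_units_list):
    # Each named RNN layer needs a matching unit count.
    if len(rnn_names_list) != len(rnn_units_list):
        raise ValueError("rnn_units_list must be the same size as rnn_names_list")
    # Bi-directional RNN: the encoder vector is twice the units of the last layer.
    return 2 * rnn_units_list[-1]

print(encoder_output_size(["gru"], [128]))              # 256
print(encoder_output_size(["gru", "lstm"], [128, 64]))  # 128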
trainer_config.trn_batch_size - The batch size used for a single model training update. If this field is omitted or set to null, an appropriate batch size is automatically determined based on the dataset size.
trainer_config.num_epochs - The number of epochs the training must complete. An epoch is a full cycle where the training data passes through the designated algorithms. During one epoch, the model processes all of the training data examples (queries and index documents) at least one time.
Required range: 1 <= x <= 64. Example: 1
trainer_config.monitor_patience - The number of epochs training continues without improvement in the validation metric before it stops. The best model state, based on the monitored validation metric, is used as the final model.
For the general RNN, the mrr@3 metric is monitored and the monitor_patience default value is 8.
For the eCommerce RNN, the ndcg@5 metric is monitored and the monitor_patience default value is 16.
Example: 8
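The patience behavior described above can be pictured with a generic early-stopping loop. This sketch is only an illustration of the concept, not the actual trainer; train_one_epoch and evaluate are placeholder callbacks.

def train_with_patience(train_one_epoch, evaluate, num_epochs, monitor_patience):
    best_metric, best_state, epochs_without_improvement = None, None, 0
    for epoch in range(num_epochs):
        state = train_one_epoch()
        metric = evaluate()                      # e.g. mrr@3 or ndcg@5 on the validation set
        if best_metric is None or metric > best_metric:
            best_metric, best_state = metric, state
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= monitor_patience:
                break                            # no improvement for `monitor_patience` epochs
    return best_state                            # the best model state becomes the final model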
trainer_config.encoder_config.emb_spdp - The dropout applied between the token embeddings layer and the first recurrent neural network (RNN) layer. This provides a regularization effect that helps prevent overfitting.
Example: 0.3
trainer_config.encoder_config.emb_trainable - This determines whether fine-tuning of the token embeddings, such as word or byte-pair encoding (BPE) token vectors, is enabled. If enabled, fine-tuning can improve the quality of the model when queries contain less natural language, although training is negatively affected. Because the embeddings layer is the largest layer in the network, fine-tuning requires enough training data to prevent overfitting.
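The encoder-related parameters above map naturally onto a bi-directional RNN stack. The Keras sketch below is an assumption-laden illustration of that mapping and is not Lucidworks' training code; the vocabulary size and embedding dimension are made-up values, and only the documented parameters (rnn_names_list, rnn_units_list, emb_spdp, emb_trainable) come from this page.

import tensorflow as tf

def build_encoder(vocab_size=100000, emb_dim=300, rnn_names_list=("gru",),
                  rnn_units_list=(128,), emb_spdp=0.3, emb_trainable=True):
    # vocab_size and emb_dim are illustrative values, not documented defaults.
    layers = [
        tf.keras.layers.Embedding(vocab_size, emb_dim, trainable=emb_trainable),
        tf.keras.layers.SpatialDropout1D(emb_spdp),  # regularization between embeddings and the first RNN layer
    ]
    rnn_types = {"gru": tf.keras.layers.GRU, "lstm": tf.keras.layers.LSTM}
    for i, (name, units) in enumerate(zip(rnn_names_list, rnn_units_list)):
        return_sequences = i < len(rnn_names_list) - 1   # only the last layer returns a single vector
        layers.append(tf.keras.layers.Bidirectional(rnn_types[name](units, return_sequences=return_sequences)))
    return tf.keras.Sequential(layers)

encoder = build_encoder()
# With a single gru layer of 128 units, the bi-directional encoder produces
# 2 * 128 = 256-dimensional query/document vectors, matching the doubling described above.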
A successful request returns 200 OK with the following fields.
id - The identifier of the model. The value is the universally unique identifier (UUID) that is the primary key for the model.
name - The user-friendly name of the model.
modelType - The name of the model.
category - The category of the model; for models created with this endpoint, the category is custom.
description - The description of the model.
region - The geographic region specified when the custom model was trained.
trainingData and config - The location of the training data in Google Cloud Storage (GCS) and the configuration parameters passed to the model training job. These fields and their child attributes mirror the request body fields described above.
state - The current status of the custom model. For a newly created model, the value is TRAINING.
trainingStarted - The date and time the training started. This field only applies to custom models.
trainingCompleted - The date and time the training completed. This field only applies to custom models.
createdBy - The user who created the model.
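Because trainingStarted and trainingCompleted are only populated as the job progresses, a client typically polls the model until state leaves TRAINING. The sketch below assumes a corresponding GET endpoint at the same base path; that endpoint is not documented in this section, so treat the URL as a placeholder.

import time
import requests

def wait_for_training(customer_id, model_id, headers, poll_seconds=60):
    # Placeholder URL: assumes the model can be fetched at .../ai/models/{model_id}.
    url = f"https://api.lucidworks.com/customers/{customer_id}/ai/models/{model_id}"
    while True:
        model = requests.get(url, headers=headers).json()
        if model["state"] != "TRAINING":   # training finished, failed, or was stopped
            return model
        time.sleep(poll_seconds)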