The Lucidworks AI Models API is used to manage custom models.

Prerequisites

To use this API, you need:
  • The unique CUSTOMER_ID for your organization. For more information, see credentials to use APIs.
  • A bearer token generated with a scope value of machinelearning.model. For more information, see Authentication API.
  • Additional identifiers, such as MODEL_ID and DEPLOYMENT_ID, for certain operations.
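As a minimal sketch of how these prerequisites might be gathered in a client script, the following Python reads the customer ID and bearer token from environment-style settings and builds the headers used by the curl examples in this document. The variable names LW_CUSTOMER_ID and LW_ACCESS_TOKEN and the helper names are assumptions made for illustration and are not defined by the API.

```python
import os
from typing import Dict, Mapping, NamedTuple


class Credentials(NamedTuple):
    customer_id: str
    access_token: str


def load_credentials(env: Mapping[str, str] = os.environ) -> Credentials:
    """Read the two prerequisites from environment-style settings.

    LW_CUSTOMER_ID and LW_ACCESS_TOKEN are hypothetical names chosen for
    this sketch; the API itself does not define them.
    """
    required = ("LW_CUSTOMER_ID", "LW_ACCESS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Credentials(env["LW_CUSTOMER_ID"], env["LW_ACCESS_TOKEN"])


def auth_headers(creds: Credentials) -> Dict[str, str]:
    """Return a JSON content type header plus the bearer token header."""
    return {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + creds.access_token,
    }
```

Passing a plain dictionary instead of os.environ keeps the helper easy to exercise in tests.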

Training configuration

General and ecommerce recurrent neural network (RNN) models are supported. For detailed information about training parameters and configuration, see the API specification.

Training data format

The catalog and signals training data must share a primary key field, pkid, which must be present in both the:
  • index file that contains documents or products that are searched
  • query file that contains query data associated with the index documents
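Before uploading training data, it can help to confirm that the two files really do share pkid values. The sketch below works on rows already loaded into dictionaries and reports query rows whose pkid has no match in the index. How the rows are loaded from the files is left out, the function name is hypothetical, and treating an unmatched pkid as a problem is an assumption of this sketch rather than a documented API rule.

```python
from typing import Iterable, List, Mapping


def find_unmatched_pkids(
    index_rows: Iterable[Mapping], query_rows: Iterable[Mapping]
) -> List[str]:
    """Return the sorted pkid values in the query rows that are absent from the index rows."""
    index_ids = set()
    for row in index_rows:
        if "pkid" not in row:
            raise KeyError("index row is missing the pkid field")
        index_ids.add(row["pkid"])

    unmatched = set()
    for row in query_rows:
        if "pkid" not in row:
            raise KeyError("query row is missing the pkid field")
        if row["pkid"] not in index_ids:
            unmatched.add(row["pkid"])
    return sorted(unmatched)
```

An empty result means every query row points at a document or product that exists in the index rows.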

Text processors

The supported text processors are:
  • Word, which by default uses pre-trained English word tokenization and embeddings. The general RNN model defaults to this processor. The ecommerce RNN model also uses this processor and fine-tunes the embeddings during training. This processor lowercases text and splits numbers into single digits. Processing attempts to match misspelled words and out-of-vocabulary (OOV) words. The resulting vocabulary contains at most 100,000 words.
    For a language other than English, use the applicable byte pair encoding (BPE) processor.
  • Byte pair encoding (BPE) uses pre-trained BPE tokenization and embeddings. Each available pre-trained BPE model has different versions. The versions use the same token vectors, but have different vocabulary sizes:
    • bpe_*_small embeddings have up to 10,000 vocabulary tokens
    • bpe_*_large embeddings have up to 100,000 vocabulary tokens
    • bpe_multi multilingual embeddings have up to 320,000 vocabulary tokens
  • Custom token embeddings, either word or BPE, that are trained on the data provided during model training. Use this option if your content contains domain-specific vocabulary, or to train a model for an unsupported language. Embeddings training is language agnostic, but Lucidworks recommends custom BPE training for non-Latin languages or multilingual scenarios. To train custom token embeddings, set TextProcessor to one of the following:
    • word_custom, which trains word embeddings with a vocabulary of up to 100,000 words.
    • bpe_custom, which trains BPE embeddings with a vocabulary of up to 10,000 tokens. This text processor learns a custom tokenization function over your data, so the default vocabulary size of 10,000 is sufficient in most cases.
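To make the word processor's normalization more concrete, the following sketch imitates the two documented steps: lowercasing and splitting numbers into single digits. It is only an approximation for illustration. The whitespace tokenization and the function name are assumptions, and the actual processor's handling of misspellings and OOV words is not reproduced here.

```python
import re
from typing import List


def approximate_word_normalization(text: str) -> List[str]:
    """Lowercase the text and split every number into single-digit tokens."""
    lowered = text.lower()
    # Surround each digit with spaces so that "128gb" becomes "1 2 8 gb".
    spaced = re.sub(r"(\d)", r" \1 ", lowered)
    # Whitespace splitting is an assumption of this sketch.
    return spaced.split()
```

For example, the input "iPhone 128GB" yields the tokens iphone, 1, 2, 8, and gb.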

Models endpoint

The /models endpoint operations perform the following:
  • GET returns a list of pre-trained and custom models.
  • POST creates a custom model and starts a training job. The custom model cannot be modified after it is created.

GET /models example

The request requires your unique CUSTOMER_ID. For more information, see Credentials.
curl --request GET \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/models \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN'
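The same request can be assembled in Python with the standard library. This is a sketch rather than an official client: the function name is hypothetical, and the request object is only built here so that the URL and headers can be inspected before anything is sent.

```python
import urllib.request

API_ROOT = "https://api.lucidworks.com"


def build_list_models_request(customer_id: str, access_token: str) -> urllib.request.Request:
    """Build, without sending, the GET request that lists pre-trained and custom models."""
    url = API_ROOT + "/customers/" + customer_id + "/ai/models"
    return urllib.request.Request(
        url,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + access_token,
        },
        method="GET",
    )


# To send the request, pass it to urllib.request.urlopen(), for example:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```

Keeping construction separate from sending makes it possible to log or test the request without network access.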

POST /models example

The request requires your unique CUSTOMER_ID. For more information, see Credentials.
curl --request POST \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/models \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --data '{
  "name": "ecommerce custom model name",
  "modelType": "ecommerce-rnn",
  "region": "us-iowa",
  "trainingData": {
    "catalog": "gs://ml-platform-model-parameters-us-iowa/customer/data/index.parquet",
    "signals": "gs://ml-platform-model-parameters-us-iowa/customer/data/query.parquet"
  },
  "config": {
    "dataset_config": "mlp_ecommerce",
    "trainer_config": "mlp_ecommerce",
    "trainer_config/text_processor_config": "word_en",
    "trainer_config.encoder_config.rnn_names_list": [
      "gru"
    ],
    "trainer_config.encoder_config.rnn_units_list": [
      128
    ],
    "trainer_config.trn_batch_size": 0,
    "trainer_config.num_epochs": 1,
    "trainer_config.monitor_patience": 8,
    "trainer_config.encoder_config.emb_spdp": 0.3,
    "trainer_config.encoder_config.emb_trainable": true
  },
  "trainingDataCredentials": {
    "serviceAccountKey": "string"
  }
}'
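When the request body is generated by a script instead of typed by hand, building it as a dictionary avoids quoting mistakes in the JSON. The sketch below mirrors the field names from the example above. The function name and the check for empty training data locations are assumptions of this sketch, and the values shown in the usage note are copied from the example rather than recommended settings.

```python
import json
from typing import Any, Dict


def build_training_body(
    name: str,
    model_type: str,
    region: str,
    catalog_uri: str,
    signals_uri: str,
    config: Dict[str, Any],
    service_account_key: str,
) -> Dict[str, Any]:
    """Assemble the POST /models body using the field names from the curl example."""
    if not catalog_uri or not signals_uri:
        # Both the catalog (index) and signals (query) files are needed for training.
        raise ValueError("both catalog_uri and signals_uri are required")
    return {
        "name": name,
        "modelType": model_type,
        "region": region,
        "trainingData": {"catalog": catalog_uri, "signals": signals_uri},
        "config": dict(config),
        "trainingDataCredentials": {"serviceAccountKey": service_account_key},
    }


def to_json(body: Dict[str, Any]) -> str:
    """Serialize the body for use as the request payload."""
    return json.dumps(body, indent=2)
```

For example, calling build_training_body with "ecommerce-rnn", "us-iowa", the two gs:// locations, and a config such as {"trainer_config.num_epochs": 1} produces a dictionary that to_json turns into a payload with the same shape as the curl example.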

Model ID endpoint

The /modelId endpoint operation performs the following:
  • GET returns information about a specific model.

GET /modelId example

The request requires your unique CUSTOMER_ID and the specific MODEL_ID to return. For more information about CUSTOMER_ID, see Credentials.
curl --request GET \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/models/MODEL_ID \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN'

Deployments endpoint

The /deployments endpoint operations perform the following:
  • GET returns a list of custom model deployments. Pre-trained models are not returned in the response because they are deployed in all available regions.
  • POST deploys a custom model.
  • DELETE deletes a custom model deployment.

GET /deployments example

The request requires your unique CUSTOMER_ID. For more information, see Credentials.
curl --request GET \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/deployments \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN'

POST /deployments example

The request requires your unique CUSTOMER_ID. For more information, see Credentials.
curl --request POST \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/deployments \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --data '{
  "modelId": "441eb3be-7de6-470a-8141-e416a15c7db1",
  "region": "us-southcarolina",
  "minReplicas": 2,
  "maxReplicas": 4,
  "config": {
    "parameter_1": "value_1",
    "parameter_2": "value_2"
  }
}'
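A deployment body can be built the same way in a script. In the sketch below, the replica check is a client-side sanity check assumed for illustration, because this document does not state the limits the API enforces. The function name is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_deployment_body(
    model_id: str,
    region: str,
    min_replicas: int,
    max_replicas: int,
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the POST /deployments body using the field names from the curl example."""
    # Assumed sanity check; the API may apply different or additional limits.
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("expected 0 <= minReplicas <= maxReplicas")
    return {
        "modelId": model_id,
        "region": region,
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "config": dict(config or {}),
    }


def to_json(body: Dict[str, Any]) -> str:
    """Serialize the body for use as the request payload."""
    return json.dumps(body, indent=2)
```

Rejecting an inverted replica range locally gives a clearer error than waiting for the request to fail.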

DELETE /deployments example

The request requires your unique CUSTOMER_ID and the specific DEPLOYMENT_ID for the model. For more information about CUSTOMER_ID, see Credentials.
curl --request DELETE \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/deployments/DEPLOYMENT_ID \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN'

Model ID deployments endpoint

The /modelId/deployments endpoint operation performs the following:
  • GET returns a list of custom model deployments for the specified model.

GET /modelId/deployments example

The request requires your unique CUSTOMER_ID and the specific MODEL_ID to return. For more information about CUSTOMER_ID, see Credentials.
curl --request GET \
  --url https://api.lucidworks.com/customers/CUSTOMER_ID/ai/models/MODEL_ID/deployments \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN'
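The operations described in this document can be summarized as a small route table. The sketch below maps each operation to its HTTP method and URL, following the paths used in the curl examples. The operation names and the resolve function are hypothetical labels chosen for this sketch and are not part of the API.

```python
from typing import Dict, Tuple

API_ROOT = "https://api.lucidworks.com"

# Hypothetical operation names mapped to (HTTP method, path template).
ROUTES: Dict[str, Tuple[str, str]] = {
    "list_models": ("GET", "/models"),
    "create_model": ("POST", "/models"),
    "get_model": ("GET", "/models/{model_id}"),
    "list_deployments": ("GET", "/deployments"),
    "create_deployment": ("POST", "/deployments"),
    "delete_deployment": ("DELETE", "/deployments/{deployment_id}"),
    "list_model_deployments": ("GET", "/models/{model_id}/deployments"),
}


def resolve(operation: str, customer_id: str, **ids: str) -> Tuple[str, str]:
    """Return the HTTP method and full URL for an operation.

    Raises KeyError if the operation is unknown or a required ID is missing.
    """
    method, template = ROUTES[operation]
    path = template.format(**ids)
    return method, API_ROOT + "/customers/" + customer_id + "/ai" + path
```

For example, resolve("delete_deployment", "CUSTOMER_ID", deployment_id="DEPLOYMENT_ID") returns the DELETE method together with the URL used in the DELETE /deployments example.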