The Tokenization API returns the embedding use case tokens before they are sent to any pre-trained embedding model or custom embedding model.
This API helps you debug embedding model tokens and confirm that the input to the pre-trained or custom embedding model is valid and within the model’s processing limits.
Before the tokens are passed to the embedding model, they may be formatted, truncated, expanded, or otherwise modified to meet that model’s requirements so the API call is successful.
The API can be used with the Lucidworks AI embedding use cases.
For detailed API specifications in Swagger/OpenAPI format, see Platform APIs.
Prerequisites
To use this API, you need:
- The unique APPLICATION_ID for your Lucidworks AI application. For more information, see Credentials to use APIs.
- A bearer token generated with a scope value of machinelearning.predict. For more information, see Authentication API.
- The embedding model name in the MODEL_ID field for the request. The path is: /ai/tokenization/MODEL_ID. For more information about supported models, see Embedding use cases.
Common parameters and fields
Some parameters in the /ai/tokenization/MODEL_ID request, such as the modelConfig parameter, are common to all of the Async Chunking API requests.
Also referred to as hyperparameters, these fields set certain controls on the response.
Refer to the API spec for more information.
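As a sketch of where modelConfig sits, assuming the input text is sent in a batch array of text objects (an assumed schema; the authoritative request body is in the API spec) and using the vectorQuantizationMethod hyperparameter described in the next section:

```json
{
  "batch": [
    { "text": "Text to tokenize." }
  ],
  "modelConfig": {
    "vectorQuantizationMethod": "min-max"
  }
}
```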
Vector quantization
Quantization is implemented by converting float vectors into integer vectors, which allows byte vector search using 8-bit integers. Float vectors are very precise, but they become costly to compute and store as their dimensionality grows. One solution is to convert the vector floats into integers after inference, producing byte vectors that consume less memory and are faster to compute with, at minimal loss in accuracy or quality. Byte vectors are available through all of the Lucidworks LWAI hosted embedding models, including custom trained models. Vector quantization methods are set through the modelConfig parameter vectorQuantizationMethod. The available methods are min-max and max-scale.
- The min-max method creates tensors of the embeddings and converts them to uint8 by normalizing them to the range [0, 255].
- The max-scale method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to a range of -127 to 127, and returns the quantized embeddings as an 8-bit integer tensor.
The max-scale method has no loss at the ten-thousandths place when evaluated against non-quantized vectors.
The other methods lose precision when evaluated against non-quantized vectors, with min-max losing the most.
The syntax example is:
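A minimal sketch of the modelConfig fragment, using the parameter and method names described above; the exact placement in the request body is defined in the API spec:

```json
"modelConfig": {
    "vectorQuantizationMethod": "min-max"
}
```

To use the max-scale method instead, set vectorQuantizationMethod to max-scale.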
Matryoshka vector dimension reduction
Vector dimension reduction is the process of making the default vector size of a model smaller. The purpose of this reduction is to lessen the burden of storing large vectors while still achieving quality close to that of the larger model. The syntax example is:
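A minimal sketch, assuming the target dimension is set in modelConfig through a field such as dimReductionSize; that field name is a placeholder here, and the exact field name and supported dimension values are defined in the API spec:

```json
"modelConfig": {
    "dimReductionSize": 256
}
```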
Examples
The following example is a POST tokenization request. Replace the values in the APPLICATION_ID, MODEL_ID, and ACCESS_TOKEN fields with your information.
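A sketch of the request as a cURL command; the host pattern and the batch and text body fields are assumptions based on other Lucidworks AI requests, so confirm the exact endpoint and request schema in the API spec:

```bash
# The host pattern and the batch/text body fields below are assumptions for illustration;
# confirm the exact endpoint and request schema in the API spec.
curl --request POST \
  --url "https://APPLICATION_ID.applications.lucidworks.com/ai/tokenization/MODEL_ID" \
  --header "Authorization: Bearer ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "batch": [
      { "text": "Text to check against the embedding model limits." }
    ]
  }'
```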