Tokenization API
The Lucidworks AI Tokenization API returns the tokens that a Prediction API embedding use case generates before they are sent to a pre-trained or custom embedding model.
This API helps you debug embedding model tokens and confirm that the input to the pre-trained or custom embedding model is valid and within the model's processing limits.
Before the tokens are passed to the embedding model, the input may be formatted, truncated, expanded, or modified in other ways to meet that model's requirements so the API call is successful.
These preprocessing steps are integral to delivering optimized tokens that generate coherent and relevant responses. By examining the tokenization after preprocessing, you can better understand how your input is interpreted by the embedding models, which can help you refine your queries for more accurate and useful outputs.
The input parameter keys and values are the same as those used in the Prediction API embedding use cases.
Prerequisites
To use this API, you need:
- The unique APPLICATION_ID for your Lucidworks AI application. For more information, see credentials to use APIs.
- A bearer token generated with a scope value of machinelearning.predict. For more information, see Authentication API.
- The embedding model name in the MODEL_ID field for the request. The path is: /ai/tokenization/MODEL_ID. For more information about supported models, see Embedding use cases.
Common parameters and fields
useCaseConfig
The parameter available in all of the embedding use cases is:
"useCaseConfig": "dataType": "string"
This optional parameter enables model-specific handling in the Prediction API to help improve model accuracy. Use the dataType value that best aligns with the text sent to the Prediction API.
The string values to use for embedding models are:
- "dataType": "query" for the query. For query-to-query pairing, best practice is to use dataType=query on both API calls.
- "dataType": "passage" for fields searched at query time. For example, if questions and answers from a FAQ are indexed, the value for questions is "dataType": "query" and the value for the answers is "dataType": "passage".
The syntax example is:
"useCaseConfig":
{
"dataType": "query"
}
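Building on the FAQ example above, the following is a minimal Python sketch of how the two dataType values might be used for one FAQ entry. The payload fields match the syntax shown in this section; the example texts and variable names are hypothetical.
# Illustrative only: request bodies for one FAQ entry.
# Questions are tokenized as queries, answers as passages.
question_payload = {
    "batch": [{"text": "How do I reset my password?"}],
    "useCaseConfig": {"dataType": "query"},
}
answer_payload = {
    "batch": [{"text": "Open Settings > Account and select Reset password."}],
    "useCaseConfig": {"dataType": "passage"},
}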
modelConfig
Some parameters of the /ai/tokenization/MODEL_ID request are common to all of the embedding use cases, including the modelConfig parameter.
Vector quantization
Quantization is implemented by converting float vectors into integer vectors, which allows byte vector search using 8-bit integers. Float vectors are precise, but they become increasingly expensive to compute and store as their dimensionality grows. One solution is to convert the vector floats into integers after inference, producing byte vectors that consume less memory and are faster to compute, with minimal loss in accuracy or quality.
Byte vectors are available through all of the Lucidworks LWAI hosted embedding models, including custom trained models.
Vector quantization methods are implemented through the modelConfig parameter vectorQuantizationMethod. The methods are named min-max and max-scale.
- The min-max method creates tensors of embeddings and converts them to uint8 by normalizing them to the range [0, 255].
- The max-scale method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to a range of -127 to 127, and returns the quantized embeddings as an 8-bit integer tensor.
During testing, the max-scale method showed no loss at the ten-thousandths place when evaluated against non-quantized vectors. Other methods lose precision when evaluated against non-quantized vectors, with min-max losing the most.
The syntax example is:
"modelConfig": {
"vectorQuantizationMethod": "max-scale"
}
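To illustrate what the two methods do to an embedding, the following NumPy sketch applies min-max and max-scale style quantization to a small float vector. It is an approximation of the behavior described above for illustration only, not the Lucidworks server-side implementation; the function names are hypothetical.
import numpy as np

def min_max_quantize(vec: np.ndarray) -> np.ndarray:
    # Normalize values to [0, 255] and store them as unsigned 8-bit integers.
    # Assumes the vector is not constant (max > min).
    lo, hi = vec.min(), vec.max()
    scaled = (vec - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)

def max_scale_quantize(vec: np.ndarray) -> np.ndarray:
    # Scale by the maximum absolute value so values fall in [-127, 127],
    # then store them as signed 8-bit integers.
    scale = np.abs(vec).max()
    return (vec / scale * 127.0).round().astype(np.int8)

embedding = np.array([0.12, -0.48, 0.91, -0.07])
print(min_max_quantize(embedding))    # [110 0 255 75]
print(max_scale_quantize(embedding))  # [17 -67 127 -10]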
Matryoshka vector dimension reduction
Vector dimension reduction is the process of making the default vector size of a model smaller. The purpose of this reduction is to lessen the cost of storing large vectors while still retaining most of the quality of the larger model.
The syntax example is:
"modelConfig": {
"dimReductionSize": 256
}
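Conceptually, Matryoshka-style reduction keeps the leading dimensions of the embedding, typically followed by renormalization of the shorter vector. The sketch below shows that idea in NumPy as a client-side analogy only; when dimReductionSize is set, the reduction itself is performed by the model service, and the helper below is hypothetical.
import numpy as np

def reduce_dimensions(vec: np.ndarray, size: int = 256) -> np.ndarray:
    # Keep the first `size` dimensions and L2-normalize the shorter vector.
    reduced = vec[:size]
    return reduced / np.linalg.norm(reduced)

full = np.random.rand(768)           # e.g. a 768-dimension embedding
small = reduce_dimensions(full, 256)
print(small.shape)                   # (256,)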
Examples
Sample POST request
The following example is a POST tokenization request. Replace the values in the APPLICATION_ID, MODEL_ID, and ACCESS_TOKEN fields with your information.
curl --request POST \
--location 'https://{APPLICATION_ID}.applications.lucidworks.com/ai/tokenization/{MODEL_ID}' \
--header 'charset: utf-8' \
--header 'Cache-Control: no-cache' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ACCESS_TOKEN' \
--data '{
"batch": [
{
"text": "Mr. and Mrs. Dursley and O'\''Malley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."
}
],
"useCaseConfig": {
"dataType": "query"
},
"modelConfig": {
"vectorQuantizationMethod": "max-scale"
}
}'
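The same request can be sent from Python. The following sketch uses the requests library and the same endpoint, payload, and placeholder values as the curl example above; replace the placeholders with your information.
import requests

APPLICATION_ID = "your-application-id"   # replace with your values
MODEL_ID = "your-embedding-model"
ACCESS_TOKEN = "your-bearer-token"

url = f"https://{APPLICATION_ID}.applications.lucidworks.com/ai/tokenization/{MODEL_ID}"
payload = {
    "batch": [
        {"text": "Mr. and Mrs. Dursley and O'Malley, of number four, Privet Drive, "
                 "were proud to say that they were perfectly normal, thank you very much."}
    ],
    "useCaseConfig": {"dataType": "query"},
    "modelConfig": {"vectorQuantizationMethod": "max-scale"},
}
headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.json())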
Sample response
Based on the request, the following example is the response for any of the embedding models:
{
"generatedTokens": [
{
"tokens": [
"[CLS]",
"query",
":",
"mr",
".",
"and",
"mrs",
".",
"du",
"##rs",
"##ley",
"and",
"o",
"'",
"malley",
",",
"of",
"number",
"four",
",",
"pri",
"##vet",
"drive",
",",
"were",
"proud",
"to",
"say",
"that",
"they",
"were",
"perfectly",
"normal",
",",
"thank",
"you",
"very",
"much",
".",
"[SEP]"
],
"tokensUsed": {
"inputTokens": 40
}
}
]
}
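Because this API is meant to confirm that input fits within a model's processing limits, a typical follow-up is to compare tokensUsed.inputTokens against the model's maximum input size. The following is a minimal sketch, assuming the response above has been parsed into a variable named result and using a hypothetical limit of 512 tokens; check the actual limit for your model.
# `result` holds the parsed response shown above; 512 is a hypothetical limit.
MODEL_TOKEN_LIMIT = 512

for item in result["generatedTokens"]:
    used = item["tokensUsed"]["inputTokens"]
    if used > MODEL_TOKEN_LIMIT:
        print(f"Input uses {used} tokens and may be truncated by the model.")
    else:
        print(f"Input uses {used} of {MODEL_TOKEN_LIMIT} tokens.")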