import requests

url = "https://application_id.applications.lucidworks.com/ai/tokenization/{MODEL_ID}"

payload = {
    "batch": [{"text": "Mr. and Mrs. Dursley and O'Malley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much"}],
    "useCaseConfig": {"dataType": "query or passage"},  # set to "query" or "passage"
    "modelConfig": {
        "vectorQuantizationMethod": "max-scale",
        "dimReductionSize": 256
    }
}

headers = {"Content-Type": "application/json"}
response = requests.post(url, json=payload, headers=headers)
print(response.text)

The tokenization request for the pre-trained and custom embedding use cases sends text to the specified embedding model (modelId) and returns the tokens in the format supported by that embedding model.
The authentication and authorization access token.
Content-Type: "application/json"
modelId: The name of the pre-trained or custom embedding model. For example, "e5-small-v2".
useCaseConfig: This optional parameter enables model-specific handling in the Prediction API to help improve model accuracy. Use the dataType value that best aligns with the text sent to the Prediction API.

The two string values to use for embedding models are:

"dataType": "query" for the query. For query-to-query pairing, best practice is to use dataType=query on both API calls.
"dataType": "passage" for fields searched at query time.

For example, if questions and answers from an FAQ are indexed, the value for the questions is "dataType": "query" and the value for the answers is "dataType": "passage".
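Following the FAQ example above, the two request payloads might be built like this. This is a minimal sketch: the question and answer texts are made-up placeholders, and each payload would be sent with requests.post exactly as in the code sample at the top of this page.

```python
# Hypothetical FAQ content for illustration only.

# FAQ questions are query-like text, so they use dataType "query".
question_payload = {
    "batch": [{"text": "How do I reset my password?"}],
    "useCaseConfig": {"dataType": "query"},
}

# FAQ answers are the fields searched at query time, so they use dataType "passage".
answer_payload = {
    "batch": [{"text": "Open Settings, choose Account, and select Reset Password."}],
    "useCaseConfig": {"dataType": "passage"},
}

print(question_payload["useCaseConfig"]["dataType"])  # query
print(answer_payload["useCaseConfig"]["dataType"])    # passage
```

Each payload would then be POSTed to the tokenization endpoint with the same headers shown in the request example above.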
vectorQuantizationMethod: Quantization converts float vectors into integer vectors, allowing byte vector search using 8-bit integers. Float vectors are precise, but they are costly to compute and store, especially as they grow in dimensionality. Converting the vector floats into integers after inference produces byte vectors that consume less memory and are faster to compute with, at minimal loss in accuracy or quality.

The following options are available:

The min-max method creates tensors of embeddings and converts them to uint8 by normalizing them to the range [0, 255].
The max-scale method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to the range [-127, 127], and returns the quantized embeddings as an 8-bit integer tensor.

For example, "max-scale".
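As a minimal NumPy sketch of the two methods as described above (not the service's actual implementation; the per-embedding normalization axis is an assumption):

```python
import numpy as np

def min_max_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Sketch of min-max: normalize each embedding to [0, 255] as uint8."""
    mn = embeddings.min(axis=-1, keepdims=True)
    mx = embeddings.max(axis=-1, keepdims=True)
    spread = np.where(mx - mn == 0, 1.0, mx - mn)  # avoid division by zero
    return np.round((embeddings - mn) / spread * 255.0).astype(np.uint8)

def max_scale_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Sketch of max-scale: scale each embedding by its maximum absolute
    value into [-127, 127] and return an 8-bit integer tensor."""
    max_abs = np.abs(embeddings).max(axis=-1, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)  # avoid division by zero
    return np.round(embeddings / max_abs * 127.0).astype(np.int8)

floats = np.array([[0.5, -1.0, 0.25]])
print(max_scale_quantize(floats))  # [[  64 -127   32]]
print(min_max_quantize(floats))
```

Either way, each float embedding becomes an 8-bit integer vector of the same length, ready for byte vector search.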
dimReductionSize: Vector dimension reduction makes the default vector size of a model smaller. The purpose of this reduction is to lessen the cost of storing large vectors while still retaining most of the quality of the larger model. For example, 256.
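The reduction itself happens server-side. Purely to illustrate the idea, one common reduction technique (truncate-and-renormalize, assumed here for illustration only; the service's actual method is not specified on this page) looks like:

```python
import numpy as np

def reduce_dims(vec: np.ndarray, size: int = 256) -> np.ndarray:
    """Illustrative only: keep the first `size` components and re-normalize.
    This is NOT necessarily how the service reduces dimensions."""
    reduced = vec[:size]
    norm = np.linalg.norm(reduced)
    return reduced / norm if norm else reduced

full = np.random.rand(384)        # e5-small-v2 produces 384-dimensional vectors
small = reduce_dims(full, 256)
print(small.shape)  # (256,)
```

The reduced vector costs roughly a third less storage here, at the price of discarding the trailing components.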
OK
The array of tokens derived from the text submitted in the request.
For example:
"generatedTokens": [
{
"tokens": [
"[CLS]",
"query",
":",
"mr",
".",
"and",
"mrs",
".",
"du",
"##rs",
"##ley",
"and",
"o",
"'",
"malley",
",",
"of",
"number",
"four",
",",
"pri",
"##vet",
"drive",
",",
"were",
"proud",
"to",
"say",
"that",
"they",
"were",
"perfectly",
"normal",
",",
"thank",
"you",
"very",
"much",
".",
"[SEP]"
],
The number of tokens created from the text input into the model. For example, 40.
The number of tokens generated to prompt the model to continue generating results. For example, 148.
The number of tokens used until the model completes. This value is always zero (0).
The sum of the prompt and completion tokens used in the model. For example, 175.