
    Tokenization API

    The Lucidworks AI Tokenization API returns the tokens generated for a Prediction API embedding use case before they are sent to a pre-trained or custom embedding model.

    This API helps you debug embedding model tokens so you can verify that the input to the pre-trained or custom embedding model is valid and within the model’s processing limits.

    Before the input is passed to the embedding model, it may be formatted, truncated, expanded, or otherwise modified to meet that model’s requirements so the API call succeeds.

    These preprocessing steps are integral to delivering optimized tokens that generate coherent and relevant responses. By examining the tokenization after preprocessing, you can better understand how the embedding models interpret your input, which helps you refine your queries for more accurate and useful output.

    The input parameter keys and values are the same as those used in the Prediction API embedding use cases.

    Prerequisites

    To use this API, you need:

    • The unique APPLICATION_ID for your Lucidworks AI application. For more information, see credentials to use APIs.

    • A bearer token generated with a scope value of machinelearning.predict. For more information, see Authentication API.

    • The embedding model name in the MODEL_ID field for the request. The path is: /ai/tokenization/MODEL_ID. For more information about supported models, see Embedding use cases.

    Common parameters and fields

    useCaseConfig

    The parameter available in all of the embedding use cases is:

    "useCaseConfig": "dataType": "string"

    This optional parameter enables model-specific handling in the Prediction API to help improve model accuracy. Use the dataType value that best aligns with the text sent to the Prediction API.

    The string values to use for embedding models are:

    • "dataType": "query" for the query. For query-to-query pairing, best practice is to use dataType=query on both API calls.

    • "dataType": "passage" for fields searched at query time.

      For example, if questions and answers from a FAQ are indexed, the value for questions is "dataType": "query" and the value for the answers is "dataType": "passage". A sketch of this pairing follows the syntax example below.

    The syntax example is:

    "useCaseConfig":
      {
        "dataType": "query"
      }
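
    To illustrate the FAQ pairing described above, here is a minimal Python sketch of two tokenization payloads that differ only in dataType. The texts are hypothetical; the payload shape matches the sample POST request later on this page.

    # Illustrative only: an indexed FAQ question is tokenized as a query,
    # and its matching answer is tokenized as a passage.
    question_payload = {
        "batch": [{"text": "How do I reset my password?"}],
        "useCaseConfig": {"dataType": "query"},
    }

    answer_payload = {
        "batch": [{"text": "Open Settings, choose Security, and select Reset password."}],
        "useCaseConfig": {"dataType": "passage"},
    }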

    modelConfig

    Some parameters of the /ai/tokenization/MODEL_ID request are common to all of the embedding use cases, including the modelConfig parameter.

    Vector quantization

    Quantization is implemented by converting float vectors into integer vectors, allowing for byte vector search using 8-bit integers. Float vectors are precise, but they are costly to store and compute with, especially as their dimensionality grows. Converting the vector floats into integers after inference produces byte vectors, which consume less memory and are faster to compute with, at minimal loss in accuracy or quality.

    Byte vectors are available for all Lucidworks AI-hosted embedding models, including custom-trained models.

    Vector quantization methods are implemented through the modelConfig parameter vectorQuantizationMethod. The available methods are min-max and max-scale.

    • The min-max method creates tensors of embeddings and converts them to uint8 by normalizing them to the range [0, 255].

    • The max-scale method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to a range of -127 to 127, and returns the quantized embeddings as an 8-bit integer tensor.

    During testing, the max-scale method showed no loss at the ten-thousandths place when evaluated against non-quantized vectors. Other methods lose some precision, with min-max losing the most.

    The syntax example is:

    "modelConfig": {
            "vectorQuantizationMethod": "max-scale"
        }
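
    For intuition, the following NumPy sketch shows how the two methods work as described above. It illustrates the technique only; it is not the hosted implementation.

    import numpy as np

    def min_max_quantize(embedding: np.ndarray) -> np.ndarray:
        # Normalize the embedding to the range [0, 255] and cast to uint8.
        lo, hi = embedding.min(), embedding.max()
        scaled = (embedding - lo) / (hi - lo) * 255.0
        return np.round(scaled).astype(np.uint8)

    def max_scale_quantize(embedding: np.ndarray) -> np.ndarray:
        # Scale by the maximum absolute value to the range [-127, 127] and cast to int8.
        scaled = embedding / np.abs(embedding).max() * 127.0
        return np.round(scaled).astype(np.int8)

    vec = np.random.randn(768).astype(np.float32)  # e.g. a 768-dimensional float embedding
    print(min_max_quantize(vec).dtype, max_scale_quantize(vec).dtype)  # uint8 int8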

    Matryoshka vector dimension reduction

    Vector dimension reduction is the process of making the default vector size of a model smaller. The purpose of this reduction is to lessen the burden of storing large vectors while retaining most of the quality of the larger model.

    The syntax example is:

    "modelConfig": {
            "dimReductionSize": 256
        }
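
    Conceptually, a Matryoshka-trained model concentrates the most important information in the leading dimensions, so reduction amounts to keeping a prefix of the vector. The following Python sketch illustrates the common truncate-then-renormalize approach; it is an assumption for illustration, not the hosted implementation.

    import numpy as np

    def reduce_dimensions(embedding: np.ndarray, dim_reduction_size: int = 256) -> np.ndarray:
        # Keep the leading dimensions, then L2-normalize so the shorter
        # vector remains usable for cosine or dot-product similarity.
        truncated = embedding[:dim_reduction_size]
        return truncated / np.linalg.norm(truncated)

    vec = np.random.randn(1024).astype(np.float32)  # e.g. a 1024-dimensional embedding
    print(reduce_dimensions(vec).shape)  # (256,)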

    Examples

    Sample POST request

    The following example is a POST tokenization request. Replace the values in the APPLICATION_ID, MODEL_ID, and ACCESS_TOKEN fields with your information.

    curl --request POST \
    --location 'https://{APPLICATION_ID}.applications.lucidworks.com/ai/tokenization/{MODEL_ID}' \
    --header 'charset: utf-8' \
    --header 'Cache-Control: no-cache' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --data '{
        "batch": [
            {
                "text": "Mr. and Mrs. Dursley and O'\''Malley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."
            }
        ],
        "useCaseConfig": {
            "dataType": "query"
        },
        "modelConfig": {
            "vectorQuantizationMethod": "max-scale"
        }
    }'
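
    For reference, the following is a minimal Python sketch of the same request using the requests library. Replace the placeholder values with your information.

    import requests

    APPLICATION_ID = "APPLICATION_ID"  # your Lucidworks AI application ID
    MODEL_ID = "MODEL_ID"  # the embedding model name
    ACCESS_TOKEN = "ACCESS_TOKEN"  # your bearer token

    response = requests.post(
        f"https://{APPLICATION_ID}.applications.lucidworks.com/ai/tokenization/{MODEL_ID}",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={
            "batch": [
                {"text": "Mr. and Mrs. Dursley and O'Malley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."}
            ],
            "useCaseConfig": {"dataType": "query"},
            "modelConfig": {"vectorQuantizationMethod": "max-scale"},
        },
    )
    print(response.json())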

    Sample response

    Based on the request, the following example shows the response; the shape is the same for any of the embedding models:

    {
        "generatedTokens": [
            {
                "tokens": [
                    "[CLS]",
                    "query",
                    ":",
                    "mr",
                    ".",
                    "and",
                    "mrs",
                    ".",
                    "du",
                    "##rs",
                    "##ley",
                    "and",
                    "o",
                    "'",
                    "malley",
                    ",",
                    "of",
                    "number",
                    "four",
                    ",",
                    "pri",
                    "##vet",
                    "drive",
                    ",",
                    "were",
                    "proud",
                    "to",
                    "say",
                    "that",
                    "they",
                    "were",
                    "perfectly",
                    "normal",
                    ",",
                    "thank",
                    "you",
                    "very",
                    "much",
                    ".",
                    "[SEP]"
                ],
                "tokensUsed": {
                    "inputTokens": 40
                }
            }
        ]
    }
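
    Because this API exists to help debug tokenization, a typical follow-up is to compare tokensUsed.inputTokens against the model’s input limit. The following Python sketch continues from the request example above; the 512-token limit is hypothetical, so check your model’s documentation for the actual value.

    resp = response.json()  # parsed JSON from the tokenization request above
    MODEL_TOKEN_LIMIT = 512  # hypothetical limit; varies by embedding model

    for entry in resp["generatedTokens"]:
        used = entry["tokensUsed"]["inputTokens"]
        if used > MODEL_TOKEN_LIMIT:
            print(f"Input uses {used} tokens and may be truncated at {MODEL_TOKEN_LIMIT}.")
        else:
            print(f"Input uses {used} of {MODEL_TOKEN_LIMIT} tokens.")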