# Tokenization API

> Lucidworks AI Prediction API

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

The Lucidworks AI Tokenization API returns the tokens generated for a [Prediction API `embedding` use case](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-prediction-api/embedding-prediction) request before they are sent to any [pre-trained embedding model](/docs/lw-platform/lw-ai/lw-ai-pre-trained-embedding-models) or [custom embedding model](/docs/lw-platform/lw-ai/lw-ai-custom-embedding-model-training/overview).

Use this API to debug `embedding` model tokens and to confirm that the input to the pre-trained or custom embedding model is valid and within the model’s processing limits.

<Note>
  Before the input is passed to the embedding model, it may be formatted, truncated, expanded, or otherwise modified to meet that model’s requirements so the API call succeeds.
</Note>

These preprocessing steps are integral to delivering optimized tokens that generate coherent and relevant responses. Examining the tokenization after preprocessing shows how the embedding model interprets your input, which can help you refine your queries for more accurate and useful outputs.

The input parameter keys and values are the same as those used in the [Prediction API `embedding` use cases](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-prediction-api/embedding-prediction).

<Note>
  For detailed API specifications in Swagger/OpenAPI format, see [Platform APIs](/api-reference/get-tokens/tokenization-by-model_id).
</Note>

<LwTemplate />

## Prerequisites

To use this API, you need:

* The unique `APPLICATION_ID` for your Lucidworks AI application, which is provided by Lucidworks.
* A bearer token generated with a scope value of `machinelearning.predict`. For more information, see [Authentication API](/docs/lw-platform/lw-platform/authentication-api).
* The embedding model name in the `MODEL_ID` field for the request. The path is: `/ai/tokenization/MODEL_ID`. For more information about supported models, see [Embedding use cases](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-prediction-api/embedding-prediction).

## Common parameters and fields

Some parameters in the `/ai/tokenization/MODEL_ID` request are common to all of the Async Chunking API requests, such as the `modelConfig` parameter.
Also referred to as hyperparameters, these fields set certain controls on the response.
Refer to the [API spec](/api-reference/get-tokens/tokenization-by-model_id) for more information.

### Vector quantization

Quantization converts float vectors into integer vectors, which allows byte vector search using 8-bit integers.
Float vectors are very precise, but they are costly to compute and store, especially as their dimensionality grows.
Converting the vector floats into integers after inference produces byte vectors that consume less memory and are faster to compute, with minimal loss in accuracy or quality.

Byte vectors are available through all of the Lucidworks AI hosted embedding models, including custom-trained models.

Set the vector quantization method with the `vectorQuantizationMethod` field of the `modelConfig` parameter. The methods are named `min-max` and `max-scale`.

* The `min-max` method creates tensors of embeddings and converts them to uint8 by normalizing them to the range \[0, 255].
* The `max-scale` method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to a range of -127 to 127, and returns the quantized embeddings as an 8-bit integer tensor.

During testing, it was found that the `max-scale` method has no loss at the ten-thousandths place during evaluation against non-quantized vectors.
However, other methods lose precision when evaluated against non-quantized vectors, with `min-max` losing the most precision.

The syntax example is:

```json wrap  theme={"dark"}
"modelConfig": {
    "vectorQuantizationMethod": "max-scale"
}
```
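
To make the two methods described above concrete, the following is a minimal Python sketch of how each one could map a float vector to 8-bit integers. It is an illustration only, not the Lucidworks implementation: the function names are hypothetical, the scaling is applied per vector as an assumption, and the rounding rule is Python's built-in `round`.

```python
def quantize_min_max(vector):
    """Map floats linearly onto unsigned 8-bit integers in [0, 255]."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        # A constant vector has no range to scale; map every value to 0.
        return [0 for _ in vector]
    scale = 255.0 / (hi - lo)
    return [int(round((x - lo) * scale)) for x in vector]


def quantize_max_scale(vector):
    """Scale by the largest absolute value onto signed integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    if max_abs == 0.0:
        # An all-zero vector stays all zero.
        return [0 for _ in vector]
    scale = 127.0 / max_abs
    return [int(round(x * scale)) for x in vector]


# Hypothetical 4-dimensional embedding used only for illustration.
embedding = [0.6, -0.2, 1.0, -1.0]
min_max_bytes = quantize_min_max(embedding)
max_scale_bytes = quantize_max_scale(embedding)
print(min_max_bytes, max_scale_bytes)
```

In this sketch `max-scale` keeps zero at zero and preserves the sign of each component, while `min-max` shifts the whole vector so that its smallest value becomes 0.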

### Matryoshka vector dimension reduction

Vector dimension reduction shrinks a model's default vector size. This lessens the burden of storing large vectors while still achieving the good quality of a larger model.

The syntax example is:

```json wrap  theme={"dark"}
"modelConfig": {
    "dimReductionSize": 256
}
```
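
As a rough illustration of what a smaller `dimReductionSize` means for a vector, the sketch below uses the approach commonly applied to Matryoshka-style embeddings: keep the leading components and rescale the result to unit length. This page does not describe how the service performs the reduction, so both the truncation and the re-normalization step are assumptions, and `reduce_dimensions` is a hypothetical helper.

```python
import math


def reduce_dimensions(vector, size):
    """Keep the first `size` components and rescale them to unit length."""
    if not 0 < size <= len(vector):
        raise ValueError("size must be between 1 and the vector length")
    head = vector[:size]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # Nothing to rescale when every kept component is zero.
        return head
    return [x / norm for x in head]


# Hypothetical 4-dimensional vector reduced to 2 dimensions.
full_vector = [3.0, 4.0, 12.0, 84.0]
reduced = reduce_dimensions(full_vector, 2)
print(len(reduced), reduced)
```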

## Examples

The following example is a POST tokenization request. Replace the values in the `APPLICATION_ID`, `MODEL_ID`, and `ACCESS_TOKEN` fields with your information.

<CodeGroup>
  ```bash wrap Request theme={"dark"}
  curl --request POST \
  --location 'https://{APPLICATION_ID}.applications.lucidworks.com/ai/tokenization/{MODEL_ID}' \
  --header 'charset: utf-8' \
  --header 'Cache-Control: no-cache' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --data '{
      "batch": [
          {
              "text": "Mr. and Mrs. Dursley and O'\''Malley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."
          }
      ],
      "useCaseConfig": {
          "dataType": "query"
      },
      "modelConfig": {
          "vectorQuantizationMethod": "max-scale"
      }
  }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "generatedTokens": [
          {
              "tokens": [
                  "[CLS]",
                  "query",
                  ":",
                  "mr",
                  ".",
                  "and",
                  "mrs",
                  ".",
                  "du",
                  "##rs",
                  "##ley",
                  "and",
                  "o",
                  "'",
                  "malley",
                  ",",
                  "of",
                  "number",
                  "four",
                  ",",
                  "pri",
                  "##vet",
                  "drive",
                  ",",
                  "were",
                  "proud",
                  "to",
                  "say",
                  "that",
                  "they",
                  "were",
                  "perfectly",
                  "normal",
                  ",",
                  "thank",
                  "you",
                  "very",
                  "much",
                  ".",
                  "[SEP]"
              ],
              "tokensUsed": {
                  "inputTokens": 40
              }
          }
      ]
  }
  ```
</CodeGroup>
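
The same request can be assembled programmatically. The following Python sketch uses only the standard library, builds the request shown above without sending it, and then checks the `inputTokens` count of a response against a limit. The helper names and the `max_tokens=512` limit are hypothetical; use the limit documented for your model.

```python
import json
import urllib.request


def build_tokenization_request(application_id, model_id, access_token, texts,
                               data_type="query", model_config=None):
    """Build (but do not send) a POST request for /ai/tokenization/MODEL_ID."""
    url = (f"https://{application_id}.applications.lucidworks.com"
           f"/ai/tokenization/{model_id}")
    payload = {
        "batch": [{"text": text} for text in texts],
        "useCaseConfig": {"dataType": data_type},
    }
    if model_config:
        payload["modelConfig"] = model_config
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


def summarize_tokens(response_body, max_tokens):
    """Return (token_count, within_limit) for each item in the response."""
    summary = []
    for item in response_body["generatedTokens"]:
        count = item["tokensUsed"]["inputTokens"]
        summary.append((count, count <= max_tokens))
    return summary


request = build_tokenization_request(
    "APPLICATION_ID", "MODEL_ID", "ACCESS_TOKEN",
    ["Mr. and Mrs. Dursley, of number four, Privet Drive."],
    model_config={"vectorQuantizationMethod": "max-scale"},
)
# To send the request, pass it to urllib.request.urlopen() and parse the
# response with json.load().

# Abbreviated response in the shape shown in the example above.
sample_response = {
    "generatedTokens": [
        {"tokens": ["[CLS]", "query", ":", "mr", ".", "[SEP]"],
         "tokensUsed": {"inputTokens": 6}}
    ]
}
report = summarize_tokens(sample_response, max_tokens=512)
print(request.full_url, report)
```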
