
# Embedding use cases

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};

The embedding use cases of the [LWAI Prediction API](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-prediction-api/overview) cover the text encoder models and custom model prediction. The use cases are:

* English language model text encoder
* Multilingual language model text encoder
* Custom model

<Note>
  For detailed API specifications in Swagger/OpenAPI format, see [Platform APIs](/api-reference/get-predictions/english-language-model-text-encoder).
</Note>

<LwTemplate />

## Prerequisites

To use this API, you need:

* The unique `APPLICATION_ID` for your Lucidworks AI application, which is provided by Lucidworks.
* A bearer token generated with a scope value of `machinelearning.predict`. For more information, see [Authentication API](/docs/lw-platform/lw-platform/authentication-api).
* The `USE_CASE` and `MODEL_ID` fields for the use case request. The path is: `/ai/prediction/USE_CASE/MODEL_ID`. A list of supported models is returned in the [Lucidworks AI Use Case API](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-use-case-api).
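
As a rough illustration of how these prerequisites fit together, the following Python sketch assembles the request URL, headers, and body without sending anything. The helper name `build_prediction_request` and the placeholder values are assumptions made for this example; the host pattern is the one used in the curl examples on this page.

```python
import json
import urllib.request


def build_prediction_request(application_id, access_token, use_case, model_id, body):
    """Build (but do not send) a POST request for /ai/prediction/USE_CASE/MODEL_ID."""
    url = (
        f"https://{application_id}.applications.lucidworks.com"
        f"/ai/prediction/{use_case}/{model_id}"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder values; substitute your own application ID, token, and model.
request = build_prediction_request(
    application_id="APPLICATION_ID",
    access_token="ACCESS_TOKEN",
    use_case="embedding",
    model_id="text-encoder",
    body={"batch": [{"text": "city streets"}], "useCaseConfig": {"dataType": "query"}},
)
print(request.get_method(), request.full_url)
```

Once real values are in place, the request object could be passed to `urllib.request.urlopen` to send it.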

## Unique values for the embeddings use cases

Some parameter values are unique to the `embedding` use case, including values for the `useCaseConfig` parameter.
Refer to the [API spec](/api-reference/get-predictions/english-language-model-text-encoder) for more information.

### Vector quantization

Quantization converts float vectors into integer vectors, which allows byte vector search using 8-bit integers.
Float vectors are precise but expensive to compute and store, especially as their dimensionality grows.
Converting the floats to integers after inference produces byte vectors that use less memory and are faster to compute, with minimal loss in accuracy or quality.

Byte vectors are available through all Lucidworks AI hosted embedding models, including custom-trained models.

The vector quantization method is set in the `vectorQuantizationMethod` field of the `modelConfig` parameter. The available methods are `min-max` and `max-scale`.

* The `min-max` method creates tensors of embeddings and converts them to uint8 by normalizing them to the range \[0, 255].
* The `max-scale` method finds the maximum absolute value along each embedding, normalizes the embeddings by scaling them to a range of -127 to 127, and returns the quantized embeddings as an 8-bit integer tensor.

During testing, the `max-scale` method showed no loss at the ten-thousandths place when evaluated against non-quantized vectors.
However, other methods lose precision when evaluated against non-quantized vectors, with `min-max` losing the most precision.
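
The arithmetic behind the two methods can be sketched in plain Python. This is only an illustration of the descriptions above, not the service's implementation: the function names are made up for this example, and the rounding and zero-range handling are assumptions.

```python
def min_max_quantize(vector):
    """Map floats linearly onto the unsigned 8-bit range [0, 255]."""
    lo, hi = min(vector), max(vector)
    span = hi - lo
    if span == 0:
        # Constant vector: there is no range to spread out (assumed handling).
        return [0 for _ in vector]
    return [round((x - lo) / span * 255) for x in vector]


def max_scale_quantize(vector):
    """Scale by the largest absolute value onto the signed range [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    if max_abs == 0:
        # All-zero vector: nothing to scale (assumed handling).
        return [0 for _ in vector]
    return [round(x / max_abs * 127) for x in vector]


embedding = [0.3, -0.4, 0.1]
print("min-max:  ", min_max_quantize(embedding))
print("max-scale:", max_scale_quantize(embedding))
```

Note that `min-max` shifts the values so the smallest becomes 0, while `max-scale` keeps zero at zero and preserves the sign of each component.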

<CodeGroup>
  ```json wrap Request theme={"dark"}
  curl --request POST \
    --url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/embedding/MODEL_ID \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --data '{
      "batch": [
        {
          "text": "I need to pick up some fresh produce like apples, bananas, and spinach, as well as dairy products including milk, eggs, and cheddar cheese. I'\''ll also need to get some pantry staples such as pasta, rice, and canned tomatoes. Additionally, I should grab a loaf of whole-grain bread, some chicken breasts, ground beef, and a box of cereal for breakfast. Don'\''t forget to add a bottle of olive oil, a jar of peanut butter, and some snacks like granola bars and yogurt. Finally, I need to remember to buy cleaning supplies like dish soap and paper towels, along with a carton of almond milk and a few avocados for the week."
        }
      ],
      "useCaseConfig": {
        "dataType": "passage"
      },
      "modelConfig": {
        "vectorQuantizationMethod": "max-scale"
      }
    }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "predictions": [
          {
              "tokensUsed": {
                  "inputTokens": 148
              },
              "vector": [ -23,-5,23,-15,25,13,27,26,-27,-18,-7,-32,23,8,28,-22,17,16,-42,7,6,0,-14,-13,30,15,-4,-4,-32,-87,-3,-12,19,-11,-1,-11,0,19,-12,12,27,3,11,-25,-15,-21,-16,5,47,-20,10,-1,-6,0,8,22,24,-2,20,26,18,12,-76,67,16,-1,-10,13,7,26,-32,11,18,35,10,-11,13,-14,-7,11,-7,-10,-8,0,-15,-13,-7,-16,27,-11,-11,-14,-3,12,-35,-23,0,-9,-33,106,-19,8,27,-11,5,-37,-9,-11,-11,14,-5,-7,26,-36,25,10,49,1,-7,4,1,15,22,8,3,-41,7,29,2,24,22,-15,-1,-7,11,14,12,-12,24,-8,-23,-31,-2,-83,-1,63,1,36,-20,-4,-5,27,5,-15,5,12,28,6,-6,8,-32,-15,-24,12,3,-41,-13,13,-1,-10,32,7,-22,10,66,-2,-7,21,-4,0,29,-19,-33,13,33,-35,-6,-6,11,8,-2,-4,-23,-14,-31,-12,-19,-2,17,-12,-4,-24,0,11,24,3,-10,-17,53,16,-5,35,42,2,-40,11,29,9,20,28,49,-40,-25,-109,16,12,6,15,-6,9,-21,-2,22,35,-18,-9,19,-11,21,12,-7,-9,13,-16,-8,4,-20,31,-8,83,30,13,-34,12,5,-6,-57,-2,4,0,-30,-23,-24,-19,13,-20,-27,-26,-24,-14,14,-5,2,13,-10,6,-13,-1,-11,-21,0,-20,10,-13,-10,2,-3,13,-3,0,-26,1,-5,-5,2,3,-16,12,-2,11,-6,-9,-31,10,-14,13,16,28,14,20,-8,26,-10,16,5,-16,-9,28,-4,-127,52,11,11,6,-3,12,15,-19,16,31,-2,25,-20,3,14,18,0,1,-33,10,-8,78,1,-5,9,14,9,5,13,33,0,22,-19,7,7,-12,9,2,2,-27,-2,-35,-17,26,-62,-8,-45,8,4,-13,-1,-13,3,17,-1,-16,7,-12,0,11,-14,-13,18,5 ]
          }
      ]
  }
  ```
</CodeGroup>

### Matryoshka vector dimension reduction

Vector dimension reduction makes the vectors a model returns smaller than the model's default vector size. The purpose of this reduction is to lessen the burden of storing large vectors while still achieving the good quality of a larger model.

The technique is called [Matryoshka Representation Learning (MRL)](https://sbert.net/examples/training/matryoshka/README.html).

For information about the pre-trained embedding models that use the Matryoshka Representation Learning technique, see:

* [snowflake-arctic-embed-m-v2.0 model](/docs/lw-platform/lw-ai/lw-ai-pre-trained-embedding-models#snowflake-arctic-embed-m-v2-0)
* [snowflake-arctic-embed-l-v2.0 model](/docs/lw-platform/lw-ai/lw-ai-pre-trained-embedding-models#snowflake-arctic-embed-l-v2-0)

<Note>
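  You can reduce vectors for any model using the `dimReductionSize` field of the `modelConfig` parameter, which accepts any integer greater than 0 and less than or equal to the vector dimension of the model. For models that do not support Matryoshka reduction, the response includes a warning that the quality of the reduced embeddings could be significantly affected.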
</Note>

<CodeGroup>
  ```json Configuration theme={"dark"}
  "modelConfig": {
      "dimReductionSize": 256
  }
  ```

  ```json wrap 400 Bad Request theme={"dark"}
  {
      "message": "The Matryoshka dimensionality reduction size is greater than the model's vector size. Please provide a valid reduction size."
  }
  ```

  ```json wrap Warning response theme={"dark"}
  {
      "predictions": [
          {
              "tokensUsed": {
                  "inputTokens": 40
              },
              "vector": [ -0.039299455, 0.33177176, 0.13610777, -0.004845184, -0.3394944, 0.34524646, 0.3076837, -0.3135057, -0.24342005, 0.067009725, 0.3919114, -0.091709316, 0.027848499, 0.29557163, -0.35828146, -0.01322772],
              "warning": "Matryoshka dimensionality reduction is not supported for this model. As such, the quality of the reduced embeddings could be significantly affected."
          }
      ]
  }
  ```
</CodeGroup>
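
To make the size constraint concrete, here is a minimal Python sketch of the validation rule together with the truncation that Matryoshka-style reduction is commonly described as performing. The function name is hypothetical, and re-normalizing the truncated vector to unit length is an assumption; this page does not state whether the service does so.

```python
import math


def reduce_dimensions(vector, dim_reduction_size):
    """Keep the first `dim_reduction_size` values and re-normalize to unit length."""
    if not isinstance(dim_reduction_size, int) or not 0 < dim_reduction_size <= len(vector):
        # Mirrors the documented rule: above 0, at most the model's vector dimension.
        raise ValueError(
            "dim_reduction_size must be an integer above 0 and no larger than "
            f"the vector dimension ({len(vector)})"
        )
    truncated = vector[:dim_reduction_size]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        return truncated
    # Assumption: re-normalize so cosine and dot-product scores stay comparable.
    return [x / norm for x in truncated]


print(reduce_dimensions([3.0, 4.0, 12.0], 2))
```

Checking the size on the client side before sending a request is one way to avoid the `400 Bad Request` response shown above.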

### Change input to lowercase

The `lowercaseInput` field in the `modelConfig` parameter is optional and controls whether input text is lowercased before it is encoded. If `lowercaseInput` is set to `true`, inputs are converted to lowercase before encoding, so letter case does not introduce variation in the embeddings, which makes similarity search more stable and predictable. This field is used for pre-trained models and in similarity queries where you don't want case to affect the returned vectors. The default value is `false`.

<CodeGroup>
  ```json wrap Request theme={"dark"}
  curl --request POST \
    --url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/embedding/text-encoder \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --header 'Content-Type: application/json' \
    --data '{
      "batch": [
          {
              "text": "UPPERCASE"
          },
          {
              "text": "uppercase"
          }
      ],
      "modelConfig": {
          "lowercaseInput": true
      }
  }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "predictions": [
          {
              "tokensUsed": {
                  "inputTokens": 4
              },
              "vector": [
                  -0.012489319,
                  0.019470215,
                  -0.012825012,
                  -0.019821167,
                  -0.020202637,
                  0.020370483,
                  ...
                  0.05908203,
                  0.06530762,
                  0.01600647,
                  0.013671875,
                  -0.058044434,
                  -0.057861328,
                  -0.009681702
              ]
          },
          {
              "tokensUsed": {
                  "inputTokens": 4
              },
             "vector": [
                  -0.012489319,
                  0.019470215,
                  -0.012825012,
                  -0.019821167,
                  -0.020202637,
                  0.020370483,
                  ...
                  0.05908203,
                  0.06530762,
                  0.01600647,
                  0.013671875,
                  -0.058044434,
                  -0.057861328,
                  -0.009681702
              ]
          }
      ]
  }
  ```
</CodeGroup>
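
One way to confirm that two inputs produced the same embedding is to compare the returned vectors. The Python sketch below does this with cosine similarity, using only the leading values shown in the sample response; the `cosine_similarity` helper and the shortened vectors are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Leading values copied from the sample response; the full vectors are longer.
leading_values = [-0.012489319, 0.019470215, -0.012825012,
                  -0.019821167, -0.020202637, 0.020370483]
response = {
    "predictions": [
        {"tokensUsed": {"inputTokens": 4}, "vector": list(leading_values)},
        {"tokensUsed": {"inputTokens": 4}, "vector": list(leading_values)},
    ]
}

first, second = (p["vector"] for p in response["predictions"])
similarity = cosine_similarity(first, second)
print(f"cosine similarity: {similarity:.6f}")
```

A similarity of 1 (within floating-point tolerance) indicates that the two inputs were encoded identically.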

## English language model text encoder

The English language encoder takes in plain English text and returns a 768-dimensional vector encoding of that text. This model powers semantic search.

The API truncates incoming text to approximately 256 words before the model encodes it and returns a vector. An example usage pattern is to encode all the texts and descriptions in a website and then use this encoder on query text, supporting natural language queries such as "1990s children’s fiction".

Each API request includes one batch containing up to 32 text strings.
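
When you have more than 32 texts to encode, the input needs to be split across several requests. The following Python sketch shows one way to do that; the helper name `build_request_bodies` is hypothetical, and the body layout follows the request examples on this page.

```python
MAX_BATCH_SIZE = 32  # per-request limit stated above


def build_request_bodies(texts, data_type="query", batch_size=MAX_BATCH_SIZE):
    """Split texts into request bodies that each hold at most `batch_size` items."""
    bodies = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        bodies.append({
            "batch": [{"text": text} for text in chunk],
            "useCaseConfig": {"dataType": data_type},
        })
    return bodies


documents = [f"document {i}" for i in range(70)]
bodies = build_request_bodies(documents, data_type="passage")
print([len(body["batch"]) for body in bodies])
```

Each body in the returned list would be sent as its own POST request, and the predictions come back in the same order as the items in that request's `batch`, as the request and response examples on this page suggest.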

<CodeGroup>
  ```json wrap Request theme={"dark"}
  curl --request POST \
    --url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/embedding/text-encoder \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --header 'Content-Type: application/json' \
    --data '{
    "batch": [
      {
        "text": "city streets"
      }
    ],
    "useCaseConfig":
      {
        "dataType": "query"
      }
  }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "predictions": [
          {
              "vector": [
                  0.0028902769554406404,
                  -0.04393249750137329,
                  0.015302237123250961
              ]
          }
      ]
  }
  ```
</CodeGroup>

## Multilingual language model text encoder

When you use the [custom model prediction endpoint](/api-reference/get-predictions/custom-model-prediction) with a multilingual model, the multilingual encoder takes in plain text and returns a 384-dimensional vector encoding of that text. The API truncates incoming text to approximately 256 words before the model encodes it and returns a vector.

Each API request includes one batch containing up to 32 text strings.

The text strings in a batch do not have to be in the same language. You can also use words from multiple languages within each `text` value. Because long text strings are truncated to approximately 256 words, the order and length of the value affect the returned results.
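
To get a rough sense of which part of a long text is likely to be encoded, you can split it at the approximate word limit before sending it. The Python sketch below is an estimate only: it assumes whitespace-separated words, which does not hold for every language, and the exact point at which the API truncates may differ. The helper name is hypothetical.

```python
APPROX_WORD_LIMIT = 256  # approximate limit stated above, not an exact count


def split_at_word_limit(text, limit=APPROX_WORD_LIMIT):
    """Return (likely encoded part, likely truncated part) using whitespace words."""
    words = text.split()
    return " ".join(words[:limit]), " ".join(words[limit:])


long_text = " ".join(f"word{i}" for i in range(300))
kept, dropped = split_at_word_limit(long_text)
print(len(kept.split()), "words kept;", len(dropped.split()), "words likely truncated")
```

If the most important content falls in the second part, consider moving it earlier in the text or splitting the text into several `batch` items.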

<CodeGroup>
  ```json wrap Request theme={"dark"}
  curl --request POST \
    --url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/embedding/multilingual-e5-base \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --header 'Content-Type: application/json' \
    --data '{
    "batch": [
      {
        "text": "city streets"
      },
      {
        "text": "city gateways"
      }
    ],
    "useCaseConfig":
      {
        "dataType": "query"
      }
  }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "predictions": [
          {
              "vector": [
                  0.0028902769554406404,
                  -0.04393249750137329,
                  0.015302237123250961
              ]
          },
          {
              "vector": [
                  0.0028902769554406404,
                  -0.04393249750137329,
                  0.015302237123250961
              ]
          }
      ]
  }
  ```
</CodeGroup>

## Custom model

If a custom model is trained and deployed using the [Lucidworks AI Models API](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-models-api), the `DEPLOYMENT_ID` in the Models API is the same value as the `MODEL_ID` you enter in the custom model request to return a prediction.

<CodeGroup>
  ```json wrap Request theme={"dark"}
  curl --request POST \
    --url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/embedding/MODEL_ID \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --header 'Content-Type: application/json' \
    --data '{
    "batch": [
      {
        "text": "city streets"
      },
      {
        "text": "city gateways"
      }
    ],
    "useCaseConfig":
      {
        "dataType": "query"
      }
  }'
  ```

  ```json wrap Response theme={"dark"}
  {
      "predictions": [
          {
              "vector": [
                  0.0028902769554406404,
                  -0.04393249750137329,
                  0.015302237123250961
              ]
          },
          {
              "vector": [
                  0.0028902769554406404,
                  -0.04393249750137329,
                  0.015302237123250961
              ]
          }
      ]
  }
  ```
</CodeGroup>

## Verify information sent to embedding models

The Lucidworks AI Tokenization API returns the tokens for a Prediction API `embedding` use case request before they are sent to any [pre-trained embedding model](/docs/lw-platform/lw-ai/lw-ai-pre-trained-embedding-models) or [custom embedding model](/docs/lw-platform/lw-ai/lw-ai-custom-embedding-model-training/overview). You can use this information to debug requests and to ensure that the input to the pre-trained or custom embedding model is valid and within the model’s processing limits.

For more information, see [Tokenization API](/docs/lw-platform/lw-ai/lw-ai-apis/lw-ai-tokenization-api).
