Pass-through use case
Lucidworks AI Async Prediction API
The Pass-through use case of the Lucidworks AI Async Prediction API lets you use the service as a proxy to the LLM. The service sends text (no additional prompts or other information) to the LLM and returns a response.
The Pass-through use case contains two requests:
- POST request - submits a prediction task for a specific useCase and modelId. The API responds with the following information:
  - predictionId. A unique UUID for the submitted prediction task that can be used later to retrieve the results.
  - status. The current state of the prediction task.
- GET request - uses the predictionId returned by a previously submitted POST request and returns the results associated with that request.
Prerequisites
To use this API, you need:
- The unique APPLICATION_ID for your Lucidworks AI application. For more information, see credentials to use APIs.
- A bearer token generated with a scope value of machinelearning.predict. For more information, see Authentication API.
- The USE_CASE and MODEL_ID fields in the /async-prediction path for the POST request. The path is /ai/async-prediction/USE_CASE/MODEL_ID. A list of supported models is returned in the Lucidworks AI Use Case API. For more information about supported models, see Generative AI models.
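As a rough illustration of how the prerequisites fit together, the following Python sketch assembles the POST URL and headers. The host pattern and header names are taken from the curl examples in this document; the function name build_post_request and its arguments are hypothetical and not part of the Lucidworks API.

```python
def build_post_request(application_id, use_case, model_id, access_token):
    """Return (url, headers) for an async-prediction POST request.

    Hypothetical helper: it only formats strings and sends nothing.
    """
    url = (
        f"https://{application_id}.applications.lucidworks.com"
        f"/ai/async-prediction/{use_case}/{model_id}"
    )
    headers = {
        # The bearer token must be generated with the machinelearning.predict scope.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return url, headers
```

The returned pair can be handed to any HTTP client. For this use case, use_case is passthrough.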
Common POST request parameters and fields
modelConfig
Some parameters of the /ai/async-prediction/USE_CASE/MODEL_ID POST request are common to all of the generative AI (GenAI) use cases, including the modelConfig parameter. If you do not enter values, the following defaults are used.
"modelConfig":{
  "temperature": 0.7,
  "topP": 1.0,
  "presencePenalty": 0.0,
  "frequencyPenalty": 0.0,
  "maxTokens": 256
}
Also referred to as hyperparameters, these fields set certain controls on the response of an LLM:
Field | Description |
---|---|
temperature | A sampling temperature between 0 and 2. A higher value, such as 0.8, results in more random (creative) output. A lower value, such as 0.2, results in more focused (conservative) output. A lower value does not guarantee that the model returns the same response for the same input. |
topP | A floating-point number between 0 and 1 that controls the cumulative probability of the top tokens to consider, known as the randomness of the LLM’s response. This parameter is also referred to as top probability. |
presencePenalty | A floating-point number between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text. This increases the model’s use of diverse tokens. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. |
frequencyPenalty | A floating-point number between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. |
maxTokens | The maximum number of tokens to generate per output sequence. The value is different for each model. Review individual model specifications when the value exceeds 2048. |
apiKey | This optional parameter is required only when the specified model is used for prediction. The secret value is specified in the external model. The parameter (for OpenAI, Azure OpenAI, or Google VertexAI models) is available only for specific use cases. |
azureDeployment | Optional. |
azureEndpoint | Optional. |
googleProjectId | Optional. |
googleRegion | Optional. |
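The defaults and numeric ranges described above can be checked on the client before a request is sent. The following Python sketch is one possible way to do that. The helper name build_model_config is hypothetical, treating the ranges as inclusive is an assumption, and the service applies its own defaults whether or not the client merges them.

```python
# Defaults copied from the modelConfig example in this document.
DEFAULT_MODEL_CONFIG = {
    "temperature": 0.7,
    "topP": 1.0,
    "presencePenalty": 0.0,
    "frequencyPenalty": 0.0,
    "maxTokens": 256,
}

# Ranges from the field descriptions; inclusive bounds are an assumption.
RANGES = {
    "temperature": (0.0, 2.0),
    "topP": (0.0, 1.0),
    "presencePenalty": (-2.0, 2.0),
    "frequencyPenalty": (-2.0, 2.0),
}


def build_model_config(overrides=None):
    """Merge overrides into the defaults and reject out-of-range values."""
    config = dict(DEFAULT_MODEL_CONFIG)
    config.update(overrides or {})
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if config["maxTokens"] < 1:
        raise ValueError("maxTokens must be a positive number of tokens")
    return config
```

Keys that are not range-checked, such as apiKey, pass through unchanged.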
POST response parameters and fields
The response fields for POST /ai/async-prediction/USE_CASE/MODEL_ID requests are as follows:
Field | Description |
---|---|
predictionId | The universally unique identifier (UUID) returned by the POST request. This UUID is required in the GET request to retrieve results. For example, fd110486-f168-47c0-a419-1518a4840589. |
status | The current status of the prediction. The examples in this document show the values SUBMITTED, READY, and ERROR. |
Unique values for the pass-through use case
The parameters available in the passthrough use case are useSystemPrompt and dataType.
If both useSystemPrompt and dataType are present, the value in dataType is used.
Use System Prompt
"useCaseConfig": "useSystemPrompt": boolean
Use this parameter if custom prompts are needed or if the prompt response format needs to be manipulated. Be aware that a longer prompt may increase response time.
Some models, such as mistral-7b-instruct and llama-3-8b-instruct, generate more effective results when system prompts are included in the request.
If "useSystemPrompt": true, the LLM input is automatically wrapped into a model-specific prompt format with a generic system prompt before it is passed to the model or third-party API.
If "useSystemPrompt": false, the batch.text value serves as the prompt for the model. The LLM input must accommodate model-specific requirements because the input is passed as is.
Examples:
- The format for the mistral-7b-instruct model must be specific to Mistral: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- The format for the llama-3-8b-instruct model must be specific to Llama: https://huggingface.co/blog/llama3#how-to-prompt-llama-3
- The text input for OpenAI models must be valid JSON to match the OpenAI API specification: https://platform.openai.com/docs/api-reference/chat/create
- The format for the Google VertexAI models must adhere to the guidelines at: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
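To make the request shape concrete, the following Python sketch builds a passthrough request body with the useSystemPrompt flag. The function name build_passthrough_payload is hypothetical, and placing modelConfig at the top level of the body next to batch is an assumption based on the snippets in this document.

```python
import json


def build_passthrough_payload(text, use_system_prompt=True, model_config=None):
    """Serialize a passthrough request body as a JSON string.

    With use_system_prompt=False the text is passed to the model as is,
    so the caller must already have formatted it for the target model.
    """
    payload = {
        "batch": [{"text": text}],
        "useCaseConfig": {"useSystemPrompt": use_system_prompt},
    }
    if model_config:
        # Assumed top-level placement of modelConfig.
        payload["modelConfig"] = model_config
    return json.dumps(payload)
```

Using json.dumps rather than hand-written strings avoids quoting mistakes when the text itself contains quotation marks.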
POST example
This useSystemPrompt POST example does not include modelConfig parameters, but you can submit requests that include parameters described in Common POST request parameters and fields.
curl --request POST \
  --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/passthrough/MODEL_ID \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --header 'Content-type: application/json' \
  --data '{
    "batch": [
      {
        "text": "who was the first president of the USA?"
      }
    ],
    "useCaseConfig": {
      "useSystemPrompt": true
    }
  }'
The following is an example of a successful response:
{
  "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
  "status": "SUBMITTED"
}
The following is an example of an error response:
{
  "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
  "status": "ERROR",
  "message": "System prompt exceeded the maximum number of allowed input tokens: 81 vs -1091798"
}
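A client typically needs to tell the two POST responses apart before it moves on to the GET request. The following Python sketch shows one way to do that with the fields that appear in the examples above. The names PredictionError and prediction_id_from_submit are hypothetical.

```python
class PredictionError(Exception):
    """Raised when the service reports status ERROR (hypothetical helper type)."""


def prediction_id_from_submit(response):
    """Return the predictionId from a parsed POST response, or raise on ERROR."""
    if response.get("status") == "ERROR":
        # The message field appears in the error example in this document.
        raise PredictionError(response.get("message", "unknown error"))
    return response["predictionId"]
```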
GET example
curl --request GET \
  --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/PREDICTION_ID \
  --header 'Authorization: Bearer ACCESS_TOKEN'
The following is an example response:
{
  "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
  "status": "READY",
  "predictions": [
    {
      "response": "The first President of the United States was George Washington.",
      "tokensUsed": {
        "promptTokens": 49,
        "completionTokens": 11,
        "totalTokens": 60
      }
    }
  ]
}
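Because the prediction is asynchronous, the GET request may need to be repeated until the status is READY. The following Python sketch shows a simple polling loop under stated assumptions: fetch is a caller-supplied function that performs the GET request and returns the parsed JSON, any status other than READY or ERROR is treated as still pending, and the interval and attempt limit are arbitrary illustrative values rather than documented recommendations.

```python
import time


def wait_for_prediction(fetch, prediction_id, interval=1.0, max_attempts=30,
                        sleep=time.sleep):
    """Poll fetch(prediction_id) until the status is READY or ERROR."""
    for _ in range(max_attempts):
        result = fetch(prediction_id)
        status = result.get("status")
        if status == "READY":
            return result
        if status == "ERROR":
            raise RuntimeError(result.get("message", "prediction failed"))
        # Assumed: any other status (for example SUBMITTED) means "not done yet".
        sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} not ready after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.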
Data Type
"useCaseConfig": "dataType": "string"
This optional parameter enables model-specific handling in the Async Prediction API to help improve model accuracy. Use the most applicable fields based on available dataTypes and the dataType value that best aligns with the text sent to the Async Prediction API.
The values for dataType in the Passthrough use case are:
- "dataType": "text"
  This value is equivalent to "useSystemPrompt": true and is a pre-defined, generic prompt.
- "dataType": "raw_prompt"
  This value is equivalent to "useSystemPrompt": false and is passed directly to the model or third-party API.
- "dataType": "json_prompt"
  This value follows a generic chat-message format that allows three roles:
  - system
  - user
    - Only the last user message is truncated.
    - If the API does not support system prompts, the user role is substituted for the system role.
  - assistant
    - If the last message role is assistant, it is used as a pre-fill for generation, so the model treats it as the start of its output. The pre-fill is prepended to the model output, which makes models less verbose and helps enforce specific outputs such as YAML.
    - Google VertexAI does not support generation pre-fills, so an exception error is generated.
  This follows the Hugging Face template constraints at Hugging Face chat templates.
  Additional json_prompt information:
  - Consecutive messages for the same role are merged.
  - You can paste the information for a hosted model into the json_prompt value and change the model name in the stage.
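For json_prompt, the batch text is itself a JSON-encoded list of role and content messages, which is awkward to escape by hand. The following Python sketch builds such a request body. The function name build_json_prompt_payload is hypothetical, and the role check only mirrors the three roles listed above; the service may apply further validation.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_json_prompt_payload(messages):
    """Build a request body whose text is the JSON-encoded message list."""
    for message in messages:
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message['role']}")
    return {
        # json.dumps handles the quoting of the nested JSON string.
        "batch": [{"text": json.dumps(messages)}],
        "useCaseConfig": {"dataType": "json_prompt"},
    }
```

Ending the list with an assistant message supplies a generation pre-fill, which, as noted above, is not supported by Google VertexAI.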
POST example
This "dataType": "json_prompt" example does not include modelConfig parameters, but you can submit requests that include parameters described in Common POST request parameters and fields.
curl --request POST \
  --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/passthrough/MODEL_ID \
  --header 'Authorization: Bearer ACCESS_TOKEN' \
  --header 'Content-type: application/json' \
  --data '{
    "batch": [
      {
        "text": "[{\"role\": \"system\", \"content\": \"You are a helpful utility program instructed to accomplish a word correction task. Provide the most likely suggestion to the user without any preamble or elaboration.\"}, {\"role\": \"user\", \"content\": \"misspeled\"}, {\"role\": \"assistant\", \"content\": \"CORRECT:\"}]"
      }
    ],
    "useCaseConfig": {
      "dataType": "json_prompt"
    }
  }'
The following is an example response:
{
  "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
  "status": "SUBMITTED"
}
GET example
curl --request GET \
  --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/fd110486-f168-47c0-a419-1518a4840589 \
  --header 'Authorization: Bearer ACCESS_TOKEN'
The following is an example response:
{
  "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
  "status": "READY",
  "predictions": [
    {
      "tokensUsed": {
        "promptTokens": 51,
        "completionTokens": 4,
        "totalTokens": 55
      },
      "response": "CORRECT: misspelled"
    }
  ]
}
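Once the status is READY, the generated text and token counts can be read from the predictions array. The following Python sketch extracts them using the field names shown in the example responses. The helper name summarize_predictions is hypothetical.

```python
def summarize_predictions(result):
    """Return (response_text, total_tokens) pairs from a parsed READY response."""
    return [
        (prediction["response"], prediction["tokensUsed"]["totalTokens"])
        for prediction in result.get("predictions", [])
    ]
```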