The passthrough use case lets you use the service as a proxy to the large language model (LLM). The service sends text (no additional prompts or other information) to the LLM and returns a response.
The authentication and authorization access token.
application/json
"application/json"
Unique identifier for the model.
"6a092bd4-5098-466c-94aa-40bf6829430\""
NOTE: If both useSystemPrompt and dataType are present, the value in dataType is used.
This optional parameter defaults to true. If set to false, the batch.text value is sent to the model as the prompt; in that case the prompt must already be in a format the model can understand. (See the request sketch after the dataType values below.)
This optional parameter enables model-specific handling in the Prediction API to help improve model accuracy. Choose the dataType value that best aligns with the text you send to the Prediction API.
The values for dataType in the Passthrough use case are:
"dataType": "text" - This value is equivalent to "useSystemPrompt": true and is a pre-defined, generic prompt.
"dataType": "raw_prompt" - This value is equivalent to "useSystemPrompt": false and is passed directly to the model or third-party API.
"dataType": "json_prompt" - This value follows the generics that allow three roles:
system
user
assistant
For assistant, the message is used as a pre-fill for generation: it supplies the first tokens the model generates and is prepended to the model output, which makes models less verbose and helps enforce specific output formats such as YAML. This follows the HuggingFace chat template constraints at https://huggingface.co/docs/transformers/main/en/chat_templating.
Additional json_prompt information:
You can keep the json_prompt value and change the model name in the stage.
"json_prompt"
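As a rough illustration, the sketch below builds passthrough request bodies for the raw_prompt and json_prompt cases. Only useSystemPrompt, dataType, batch.text, and the three roles are documented on this page; the overall body shape (a batch list of objects with a text field) and the JSON encoding of the role messages inside batch.text are assumptions and may differ in your deployment.

import json

# Hypothetical raw-prompt request body: with useSystemPrompt false
# (equivalent to "dataType": "raw_prompt"), batch.text is sent to the
# model as-is, so it must already be in a format the model understands.
raw_prompt_body = {
    "useSystemPrompt": False,
    "batch": [
        {"text": "Question: Who was the first US president?\nAnswer:"}
    ],
}

# Hypothetical json_prompt request body: the three allowed roles are
# serialized as JSON inside batch.text (assumed encoding). The trailing
# assistant message acts as a pre-fill that the model continues, which
# helps enforce output formats such as YAML.
json_prompt_body = {
    "dataType": "json_prompt",
    "batch": [
        {
            "text": json.dumps([
                {"role": "system", "content": "Answer in YAML only."},
                {"role": "user", "content": "List the first three US presidents."},
                {"role": "assistant", "content": "presidents:"},
            ])
        }
    ],
}

print(json.dumps(json_prompt_body, indent=2))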
Provides fields and values that control token generation. Use the fields that apply to your use case and model; default values are used for any fields you do not specify. (A sketch combining these fields follows the field descriptions below.)
A sampling temperature between 0 and 2. A higher sampling temperature, such as 0.8, results in more random (creative) output; a lower value, such as 0.2, results in more focused (conservative) output. A lower value does not guarantee that the model returns the same response for the same input. We recommend staying at or below a temperature of 1.0. Values above 1.0 might return nonsense unless the topP value is lowered to be more deterministic.
0 <= x <= 2
0.8
A floating-point number between 0 and 1, also referred to as top probability, that controls the cumulative probability of the top tokens to consider and therefore the randomness of the LLM's response. Set topP to 1 to consider all tokens. A higher value specifies a higher probability threshold, selecting tokens whose cumulative probability falls within that threshold; the higher the value, the more diverse the output.
0 <= x <= 1
1
An integer that controls the number of top tokens to consider. Set top_k to -1 to consider all tokens.
-1
A floating-point number between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text. This increases the model's use of diverse tokens. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. This is applicable for all OpenAI, Mistral, and Llama models.
-2 <= x <= 2
2
A floating-point number between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. This is applicable for all OpenAI, Mistral, and Llama models.
-2 <= x <= 2
1
The maximum number of tokens to generate per output sequence. The value is different for each model. Review individual model specifications when the value exceeds 2048.
1
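For reference, the sketch below shows one way these generation parameters could be combined, using the documented ranges. The topP and top_k names come from this page; the other field names (temperature, presencePenalty, frequencyPenalty, maxTokens) and the chosen values are assumptions, so check your deployment's schema for the exact names.

# Hypothetical generation-parameter object; only topP and top_k are named
# on this page, the remaining field names are illustrative.
generation_params = {
    "temperature": 0.8,       # 0-2; higher = more random, lower = more focused
    "topP": 1,                # 0-1; 1 considers all tokens
    "top_k": -1,              # -1 considers all tokens
    "presencePenalty": 0.5,   # -2.0 to 2.0; > 0 encourages new tokens
    "frequencyPenalty": 0.5,  # -2.0 to 2.0; > 0 encourages new tokens over repeats
    "maxTokens": 512,         # per-model limit; check model specs above 2048
}

# Basic sanity checks mirroring the documented ranges.
assert 0 <= generation_params["temperature"] <= 2
assert 0 <= generation_params["topP"] <= 1
assert -2.0 <= generation_params["presencePenalty"] <= 2.0
assert -2.0 <= generation_params["frequencyPenalty"] <= 2.0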
This parameter is optional and is only required when one of the following third-party models is used for prediction. You can find this value in your model provider's settings (a configuration sketch follows these provider fields):
OpenAI: Copy and paste the API key found in your organization's settings. For more information, see OpenAI Authentication API keys.
Azure OpenAI: Copy and paste the API key found in your Azure portal. See Authenticate with API key.
Anthropic: Copy and paste the API key found in your Anthropic console or by using the Anthropic API.
Google Vertex AI: Copy and paste the base64-encoded service account key JSON found in your Google Cloud console. This service account key must have the Vertex AI user role enabled. For more information, see generate service account key.
"API key specific to use case and model"
This optional parameter is the name of the deployed Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.
"DEPLOYMENT_NAME"
This optional parameter is the URL endpoint of the deployed Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.
"https://azure.endpoint.com"
This optional parameter is the Google Cloud project ID and is only required when a Google Vertex AI model is used for prediction.
"[GOOGLE_PROJECT_ID]"
This optional parameter is only required when a Google Vertex AI model is used for prediction. Set it to the Google Cloud region in which the model is accessed.
"[GOOGLE_PROJECT_REGION_OF_MODEL_ACCESS]"
OK
The results returned from the request.
"The first President of the United States was George Washington."
The number of tokens in the prompt sent to the model.
148
The number of tokens the model generated in the completion.
27
The sum of the prompt and completion tokens used in the model.
175
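As an informal check, the token counts in the example response add up: 148 prompt tokens plus 27 completion tokens equal the 175 total. The response field names in the sketch below (generated_text, prompt_tokens, completion_tokens, total_tokens) are assumptions based on the descriptions above.

# Hypothetical response payload mirroring the example values on this page.
response = {
    "generated_text": "The first President of the United States was George Washington.",
    "prompt_tokens": 148,     # tokens in the prompt sent to the model
    "completion_tokens": 27,  # tokens generated by the model
    "total_tokens": 175,      # prompt + completion
}

# total_tokens is the sum of prompt and completion tokens: 148 + 27 = 175.
assert response["prompt_tokens"] + response["completion_tokens"] == response["total_tokens"]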