RAG use case

Headers

Authorization: Bearer ACCESS_TOKEN

string

The authentication and authorization access token.

Content-Type

string

application/json

Example:

"application/json"

Path Parameters

MODEL_ID

string

required

Unique identifier for the model.

Example:

"6a092bd4-5098-466c-94aa-40bf6829430\""

Body

application/json

batch

BatchRag · object[]

Show child attributes

batch.text

string

Content for the model to analyze. Multiple instances of text can be sent in the request.

Example:

"What is RAG?"

batch.documents

Document · object[]

Show child attributes

batch.documents.body

string

The contents of the document.

Example:

"Retrieval Augmented Generation, known as RAG, a framework promising to optimize generative AI."

batch.documents.source

string

The URL that identifies the source of the document.

Example:

"http://rag.com/22"

batch.documents.title

string

The title of the document.

Example:

"What are the benefits of RAG?"

batch.documents.date

string<date-time>

The date and time the document was created, displayed in the required ISO-8601 format of yyyy-mm-ddThh:mm:ssZ.

Example:

"2022-01-31T19:31:34Z"

useCaseConfig

UseCaseConfigRagExtDoc · object

Show child attributes

useCaseConfig.memoryUuid

string

The universal unique identifier (UUID) stored in the trained set of data in the model that is used in the model request.

This parameter is optional, and is used when previous chat history reference information is available.

Example:

"27a887fe-3d7c-4ef0-9597-e2dfc054c20e"

useCaseConfig.extractRelevantContent

boolean

default:false

This determines if relevant content can be extracted in the request. Set to true to help preserve key information in longer documents that might be lost during the default truncation, but at the cost of more LLM calls and a slower response time.

useCaseConfig.answerNotFoundMessage

string

default:Not possible to answer given this content.

This parameter is optional, and can be passed to change the response if the LLM cannot answer the request. The default is "Not possible to answer given this content."

modelConfig

ModelConfig · object

Provides fields and values that specify ranges for tokens. Fields used for specific use cases and models are specified. The default values are used if other values are not specified.

Show child attributes

modelConfig.temperature

number<float>

A sampling temperature between 0 and 2. A higher sampling temperature such as 0.8, results in more random (creative) output. A lower value such as 0.2 results in more focused (conservative) output. A lower value does not guarantee the model returns the same response for the same input. We recommend staying at or below a temperature of 1.0. Values above 1.0 might return nonsense unless the topP value is lowered to be more deterministic.

Required range: 0 <= x <= 2

Example:

0.8

modelConfig.topP

number<float>

A floating-point number between 0 and 1 that controls the cumulative probability of the top tokens to consider, known as the randomness of the LLM's response. This parameter is also referred to as top probability. Set topP to 1 to consider all tokens. A higher value specifies a higher probability threshold and selects tokens whose cumulative probability is greater than the threshold. The higher the value, the more diverse the output.

Required range: 0 <= x <= 1

Example:

1

modelConfig.topK

integer

An integer that controls the number of top tokens to consider. Set top_k to -1 to consider all tokens.

Example:

-1

modelConfig.presencePenalty

number<float>

A floating-point number between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text. This increases the model's use of diverse tokens. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. This is applicable for all OpenAI, Mistral, and Llama models.

Required range: -2 <= x <= 2

Example:

2

modelConfig.frequencyPenalty

number<float>

A floating-point number between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. This is applicable for all OpenAI, Mistral, and Llama models.

Required range: -2 <= x <= 2

Example:

1

modelConfig.maxTokens

integer<int32>

The maximum number of tokens to generate per output sequence. The value is different for each model. Review individual model specifications when the value exceeds 2048.

Example:

1

modelConfig.apiKey

string

This optional parameter is only required when using the model for prediction. You can find this value in your model's settings:

OpenAI: Copy and paste the API key found in your organization's settings. For more information, see OpenAI Authentication API keys.
Azure OpenAI: Copy and paste the API key found in your Azure portal. See Authenticate with API key.
Anthropic: Copy and paste the API key found in your Anthrophic console or by using the Anthropic API.
Google Vertex AI: Copy and paste the base64-encoded service account key JSON found in your Google Cloud console. This service account key must have the Vertex AI user role enabled. For more information, see generate service account key.

Example:

"API key specific to use case and model"

modelConfig.azureDeployment

string

This optional parameter is the name of the deployed Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.

Example:

"DEPLOYMENT_NAME"

modelConfig.azureEndpoint

string

This optional parameter is the URL endpoint of the deployed Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.

Example:

"https://azure.endpoint.com"

modelConfig.googleProjectId

string

This parameter is optional, and is only required when a Google Vertex AI model is used for prediction.

Example:

"[GOOGLE_PROJECT_ID]"

modelConfig.googleRegion

string

This parameter is optional, and is only required when a Google Vertex AI model is used for prediction. The possible region values are:

us-central1
us-west4
northamerica-northeast1
us-east4
us-west1
asia-northeast3
asia-southeast1
asia-northeast

Example:

"[GOOGLE_PROJECT_REGION_OF_MODEL_ACCESS]"

Response

200 - application/json

predictions

object[]

Show child attributes

predictions.response

string

The unparsed response returned from the request.

Example:

"ANSWER: \\\"Retrieval Augmented Generation, known as RAG, a framework promising to optimize generative AI.\"\\nSOURCES: [\\\"http://example.com/112\\\"]"

predictions.tokensUsed

object

Show child attributes

predictions.tokensUsed.promptTokens

integer<int32>

The number of tokens generated to prompt the model to continue generating results.

Example:

148

predictions.tokensUsed.completionTokens

integer<int32>

The number of tokens used until the model completes.

Example:

27

predictions.tokensUsed.totalTokens

integer<int32>

The sum of the prompt and completion tokens used in the model.

Example:

175

predictions.answer

string

The parsed response text from the document.

Example:

"Retrieval Augmented Generation, known as RAG, a framework promising to optimize generative AI."

predictions.sources

string

The URL that identifies the source of the document returned in the response. Multiple results may be returned.

Example:

"http://example.com/112"

predictions.memoryUuid

string

The universal unique identifier (UUID) stored in the trained set of data in the model that is used in the model request.

This parameter is optional, and is used when previous chat history reference information is available.

Example:

"27a887fe-3d7c-4ef0-9597-e2dfc054c20e"

predictions.answerNotFoundMessage

string

default:Not possible to answer given this content.

This parameter is optional, and can be passed to change the response if the LLM cannot answer the request. The default is "Not possible to answer given this content."

predictions.answerFound

boolean

This parameter is false if the value in the answerNotFoundMessage field is used in the response. If an answer is returned for the request, this parameter is true.

Models

Authentication

Use Case

Predict

Async Chunking

Async Predict

Prompt Preview

Signals

Tokenization

Headers

Path Parameters

Body

Response