Fusion 5.9

    Retrieval augmented generation (RAG) use case

    Lucidworks AI Async Prediction API

    The Retrieval augmented generation (RAG) use case of the Lucidworks AI Async Prediction API inserts candidate documents into an LLM’s context so that the generated response is grounded in those documents instead of in details stored in the LLM’s trained weights. This helps reduce the frequency of hallucinated LLM responses. This type of search adds guardrails so the LLM can search private data collections.

    The RAG search can perform queries against external documents passed in as part of the request.

    This use case can be used:

    • To generate answers based on the context of the responses collected (corpus)

    • To generate a response based on the context from responses to a previous request

    The RAG use case contains two requests:

    • POST request - submits a prediction task for a specific useCase and modelId. The API responds with the following information:

      • predictionId. A unique UUID for the submitted prediction task that can be used later to retrieve the results.

      • status. The current state of the prediction task.

    • GET request - uses the predictionId returned by a previously submitted POST request and returns the results associated with that request.

    Prerequisites

    To use this API, you need:

    • The unique APPLICATION_ID for your Lucidworks AI application. For more information, see credentials to use APIs.

    • A bearer token generated with a scope value of machinelearning.predict. For more information, see Authentication API.

    • The USE_CASE and MODEL_ID values for the POST request path, which is /ai/async-prediction/USE_CASE/MODEL_ID. A list of supported models is returned by the Lucidworks AI Use Case API. For more information about supported models, see Generative AI models.
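    The prerequisites above combine into a request URL and a set of headers. The following is a minimal sketch: the host pattern is taken from the example requests later on this page, and the helper names post_url and auth_headers are hypothetical, not part of any Lucidworks client library.

```python
from typing import Dict


def post_url(application_id: str, use_case: str, model_id: str) -> str:
    """Build the POST URL for a given use case and model."""
    return (
        f"https://{application_id}.applications.lucidworks.com"
        f"/ai/async-prediction/{use_case}/{model_id}"
    )


def auth_headers(access_token: str) -> Dict[str, str]:
    """Headers for a bearer token generated with the machinelearning.predict scope."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own application, model, and token.
    print(post_url("APPLICATION_ID", "rag", "MODEL_ID"))
    print(auth_headers("ACCESS_TOKEN"))
```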

    Common POST request parameters and fields

    modelConfig

    Some parameters of the /ai/async-prediction/USE_CASE/MODEL_ID POST request are common to all of the generative AI (GenAI) use cases, including the modelConfig parameter. If you do not enter values, the following defaults are used.

    "modelConfig":{
      "temperature": 0.7,
      "topP": 1.0,
      "presencePenalty": 0.0,
      "frequencyPenalty": 0.0,
      "maxTokens": 256
    }

    Also referred to as hyperparameters, these fields set certain controls on the response of an LLM:

    Field Description

    temperature

    A sampling temperature between 0 and 2. A higher sampling temperature, such as 0.8, results in more random (creative) output. A lower value, such as 0.2, results in more focused (conservative) output. A lower value does not guarantee the model returns the same response for the same input.

    topP

    A floating-point number between 0 and 1 that sets the cumulative probability threshold for the tokens considered during sampling, which controls the randomness of the LLM’s response. This parameter is also referred to as top probability. Set topP to 1 to consider all tokens. At lower values, only the most probable tokens whose cumulative probability reaches the threshold are considered. The higher the value, the more diverse the output.

    presencePenalty

    A floating-point number between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text. This increases the model’s use of diverse tokens. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens.

    frequencyPenalty

    A floating-point number between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens.

    maxTokens

    The maximum number of tokens to generate per output sequence. The value is different for each model. Review individual model specifications when the value exceeds 2048.
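    The defaults and ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: build_model_config is a hypothetical helper, and the client-side range checks mirror the documented ranges only. The API's own validation behavior is not described here.

```python
from typing import Any, Dict, Optional

# Defaults and ranges as listed in this section.
DEFAULTS: Dict[str, Any] = {
    "temperature": 0.7,
    "topP": 1.0,
    "presencePenalty": 0.0,
    "frequencyPenalty": 0.0,
    "maxTokens": 256,
}
RANGES = {
    "temperature": (0.0, 2.0),
    "topP": (0.0, 1.0),
    "presencePenalty": (-2.0, 2.0),
    "frequencyPenalty": (-2.0, 2.0),
}


def build_model_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge overrides onto the defaults and reject out-of-range values."""
    config = {**DEFAULTS, **(overrides or {})}
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # The maxTokens upper bound differs per model, so only check that it is positive.
    if not isinstance(config["maxTokens"], int) or config["maxTokens"] < 1:
        raise ValueError("maxTokens must be a positive integer")
    return config


if __name__ == "__main__":
    print({"modelConfig": build_model_config({"temperature": 0.2})})
```

    Because the API applies the defaults when values are omitted, sending only the overridden fields should be equivalent. The merge here just makes the effective configuration visible.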

    apiKey

    This optional parameter is required only when the specified model is an external model used for prediction. This secret value is specified in the external model. For:

    • OpenAI models, "apiKey" is the value in the model’s "[OPENAI_API_KEY]" field. For more information, see Authentication API keys.

    • Azure OpenAI models, "apiKey" is the value generated by Azure in the model’s "[KEY1 or KEY2]" field. For requirements to use Azure models, see Generative AI models.

    • Google VertexAI models, "apiKey" is the value in the model’s "[BASE64_ENCODED_GOOGLE_SERVICE_ACCOUNT_KEY]" field. For more information, see Create and delete Google service account keys.

    The parameter (for OpenAI, Azure OpenAI, or Google VertexAI models) is only available for the following use cases:

    • Pass-through

    • RAG

    • Standalone query rewriter

    • Summarization

    • Keyword extraction

    • NER

    azureDeployment

    The optional "azureDeployment": "[DEPLOYMENT_NAME]" parameter is the deployment name of the Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.

    azureEndpoint

    The optional "azureEndpoint": "[ENDPOINT]" parameter is the URL endpoint of the deployed Azure OpenAI model and is only required when a deployed Azure OpenAI model is used for prediction.

    googleProjectId

    The optional "googleProjectId": "[GOOGLE_PROJECT_ID]" parameter is only required when a Google VertexAI model is used for prediction.

    googleRegion

    The optional "googleRegion": "[GOOGLE_PROJECT_REGION_OF_MODEL_ACCESS]" parameter is only required when a Google VertexAI model is used for prediction. The possible region values are:

    • us-central1

    • us-west4

    • northamerica-northeast1

    • us-east4

    • us-west1

    • asia-northeast3

    • asia-southeast1

    • asia-northeast

    POST response parameters and fields

    The fields in the response to the POST /ai/async-prediction/USE_CASE/MODEL_ID request are as follows:

    Field Description

    predictionId

    The universally unique identifier (UUID) returned in the response to the POST request. This UUID is required in the GET request to retrieve results. For example, fd110486-f168-47c0-a419-1518a4840589.

    status

    The current status of the prediction. Values are:

    • SUBMITTED - The POST request was successful and the response has returned the predictionId and status. The predictionId is used in the GET request.

    • ERROR - An error was generated when the request was sent.

    • READY - The results associated with the predictionId are available and ready to be retrieved.
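    The status values above suggest a submit-then-poll loop. The following is a minimal sketch: submit and fetch are hypothetical callables standing in for the POST and GET requests, the poll interval and attempt limit are arbitrary, and it is an assumption that the GET request reports SUBMITTED while the task is still pending.

```python
import time
from typing import Any, Callable, Dict


def run_prediction(
    submit: Callable[[], Dict[str, Any]],
    fetch: Callable[[str], Dict[str, Any]],
    poll_seconds: float = 1.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Submit a prediction task, then poll until the results are READY."""
    # POST: the response carries predictionId and status.
    submitted = submit()
    if submitted.get("status") == "ERROR":
        raise RuntimeError(submitted.get("message", "prediction failed"))
    prediction_id = submitted["predictionId"]

    # GET: reuse the predictionId until the status is READY or ERROR.
    for _ in range(max_attempts):
        result = fetch(prediction_id)
        status = result.get("status")
        if status == "READY":
            return result
        if status == "ERROR":
            raise RuntimeError(result.get("message", "prediction failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"prediction {prediction_id} not ready after {max_attempts} attempts"
    )
```

    Passing the requests in as callables keeps the loop independent of any particular HTTP client.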

    Unique values for the external documents RAG use case

    The values available in this use case (that may not be available in other use cases) are:

    Parameter Value

    "documents"

    This array is passed in the batch object. The allowed LLM context length limits the number of documents to 3. The parameter can be used on the query side by selecting the "Include response documents" check box. The array contains the following parameters:

    • "body": <contents of doc>

    • "source": <url/id of doc - used in generating SOURCES cite list>

    • "title": <title>

    • "date": <creation date of the document in epoch time format>

    "useCaseConfig"

    The supported parameters are:

    • "memoryUuid": "string"

      This parameter is optional, and is used when chat history reference information from a previous GenAI interaction is available.

    • "extractRelevantContent": boolean

      This parameter can be used in a query and on documents. The default is false.
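    The parameters above can be assembled into a request body programmatically. The following is a minimal sketch: make_document and build_rag_payload are hypothetical helper names, and the date is converted to epoch seconds to match the example request on this page.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

MAX_DOCUMENTS = 3  # limit stated above for the documents array


def make_document(body: str, source: str, title: str, created: datetime) -> Dict[str, Any]:
    """Build one entry of the documents array; date is in epoch seconds."""
    return {
        "body": body,
        "source": source,
        "title": title,
        "date": int(created.timestamp()),
    }


def build_rag_payload(
    text: str,
    documents: List[Dict[str, Any]],
    memory_uuid: Optional[str] = None,
    extract_relevant_content: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble the POST body: one batch entry plus an optional useCaseConfig."""
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(f"at most {MAX_DOCUMENTS} documents are allowed")
    payload: Dict[str, Any] = {"batch": [{"text": text, "documents": documents}]}
    use_case_config: Dict[str, Any] = {}
    if memory_uuid is not None:
        use_case_config["memoryUuid"] = memory_uuid
    if extract_relevant_content is not None:
        use_case_config["extractRelevantContent"] = extract_relevant_content
    if use_case_config:
        payload["useCaseConfig"] = use_case_config
    return payload
```

    Omitting useCaseConfig when no option is set leaves the defaults in effect, including extractRelevantContent set to false.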

    The following is an example request. This example does not include modelConfig parameters, but you can submit requests that include parameters described in Common POST request parameters and fields.

    curl --request POST \
      --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/rag/MODEL_ID \
      --header 'Authorization: Bearer ACCESS_TOKEN' \
      --header 'Content-type: application/json' \
      --data '{
      "batch": [
        {
          "text": "Why did I go to Germany?",
          "documents": [{
            "body": "I am off to Germany to go to the Oktoberfest!",
            "source": "http://example.com/112",
            "title": "Off to Germany!",
            "date": 1104537600
          }]
        }
      ]
    }'

    The following is an example of a successful response:

    {
    	"predictionId": "fd110486-f168-47c0-a419-1518a4840589",
    	"status": "SUBMITTED"
    }

    The following is an example of an error response:

    {
    	"predictionId": "fd110486-f168-47c0-a419-1518a4840589",
    	"status": "ERROR",
    	"message": "System prompt exceeded the maximum number of allowed input tokens: 81 vs -1091798"
    }

    Example GET request

    curl --request GET \
      --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/PREDICTION_ID \
      --header 'Authorization: Bearer ACCESS_TOKEN'

    The response includes the:

    • Generated answer

    • SOURCES line of text that contains the URL of the documents used to generate the answer

    • Metadata about the response:

      • memoryUuid that can be used to retrieve the LLM’s chat history

      • Count of tokens used to complete the query

    If the LLM cannot determine a response, it returns a response where the first prediction field contains I don’t know.

    The following is an example response:

    {
      "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
      "status": "READY",
      "predictions": [
         {
          "response": "ANSWER: \"I went to Germany to visit family.\"\nSOURCES: [\"http://example.com/112\"]",
          "tokensUsed": {
            "promptTokens": 202,
            "completionTokens": 17,
            "totalTokens": 219
            },
          "answer": "I went to Germany to visit family.",
          "sources": [
              "http://example.com/112"
          ],
          "memoryUuid": "27a887fe-3d7c-4ef0-9597-e2dfc054c20e"
        }
      ]
      }
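    A response like the one above can be unpacked into the answer, its sources, and the metadata. The following is a minimal sketch: RagAnswer and parse_rag_result are hypothetical names, and checking the answer field for "I don't know" is an assumption about which field carries that text.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class RagAnswer(NamedTuple):
    answer: str
    sources: List[str]
    memory_uuid: Optional[str]
    total_tokens: Optional[int]
    unknown: bool  # True when the model answered "I don't know"


def parse_rag_result(result: Dict[str, Any]) -> RagAnswer:
    """Pull the answer, sources, and metadata out of a READY response."""
    status = result.get("status")
    if status != "READY":
        raise ValueError(f"results not ready: status={status!r}")
    first = result["predictions"][0]
    answer = first.get("answer", "")
    # Normalize the typographic apostrophe before comparing.
    normalized = answer.replace("\u2019", "'").strip().lower()
    return RagAnswer(
        answer=answer,
        sources=list(first.get("sources", [])),
        memory_uuid=first.get("memoryUuid"),
        total_tokens=first.get("tokensUsed", {}).get("totalTokens"),
        unknown=normalized.startswith("i don't know"),
    )
```

    Reading the separate answer and sources fields avoids having to parse the combined ANSWER and SOURCES text in the response field.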

    Unique values for the chat history RAG use case

    The values available in this use case (that may not be available in other use cases) are:

    Parameter Value

    "documents"

    The array contains the following parameters:

    • "body": <contents of doc>

    • "source": <url/id of doc - used in generating SOURCES cite list>

    • "title": <title>

    • "date": <creation date of the document in epoch time format>

    "useCaseConfig"

    "memoryUuid": "string"

    This parameter is optional, and is used when previous chat history reference information is available.

    When using the RAG search, the LLM service stores the query and its response in a cache. In addition to the response, it also returns a UUID value in the memoryUuid field. If the UUID is passed back in a subsequent request, the LLM uses the cached query and response as part of its context. This lets the LLM be used as a chatbot, where previous queries and responses are used to generate the next response.
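    The chat behavior described above amounts to carrying the memoryUuid from one response into the next request. The following is a minimal sketch: follow_up_payload is a hypothetical helper, and it assumes the memoryUuid is read from the first entry of the predictions array, as in the example responses on this page.

```python
from typing import Any, Dict, List, Optional


def follow_up_payload(
    previous_result: Optional[Dict[str, Any]],
    text: str,
    documents: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build the next request, carrying memoryUuid forward when one exists."""
    payload: Dict[str, Any] = {"batch": [{"text": text, "documents": documents}]}
    if previous_result:
        predictions = previous_result.get("predictions") or [{}]
        memory_uuid = predictions[0].get("memoryUuid")
        if memory_uuid:
            # Passing the UUID back lets the LLM use the cached query and response.
            payload["useCaseConfig"] = {"memoryUuid": memory_uuid}
    return payload
```

    For the first turn of a conversation, pass None as previous_result and the request is sent without a useCaseConfig.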

    The following is an example request. This example does not include modelConfig parameters, but you can submit requests that include parameters described in Common POST request parameters and fields.

    curl --request POST \
      --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/rag/MODEL_ID \
      --header 'Authorization: Bearer ACCESS_TOKEN' \
      --header 'Content-type: application/json' \
      --data '{
      "batch": [
        {
          "text": "What is RAG?",
          "documents": [{
            "body": "Retrieval Augmented Generation, known as RAG, a framework promising to optimize generative AI and ensure its responses are up-to-date, relevant to the prompt, and most important",
            "source": "http://rag.com/115",
            "title": "What is Retrieval Augmented Generation",
            "date": "1104537600"
          }]
        }
      ],
      "useCaseConfig": {
        "memoryUuid": "27a887fe-3d7c-4ef0-9597-e2dfc054c20e"
      }
    }'

    The following is an example of a successful response:

    {
    	"predictionId": "fd110486-f168-47c0-a419-1518a4840589",
    	"status": "SUBMITTED"
    }

    The following is an example of an error response:

    {
    	"predictionId": "fd110486-f168-47c0-a419-1518a4840589",
    	"status": "ERROR",
    	"message": "System prompt exceeded the maximum number of allowed input tokens: 81 vs -1091798"
    }

    Example GET request

    curl --request GET \
      --url https://APPLICATION_ID.applications.lucidworks.com/ai/async-prediction/PREDICTION_ID \
      --header 'Authorization: Bearer ACCESS_TOKEN'

    The following is an example response:

    {
      "predictionId": "fd110486-f168-47c0-a419-1518a4840589",
      "status": "READY",
      "predictions": [
        {
          "response": "ANSWER: \"Retrieval Augmented Generation, known as RAG, is a framework promising to optimize generative AI and ensure its responses are up-to-date, relevant to the prompt, and most important.\"\nSOURCES: [\"http://rag.com/115\"]",
          "tokensUsed": {
           "promptTokens": 238,
           "completionTokens": 54,
           "totalTokens": 292
           },
          "answer": "Retrieval Augmented Generation, known as RAG, is a framework promising to optimize generative AI and ensure its responses are up-to-date, relevant to the prompt, and most important.",
          "sources": [
              "http://rag.com/115"
          ],
          "memoryUuid": "27a887fe-3d7c-4ef0-9597-e2dfc054c20e"
          }
        ]
      }