Keyword extraction use caseLucidworks AI Prediction API
In the Keyword extraction use case of the LWAI Prediction API, the LLM ingests text and returns a JSON response that contains a list of keywords extracted from the text. No options can be configured.
This use case can be used:
-
To extract keywords to create new facets
-
To extract keywords from each document:
-
To write the keyword into a new field that subsequently searches large content documents more efficiently
-
To boost specific keyword content
-
For document clustering
-
Prerequisites
To use this API, you need:
-
The unique
APPLICATION_ID
for your Lucidworks AI application. For more information, see credentials to use APIs. -
A bearer token generated with a scope value of
machinelearning.predict
. For more information, see Authentication API. -
The
USE_CASE
andMODEL_ID
fields for the use case request. The path is:/ai/prediction/USE_CASE/MODEL_ID
. A list of supported models is returned in the Lucidworks AI Use Case API. For more information about supported models, see Generative AI models.
Common parameters and fields
modelConfig
Some parameters of the /ai/prediction/USE_CASE/MODEL_ID
request are common to all of the generative AI (GenAI) use cases, including the modelConfig
parameter. If you do not enter values, the following defaults are used.
"modelConfig":{
"temperature": 0.7,
"topP": 1.0,
"presencePenalty": 0.0,
"frequencyPenalty": 0.0,
"maxTokens": 256
}
Also referred to as hyperparameters, these fields set certain controls on the response of a LLM:
Field | Description |
---|---|
temperature |
A sampling temperature between 0 and 2. A higher sampling temperature such as 0.8, results in more random (creative) output. A lower value such as 0.2 results in more focused (conservative) output. A lower value does not guarantee the model returns the same response for the same input. |
topP |
A floating-point number between 0 and 1 that controls the cumulative probability of the top tokens to consider, known as the randomness of the LLM’s response. This parameter is also referred to as top probability. Set |
presencePenalty |
A floating-point number between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text. This increases the model’s use of diverse tokens. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. |
frequencyPenalty |
A floating-point number between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text. A value greater than zero (0) encourages the model to use new tokens. A value less than zero (0) encourages the model to repeat existing tokens. |
maxTokens |
The maximum number of tokens to generate per output sequence. The value is different for each model. Review individual model specifications when the value exceeds 2048. |
apiKey |
The optional parameter is only required when the specified model is used for prediction. This secret value is specified in the external model. For:
The parameter (for OpenAI, Azure OpenAI, or Google VertexAI models) is only available for the following use cases:
|
azureDeployment |
The optional |
azureEndpoint |
The optional |
googleProjectId |
The optional |
googleRegion |
The optional
|
Unique values for the keyword extraction use case
The values available in this use case (that may not be available in other use cases) are:
Parameter | Value |
---|---|
"text" |
Free-form content contained in the document. |
"useCaseConfig" |
"maxKeywords": integer This parameter specifies the maximum number of keywords that can be extracted, even though the model may contain more than the maximum number specified in this field. |
Example request
This example does not include modelConfig
parameters, but you can submit requests that include parameters described in Common parameters and fields.
curl --request POST \
--url https://APPLICATION_ID.applications.lucidworks.com/ai/prediction/keyword_extraction/MODEL_ID \
--header 'Authorization: Bearer ACCESS_TOKEN' \
--header 'Content-type: application/json' \
--data '{
"batch": [
{
"text": "Joseph Robinette Biden Jr.is an American politician who is the 46th and current president of the United States. Ideologically a moderate member of the Democratic Party, he previously served as the 47th vice president from 2009 to 2017 under President Barack Obama and represented Delaware in the United States Senate from 1973 to 2009.Born in Scranton, Pennsylvania, Biden moved with his family to Delaware in 1953. He studied at the University of Delaware before earning his law degree from Syracuse University. He was elected to the New Castle County Council in 1970 and to the U.S. Senate in 1972. As a senator, Biden drafted and led the effort to pass the Violent Crime Control and Law Enforcement Act and the Violence Against Women Act. He also oversaw six U.S. Supreme Court confirmation hearings, including the contentious hearings for Robert Bork and Clarence Thomas. Biden ran unsuccessfully for the Democratic presidential nomination in 1988 and 2008. In 2008, Obama chose Biden as his running mate, and Biden was a close counselor to Obama during his two terms as vice president. In the 2020 presidential election, Biden and his running mate, Kamala Harris, defeated incumbents Donald Trump and Mike Pence. Biden is the second Catholic president in U.S. history (after John F. Kennedy), and his politics have been widely described as profoundly influenced by Catholic social teaching."
}
],
"useCaseConfig": {
"maxKeywords": 100
}
}'
The following is an example response:
{
"predictions": [
{
"response": "Joseph Robinette Biden Jr., 46th president of the United States, Democratic Party, Vice President, Barack Obama, Delaware, University of Delaware, Syracuse University, Violent Crime Control and Law Enforcement Act, Violence Against Women Act.",
"tokensUsed": {
"promptTokens": 148,
"completionTokens": 27,
"totalTokens": 175
},
"keywords": [
"Joseph Robinette Biden Jr.",
"46th president of the United States",
"Democratic Party",
"Vice President",
"Barack Obama",
"Delaware",
"University of Delaware",
"Syracuse University",
"Violent Crime Control and Law Enforcement Act",
"Violence Against Women Act"
]
}
]
}