Chunk using semantic chunker
The semantic chunker (chunking strategy) creates chunks based on semantic similarity.
Using the model defined in the request URL, the semantic chunker splits the text into sentences, encodes each sentence, and then compares each sentence with the chunk being built to determine whether they are similar enough to group together.
After merging two semantically similar sentences into a pre-chunk, the semantic chunker must encode the pre-chunk again to obtain the vector it compares with the next sentence vector.
Because of this repeated encoding, this is the slowest of the chunkers, even when the approximate field is set to true.
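The grouping loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's implementation: embed is a toy bag-of-words stand-in for the model named in the URL, the sentence splitter is a naive regular expression, and the hypothetical parameters cosine_threshold and max_chunk_size merely echo cosineThreshold and maxChunkSize from the request (here max_chunk_size counts characters, which is an assumption).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, cosine_threshold=0.3, max_chunk_size=512):
    """Group consecutive sentences that are similar enough to the building chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        if not current:
            current = sentence
            continue
        # The building chunk is re-encoded on every step; this repeated
        # encoding is what makes the strategy slow.
        similar = cosine(embed(current), embed(sentence)) >= cosine_threshold
        fits = len(current) + 1 + len(sentence) <= max_chunk_size
        if similar and fits:
            current = current + " " + sentence
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(semantic_chunks("Cats purr. Cats purr loudly. Stocks fell sharply today."))
```

With a real model, embed would return dense vectors and the threshold would need tuning; the toy vectors here only make the control flow visible.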
import requests

url = "https://application_id.applications.lucidworks.com/ai/async-chunking/semantic/{MODEL_ID}"

payload = {
    "batch": [{ "text": "The content to be split into chunks. " }],
    "modelConfig": {
        "vectorQuantizationMethod": "min-max",
        "dimReductionSize": 256
    },
    "useCaseConfig": { "dataType": "query" },
    "chunkerConfig": {
        "maxChunkSize": 512,
        "overlapSize": 1,
        "cosineThreshold": 0.567,
        "approximate": True
    }
}
headers = {
    "Authorization": "<authorization>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)

{
    "chunkingId": "441eb3be-7de6-470a-8141-e416a15c7db1",
    "status": "SUBMITTED"
}

Headers
Authorization
Bearer token used for authentication. Format: Authorization: Bearer ACCESS_TOKEN.
Content-Type
application/json
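As a small illustration of the header format above, the two headers can be assembled from an access token as follows. The helper name build_headers is hypothetical and is not part of any Lucidworks client; how the token is obtained is outside the scope of this sketch.

```python
def build_headers(access_token):
    """Return request headers using the documented 'Bearer ACCESS_TOKEN' format."""
    token = access_token.strip()
    if not token:
        raise ValueError("access_token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of requests.post.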
Path Parameters
MODEL_ID
Unique identifier for the model. Example: "gte-small"
Body
batch
The batch of key:value pairs used in the chunking request.
modelConfig
Provides fields and values that specify ranges for tokens.
chunkerConfig
Settings for the semantic chunker (chunking strategy), which creates chunks based on semantic similarity, as described earlier on this page.
This is the default chunker configuration if nothing is passed.
Response
OK
This is the response to the POST chunking request submitted for a specific chunker and modelId.
chunkingId
The universally unique identifier (UUID) returned in the POST request. This UUID is required in the GET request to retrieve results. Example: "441eb3be-7de6-470a-8141-e416a15c7db1"
status
The current status of the request. Allowed values are:
- SUBMITTED - The POST request was successful and the response has returned the chunkingId and status that are used by the GET request.
- ERROR - An error was generated when the GET request was sent.
- READY - The results associated with the chunkingId are available and ready to be retrieved.
- RETRIEVED - The results associated with the chunkingId were returned successfully when the GET request was sent.
Example: "SUBMITTED"
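The status values above suggest a simple polling loop on the client side. The sketch below is an assumption about how a caller might use them, not documented behavior: fetch_status is a hypothetical caller-supplied function that performs the GET request (whose URL is not given on this page) and returns the status string, and the interval and attempt limits are arbitrary.

```python
import time


def wait_until_ready(fetch_status, chunking_id, interval_seconds=2.0,
                     max_attempts=30, sleep=time.sleep):
    """Poll until the request reports READY or RETRIEVED; raise on ERROR."""
    for attempt in range(max_attempts):
        status = fetch_status(chunking_id)
        if status in ("READY", "RETRIEVED"):
            return status
        if status == "ERROR":
            raise RuntimeError(f"chunking request {chunking_id} failed")
        if status != "SUBMITTED":
            raise ValueError(f"unexpected status: {status!r}")
        # Still SUBMITTED: wait before asking again, except after the last try.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"chunking request {chunking_id} not ready after {max_attempts} attempts")
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.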