Chunking
Chunking breaks down large documents into manageable pieces. Each chunk generates its own vector, resulting in multiple vectors that collectively represent a field from a single parent document. This is useful for large documents that exceed the model's maximum input length.
Chunking helps search because smaller pieces of text are easier to match accurately with a user’s query. When the AI searches through chunks instead of whole documents, it avoids irrelevant content and focuses only on the most relevant parts. This reduces noise, improves precision, and increases the chances of finding the exact answer or context the user needs.
Chunking helps retrieval-augmented generation (RAG) by giving the system smaller, focused pieces of information to choose from when answering a question. Instead of pulling in a whole document, RAG can retrieve just the chunks that are most relevant. This makes the answer more accurate because the model is only looking at the parts that actually match the question. It also reduces the chance of including unrelated or confusing content in the final response.
This feature is only available in Fusion 5.9.12 and later versions of Fusion 5.9.
How chunking works
Chunking works by limiting the context length to 512 tokens. Ideally, a chunk represents a complete thought or idea and is usually a sentence or two in length. Chunking should also balance computational efficiency. For example, avoid generating too many chunks per document, because each chunk is represented by a vector of O(1000) floats, which can affect performance and resource usage.
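To make the idea concrete, here is a minimal sketch of sentence-based chunking. This is illustrative only: the Lucidworks AI chunker applies its own strategies and token limits, and the naive regex split below is an assumption, not its actual implementation.

```python
import re

def chunk_sentences(text, sentences_per_chunk=2):
    """Split text into chunks of N sentences each.

    Illustrative only: the Lucidworks AI chunker uses its own
    strategy and a 512-token context limit; this naive regex
    split just shows the general shape of sentence chunking.
    """
    # Naive split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
```

With `sentences_per_chunk=2`, a five-sentence document yields three chunks: two of two sentences and one of a single trailing sentence.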
There are limits to both the request and response payloads sent to the LWAI Chunker from Fusion. Currently, Fusion truncates the body of text sent to Lucidworks AI for chunking to 50,000 characters (O(100) pages).
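The practical consequence of the 50,000-character limit is that anything beyond it is dropped before chunking, so very large documents should be split upstream. A one-line sketch of the truncation behavior:

```python
MAX_CHUNK_REQUEST_CHARS = 50_000  # Fusion's current truncation limit

def truncate_for_chunking(body: str) -> str:
    """Mimic Fusion's pre-flight truncation of text sent to the
    LWAI Chunker. Characters past the limit are silently dropped,
    so split very large documents upstream if the tail matters."""
    return body[:MAX_CHUNK_REQUEST_CHARS]
```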
Chunking in Neural Hybrid Search
You can set up Neural Hybrid Search to index, rank, and retrieve documents based on a combination of lexical and chunked vectors.
To use chunking in Neural Hybrid Search, you must use Lucidworks AI. Chunking is not supported for Ray or Seldon implementations. For more information, see the Lucidworks AI Async Chunking API.
To set up the Lucidworks AI index and query stages, you need to first set up your Lucidworks AI Gateway integration. This guide also assumes that you've set up and configured a datasource.
Setting up chunking in Neural Hybrid Search is similar to a standard Neural Hybrid search implementation, except that the LWAI Chunker Stage replaces the LWAI Vectorize Field stage and the Chunking Neural Hybrid Query stage replaces the Neural Hybrid Query or Hybrid Query stages.
Set up LWAI Chunker index pipeline stage
- Sign into Fusion, go to Indexing > Index Pipelines, then select an existing pipeline or create a new one.
- Click Add a new pipeline stage, then select LWAI Chunker Stage. For reference information, see LWAI Chunker Index Stage.
- In the Account Name field, select the Lucidworks AI API account name defined in Lucidworks AI Gateway.
- In the Chunking Strategy field, select the strategy to use. For example, `sentence`.
- In the Model for Vectorization field, select the Lucidworks AI model to use for encoding. For more information, see:
  - Custom embedding model training. To use a custom model, you must obtain the deployment ID from the deployments screen, or from the Lucidworks AI Models API, and enter that in the Model field.
- In the Input context variable field, enter the variable in context to be used as input. This field supports template expressions.
- In the Source field, enter the name of the string field whose value should be submitted to the model for encoding. If the field is blank or does not exist, this stage is not processed. Template expressions are supported.
- In the Destination Field Name & Context Output field, enter the name of the field where the vector value from the model response is saved. This field must contain `chunk_vector` and must be a dense vector field type. This field is used to populate two things with the prediction results:
  - The field name in the document that will contain the prediction.
  - The name of the context variable that will contain the prediction.
- In the Destination Field Name for Text Chunks (not the vectors) field, enter the field name that will contain the text chunks generated by the chunker. For example, `body_chunks_ss`.
- In the Chunker Configuration section, click the + sign to enter the parameter name and value for additional chunker keys to send to Lucidworks AI. For example, to limit the chunk size to two sentences, enter `chunkSize` and `2`, respectively.
- In the Model Configuration section, click the + sign to enter the parameter name and value for additional model configurations to send to Lucidworks AI. Several `modelConfig` parameters are common to generative AI use cases.
- In the API Key field, enter the secret associated with the model. For example, for OpenAI models, the value starts with `sk-`.
- In the Maximum Asynchronous Call Tries field, enter the maximum number of attempts to issue an asynchronous Lucidworks AI API call. The default value is `3`.
- Select the Fail on Error checkbox to generate an exception if an error occurs while generating a prediction for a document.
- Click Save.
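For a quick overview, the values configured in the steps above can be collected into a simple key/value sketch. The keys below mirror the UI labels rather than Fusion's internal stage schema, and the account name, model name, and source field are placeholder assumptions:

```python
# Sketch of the LWAI Chunker stage settings as a plain dict.
# Keys mirror the UI labels, NOT Fusion's internal stage schema.
lwai_chunker_stage = {
    "accountName": "my-lwai-account",       # assumption: example account
    "chunkingStrategy": "sentence",
    "model": "my-embedding-model",          # assumption: example model
    "sourceField": "body_t",                # assumption: example field
    "destinationVectorField": "body_chunk_vector_384v",  # must contain chunk_vector
    "destinationChunkField": "body_chunks_ss",
    "chunkerConfig": {"chunkSize": 2},      # limit chunks to two sentences
    "maxAsyncCallTries": 3,                 # the default
    "failOnError": False,
}
```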
Set up Solr Partial Update Indexer stage
Fusion’s asynchronous chunking process is optimized for efficiency and reliability. To achieve this, it leverages the Solr Partial Update Indexer stage and a single index pipeline visited twice.
- In the same pipeline, click Add a new pipeline stage, then select Solr Partial Update Indexer.
- Select the checkboxes to disable Map to Solr Schema, Enable Concurrency Control, and Reject Update if Solr Document is not Present.
- Select the checkbox to enable Process All Pipeline Doc Fields.
- Select the checkbox to enable Allow reserved fields.
- Click Save.
Index data using the new pipeline.
Set up LWAI Vectorize Query stage
- Go to Querying > Query Pipelines, then select an existing pipeline or create a new one.
- To vectorize text, click Add a new pipeline stage, then select LWAI Vectorize Query.
- In the Account Name field, select the name of the Lucidworks AI account.
- In the Model field, select the Lucidworks AI model to use for encoding.
- In the Query Input field, enter the location from which the query is retrieved.
- In the Output context variable field, enter the name of the variable where the vector value from the response is saved.
- In the Use Case Configuration section, click the + sign to enter the parameter name and value to send to Lucidworks AI. The `useCaseConfig` parameter that is common to generative AI and embedding use cases is `dataType`, but each use case may have other parameters. The value for the query stage is `query`.
- In the Model Configuration section, click the + sign to enter the parameter name and value to send to Lucidworks AI. Several `modelConfig` parameters are common to generative AI use cases. For more information, see Prediction API.
- Select the Fail on Error checkbox to generate an exception if an error occurs during this stage.
- Click Save.
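As with the index stage, the query-side settings above can be summarized as a key/value sketch. Keys mirror the UI labels and the account and model names are placeholder assumptions:

```python
# Sketch of the LWAI Vectorize Query stage settings.
# Keys mirror the UI labels, NOT Fusion's internal stage schema.
lwai_vectorize_query_stage = {
    "accountName": "my-lwai-account",        # assumption: example account
    "model": "my-embedding-model",           # assumption: example model
    "queryInput": "<request.params.q>",      # assumption: typical query location
    "outputContextVariable": "vector",       # read later as <ctx.vector>
    "useCaseConfig": {"dataType": "query"},  # per the step above
    "failOnError": False,
}
```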
Set up Chunking Neural Hybrid Query pipeline stage
- In the same query pipeline where you configured the LWAI Vectorize Query stage, click Add a new pipeline stage, then select Chunking Neural Hybrid Query Stage. For reference information, see Chunking Neural Hybrid Query Stage.
- In the Lexical Query Input field, enter the location from which the lexical query is retrieved. For example, `<request.params.q>`. Template expressions are supported.
- In the Lexical Query Weight field, enter the relative weight of the lexical query. For example, `0.3`. If this value is `0`, no re-ranking is applied using the lexical query scores.
- In the Lexical Query Squash Factor field, enter a value used to squash the lexical query score. For this value, Lucidworks recommends entering the inverse of the maximum lexical score across all queries for the given collection.
- In the Vector Query Field, enter the name of the Solr field for k-nearest neighbor (KNN) vector search. For example, `body_chunk_vector_384v`.
- In the Vector Input field, enter the location from which the vector is retrieved. Template expressions are supported. For example, a value of `<ctx.vector>` evaluates the context variable resulting from the LWAI Vectorize Query stage.
- In the Vector Query Weight field, enter the relative weight of the vector query. For example, `0.7`.
- In the Min Return Vector Similarity field, enter the minimum vector similarity value to qualify as a match from the vector portion of the hybrid query.
- In the Min Traversal Vector Similarity field, enter the minimum vector similarity value to use when walking through the graph during the vector portion of the hybrid query.
- Select the checkbox to enable the Compute Vector Similarity for Lexical-Only Matches setting. When enabled, this setting computes vector similarity scores for documents that appear in lexical search results but not in the initial vector search results.
- Select the checkbox to enable the Block pre-filtering setting. When enabled, this setting prevents pre-filtering that can interfere with facets and cause other issues.
- Click Save.
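To make the weighting parameters concrete, here is a sketch of how a lexical weight, vector weight, and squash factor might interact. This is not Fusion's actual scoring formula, which this guide does not specify; it only illustrates why the recommended squash factor is the inverse of the collection's maximum lexical score (so that squashed lexical scores land in the same 0-1 range as vector similarities):

```python
def hybrid_score(lexical_score, vector_similarity,
                 lexical_weight=0.3, vector_weight=0.7,
                 squash_factor=0.05):
    """Illustrative hybrid scoring, NOT Fusion's exact formula.

    squash_factor rescales unbounded lexical scores toward the 0-1
    range of vector similarities. Here 0.05 = 1/20 assumes a maximum
    lexical score of 20 for the collection, per the guide's advice
    to use the inverse of the maximum lexical score.
    """
    squashed_lexical = lexical_score * squash_factor
    return lexical_weight * squashed_lexical + vector_weight * vector_similarity
```

With these assumed values, a document at the lexical maximum (20) with vector similarity 0.8 scores 0.3 * 1.0 + 0.7 * 0.8 = 0.86.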
Validate chunking in the Query Workbench
Once configured, go to the Query Workbench to run some queries and check that vectorization and chunking are working properly.
If you facet by the vector query field (in this example, `body_chunk_vector_384v`), you see that your indexed documents have vectors.
If you have a large dataset with thousands of documents, set this field to `stored=false`. Storing vectors in Solr for that many documents can result in memory issues. Refer to the Solr documentation on override properties for more information.
If you facet by `_lw_chunk_root`, you see `body_chunk_ss`. In this example, the chunk size is limited to two sentences, so this document has 29 chunks of two sentences each.
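The facet checks above can also be expressed as raw Solr-style request parameters, for example when validating from a script instead of the Query Workbench. The parameter names are standard Solr faceting parameters; the field names match this example, and the collection and endpoint details are omitted because they vary per deployment:

```python
# Standard Solr faceting params mirroring the Query Workbench check.
# A list value for "facet.field" becomes a repeated parameter when
# sent with an HTTP client such as requests.
facet_check_params = {
    "q": "*:*",
    "rows": 0,                  # only facet counts are needed
    "facet": "true",
    "facet.field": ["_lw_chunk_root", "body_chunk_vector_384v"],
}
```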