Chunking Neural Hybrid Query Stage

Fusion 5.9.12 and later releases use index and query stages to split large documents into smaller, more manageable segments called chunks. For more information about chunking, chunking strategies and setting up chunking, see Chunking. The Chunking Neural Hybrid Query stage performs hybrid lexical-semantic search that combines BM25-type lexical search with KNN dense vector search via Solr. This stage differs from the Neural Hybrid Stage because it supports chunking. Not sure which hybrid query stage is right for you? Read about the differences between the hybrid query stages.

This feature is only available in Fusion 5.9.x for versions 5.9.12 and later. Some prefiltering capabilities, such as access to the preFilterKey context property and the VectorPreFilter helper class, are only available in 5.9.13 and later.

About the Lexical Query Squash Factor

The Lexical Query Squash Factor field lets you input a value that squashes the lexical query scores from 0..inf to 0..1. This setting helps prevent the lexical query from dominating the final score, and normalizes the score into a range that works well with vector similarity scores. Additionally, it helps prevent the vanishing gradient problem, which occurs when very high lexical scores are mapped to values extremely close to 1, such as 0.99999999. During the hybrid search calculation, these near-1 values can cause the system to lose sensitivity to subtle differences in lexical relevance, effectively ‘squashing’ the gradient and reducing the impact of lexical scoring. Lucidworks recommends setting the Lexical Query Squash Factor to the inverse of the maximum lexical score observed across your queries. This helps balance the impact of lexical and vector scores, leading to more accurate and nuanced search results.
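The effect of the recommended setting can be illustrated with a minimal sketch. It assumes a squash function of the form x / (1 + x), where x is the lexical score multiplied by the squash factor. This form is an illustrative assumption, not necessarily the function Fusion applies, and the scores are made-up values.

```javascript
// ASSUMPTION: x / (1 + x) is a stand-in squash function chosen for illustration.
// It maps [0, inf) into [0, 1); Fusion's actual function may differ.
function squash(lexicalScore, squashFactor) {
  var x = lexicalScore * squashFactor;
  return x / (1 + x);
}

// Hypothetical lexical scores for two documents; 1000 is the maximum observed.
var maxScore = 1000;
var scoreA = 500;
var scoreB = 1000;

// With a factor of 1, both scores land very close to 1 and are hard to tell apart.
var gapUnscaled = squash(scoreB, 1) - squash(scoreA, 1);

// With the factor set to the inverse of the maximum score, the gap stays visible.
var gapScaled = squash(scoreB, 1 / maxScore) - squash(scoreA, 1 / maxScore);

console.log(gapUnscaled.toFixed(4), gapScaled.toFixed(4));
```

Under these assumptions the gap between the two documents is roughly 0.001 with a factor of 1 and roughly 0.167 with the inverse of the maximum score, which is the loss of sensitivity that the recommendation is meant to avoid.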

Prefiltering

Prefiltering is a technique that can improve performance and accuracy by filtering documents before applying the algorithm, reducing the number of documents that need to be processed. This is especially effective with the KNN algorithm. Prefiltering is disabled by default. To enable it, uncheck Block pre-filtering in this stage. When prefiltering is enabled, you can configure the filters using one or both of these methods:

Security filters
You can use security filters as prefilters by placing the Graph Security Trimming Stage after this one in the pipeline.
Then Fusion uses the security trimming filter as a prefilter.

JavaScript
When prefiltering is enabled, this stage adds a preFilterKey object to the JavaScript ctx object.
You can place a JavaScript stage after this one and use it to access the preFilterKey object, as in this example:

if (ctx.hasProperty("preFilterKey")) {
  var preFilter = ctx.getProperty("preFilterKey");
  // filterQuery is a Solr filter query string that you define, such as "foo_s:bar"
  preFilter.addFilter(filterQuery);
}

In Fusion 5.9.13 and later, you can also use the following example to add filters that apply specifically to the prefilter:

// Load the Fusion classes needed to build the prefilter wrapper
var QueryRequestAndResponse = Java.type("com.lucidworks.apollo.pipeline.query.QueryRequestAndResponse");
var VectorPreFilter = Java.type("com.lucidworks.apollo.pipeline.query.stages.VectorPreFilter");
// Look up the prefilter object that this stage places in the context
var preFilter = ctx.get(VectorPreFilter.CONTEXT_KEY);
if (preFilter) {
  var wrapper = QueryRequestAndResponse.create(request, response, 0);
  preFilter.addFilter(wrapper, "id:* OR foo_s:bar");
}

The context object of the VectorPreFilter class is not present in Fusion 5.9.12 and earlier. Use the Additional Query Parameters stage example and the 5.9.12 note below.

In Fusion 5.9.12 and earlier, a non-JavaScript approach is required in addition to the Additional Query Parameters stage example in the next section. In Fusion 5.9.12, the vec_sim_q parameter must be altered after the Chunking Neural Hybrid Query stage runs so that it includes {!knn f=$vec_field v=$vec_q topK=100 preFilter=$vectorPreFilter}, where topK is consistent with the value your stage sets. For example:

if (request.hasParam("vec_sim_q")) {
  var vecSimQ = request.getFirstFieldValue("vec_sim_q");
  if (vecSimQ.indexOf("preFilter") < 0) {
    vecSimQ = vecSimQ.substring(0, vecSimQ.lastIndexOf("}")) + " preFilter=$vectorPreFilter }";
    request.removeParam("vec_sim_q");
    request.addParam("vec_sim_q", vecSimQ);
  }
}
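
The string rewrite performed above can be tried outside Fusion with a minimal sketch. The helper name withPreFilter is hypothetical and not part of any Fusion API; the sample value is the vec_sim_q form described above, without the preFilter reference. Unlike the stage script, this sketch also leaves the value unchanged when it contains no closing brace.

```javascript
// Pure-string version of the rewrite, so it can be tested without a Fusion request.
// withPreFilter is a hypothetical helper name, not a Fusion API.
function withPreFilter(vecSimQ) {
  if (vecSimQ.indexOf("preFilter") >= 0) {
    return vecSimQ; // already references a prefilter; leave it unchanged
  }
  var end = vecSimQ.lastIndexOf("}");
  if (end < 0) {
    return vecSimQ; // no local-params block to extend
  }
  // Reopen the local-params block and append the prefilter reference.
  return vecSimQ.substring(0, end) + " preFilter=$vectorPreFilter }";
}

var before = "{!knn f=$vec_field v=$vec_q topK=100}";
var after = withPreFilter(before);
console.log(after);
```

Calling the helper a second time on its own output returns the same string, which mirrors the indexOf("preFilter") guard in the stage script.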

Additional Query Parameters stage
If you do not want to create a JavaScript stage to define the filters, you can use an Additional Query Parameters stage to prefilter the documents to be processed. The parameters supply the $vectorPreFilter value that the previous JavaScript example references in the request. This step is required for Fusion 5.9.12. The following example uses a single prefilter:
```
"fq": "{!bool filter=$vectorPreFilter}",
"vectorPreFilter": "EXAMPLE_FILTER"
```
The following example uses multiple prefilters:
```
"fq": "{!bool filter=$filterClauses}",
"vectorPreFilter": "{!bool should=$filterClauses}",
"filterClauses": ["id:EXAMPLE_FILTER1","id:EXAMPLE_FILTER2"]
```
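
If the filter clauses are only known at run time, the same three parameters can be assembled programmatically. The following is a minimal sketch; buildPrefilterParams is a hypothetical helper, not a Fusion API, and it only reproduces the parameter names and values from the multiple-prefilter example above.

```javascript
// buildPrefilterParams is a hypothetical helper, not a Fusion API.
// It returns the parameter names and values from the multiple-prefilter example.
function buildPrefilterParams(filterClauses) {
  return {
    "fq": "{!bool filter=$filterClauses}",
    "vectorPreFilter": "{!bool should=$filterClauses}",
    "filterClauses": filterClauses.slice() // copy so the caller's array is not shared
  };
}

var params = buildPrefilterParams(["id:EXAMPLE_FILTER1", "id:EXAMPLE_FILTER2"]);
console.log(JSON.stringify(params, null, 2));
```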

Query pipeline stage condition examples

Stages can be triggered conditionally when a script in the Condition field evaluates to true. Some examples are shown below.

Run this stage only for mobile clients:

params.deviceType === "mobile"

Run this stage when debugging is enabled:

params.debug === "true"

Run this stage when the query includes a specific term:

params.q && params.q.includes("sale")

Run this stage when multiple conditions are met:

request.hasParam("fusion-user-name") && request.getFirstParam("fusion-user-name").equals("SuperUser")
  && !request.hasParam("isFusionPluginQuery")

The first condition checks that the request parameter “fusion-user-name” is present and has the value “SuperUser”. The second condition checks that the request parameter “isFusionPluginQuery” is not present.

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
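
The difference between the two forms comes from JSON string escaping, as this minimal sketch shows. The property name delimiter is hypothetical and used only for illustration.

```javascript
// "delimiter" is a hypothetical property name used only for illustration.
// In JavaScript source, "\\t" is the two-character value \t: a backslash followed by t,
// which is what you would type into the UI field.
var uiValue = "\\t";

// Serializing to JSON escapes the backslash, producing \\t in the request body.
var apiBody = JSON.stringify({ delimiter: uiValue });
console.log(apiBody);
```

Parsing the JSON body again yields the original two-character value, so both forms describe the same setting.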
