Exclusion Filter
Index pipeline stage configuration specifications
The Exclusion Filter index stage is used to remove fields or documents that match items in a pre-defined exclusion list.
There are two ways to supply an exclusion list:
- Upload a file containing a newline-separated list, using the Blob Store. When configuring the index stage, reference the list by its blob name in the location property (Exclusion List URI in the Managed Fusion UI).
- When configuring the index stage, enter an array of values for exclusion in the excludeValues property (Exclusion List in the Managed Fusion UI).
The Exclusion Filter stage can be configured using one or both of these methods; Managed Fusion combines them into one list. If regexPattern is configured, the pattern is applied to the field before the result is compared to the combined list.
By default, any matching field is excluded from indexing. To exclude the whole document, set skipDocument to "true" (Skip Document in the Managed Fusion UI).
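The matching behavior described above can be sketched in Python. This is only an illustration, not Managed Fusion's implementation: the function name apply_exclusion_filter, the dict-based document, the single-valued field, and the reading of regexPattern as "compare the first match of the pattern" are all assumptions made for the example.

```python
import re
from typing import List, Optional


def apply_exclusion_filter(
    doc: dict,
    source_field: str,
    blob_lines: List[str],
    exclude_values: List[str],
    case_sensitive: bool = False,
    regex_pattern: Optional[str] = None,
    skip_document: bool = False,
) -> Optional[dict]:
    """Return the filtered document, or None if the whole document is skipped."""
    # Combine the uploaded list and the inline values into one exclusion list.
    combined = [line.strip() for line in blob_lines if line.strip()]
    combined += list(exclude_values)
    norm = (lambda s: s) if case_sensitive else str.lower
    excluded = {norm(v) for v in combined}

    value = doc.get(source_field)
    if value is None:
        return doc

    candidate = str(value)
    if regex_pattern is not None:
        # Assumption: the first match of the pattern is the value compared.
        match = re.search(regex_pattern, candidate)
        if match is None:
            return doc
        candidate = match.group(0)

    if norm(candidate) not in excluded:
        return doc
    if skip_document:
        return None  # drop the whole document
    # Default behavior: drop only the matching field.
    return {k: v for k, v in doc.items() if k != source_field}


doc = {"id": "1", "author_s": "Anonymous", "title_s": "Notes"}
print(apply_exclusion_filter(doc, "author_s", ["anonymous\n"], ["unknown"]))
# {'id': '1', 'title_s': 'Notes'}
```

With skip_document=True the same call returns None, which stands in for the document being dropped from indexing.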
Uploading an exclusion list
Before you can configure the location property, you must upload one or more exclusion lists to Managed Fusion using the Blob Store API.
Managed Fusion comes with an example exclusion list at https://EXAMPLE_COMPANY.lucidworks.cloud/data/nlp/excludes/excludes.txt. Here is an example of how to upload this file using curl, where USERNAME:PASSWORD are the credentials for an admin-level user:
curl -u USERNAME:PASSWORD -X PUT --data-binary @data/nlp/excludes/excludes.txt -H 'Content-type: text/plain' http://EXAMPLE_COMPANY.lucidworks.cloud/api/blobs/excludes.txt
Replace EXAMPLE_COMPANY with the name provided by your Lucidworks representative.
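For scripted uploads, the same request could be built with Python's standard library. This is a minimal sketch that mirrors the curl command above (PUT, basic authentication, text/plain body); the helper names build_blob_upload_request and upload_exclusion_list are hypothetical, and the sketch assumes the endpoint accepts HTTP basic authentication as the curl -u flag implies.

```python
import base64
import urllib.request


def build_blob_upload_request(host, blob_name, data, username, password):
    """Build (but do not send) a PUT request equivalent to the curl command."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}/api/blobs/{blob_name}",
        data=data,
        method="PUT",
        headers={
            "Content-Type": "text/plain",
            "Authorization": f"Basic {token}",
        },
    )


def upload_exclusion_list(path, host, blob_name, username, password):
    """Read a local newline-separated list and send it to the Blob Store."""
    with open(path, "rb") as f:
        payload = f.read()
    request = build_blob_upload_request(host, blob_name, payload, username, password)
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as upload_exclusion_list("data/nlp/excludes/excludes.txt", "EXAMPLE_COMPANY.lucidworks.cloud", "excludes.txt", "USERNAME", "PASSWORD") would correspond to the curl example, with the same placeholders to replace.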
Example
Use an exclusion list for entities found in the author field:
{
  "type" : "exclusion-filter",
  "id" : "iw",
  "filters" : [ {
    "sourceField" : "author_s",
    "location" : "excludes.txt",
    "caseSensitive" : false
  } ],
  "skip" : false
}
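A stage definition that combines both ways of supplying the list might be generated as follows. This is a hedged sketch: it assumes that excludeValues and skipDocument sit inside each filters entry next to location and that skipDocument takes a JSON boolean, neither of which is confirmed here, and the excluded values are placeholders. Check the configuration reference before relying on this shape.

```python
import json

# Hypothetical stage definition; property placement and values are assumptions.
stage = {
    "type": "exclusion-filter",
    "id": "iw",
    "filters": [
        {
            "sourceField": "author_s",
            "location": "excludes.txt",                 # blob name of the uploaded list
            "excludeValues": ["anonymous", "unknown"],  # inline values (placeholders)
            "caseSensitive": False,
            "skipDocument": True,                       # drop the whole document on a match
        }
    ],
    "skip": False,
}

# Serialize to the JSON text that would be sent to the API.
payload = json.dumps(stage, indent=2)
print(payload)
```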
Configuration
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
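The difference between the two forms can be seen by serializing a value to JSON. This sketch assumes the note refers to ordinary JSON string escaping in the API request body; regexPattern is used only as an illustrative property name.

```python
import json

# What you would type in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# In a JSON request body the backslash itself must be escaped.
api_body = json.dumps({"regexPattern": ui_value})
print(api_body)  # {"regexPattern": "\\t"}

# Decoding the body recovers the two-character value entered in the UI.
assert json.loads(api_body)["regexPattern"] == ui_value
```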