| Default job name | COLLECTION_NAME_spell_correction |
| Input | Raw signals (the COLLECTION_NAME_signals collection by default) |
| Output | Synonyms (the COLLECTION_NAME_query_rewrite_staging collection by default) |

| | query | count_i | type | timestamp_tdt | user_id | doc_id | session_id | fusion_query_id |
|---|---|---|---|---|---|---|---|---|
| Required signals fields: | ✅ | ✅ | ✅ | | | | | |

The input collection is set by the trainingCollection parameter, which can contain signal data or non-signal data. For signal data, select Input is Signal Data (signalDataIndicator). Signals can be raw (from the _signals collection) or aggregated (from the _signals_aggr collection).

The input field is set by the fieldToVectorize parameter.

count_i is the field that records the count of raw signals, and aggr_count_i is the field that records the count after aggregation.

The job uses two event types:

- The main event type (the mainType parameter)
- The filtering event type (the filterType parameter)

For example, if you set the main event type to click with a minimum count of 0 and the filtering event type to query with a minimum count of 20, then the job:
You can supply a custom dictionary with the Dictionary Collection (dictionaryCollection) and Dictionary Field (dictionaryField) parameters. For example, in an e-commerce use case, you can use the catalog terms as the custom dictionary by specifying the product catalog collection as the dictionary collection and the product description field as the dictionary field.

The job writes its output to the query_rewrite_staging collection by default; you can change this by setting the outputCollection parameter.
An example record is as follows:

The suggested_corrections field provides suggestions about whether to use token correction or whole-phrase correction. If the confidence of the correction is not high, then the job labels the pair as “review” in this field. Pay special attention to the output records with the “review” label.
With the output in a CSV file, you can sort by mis_string_len (descending) and edit_dist (ascending) to position more probable corrections at the top. You can also sort by the ratio of correction traffic over misspelling traffic (the corCount_misCount_ratio field) to only keep high-traffic boosting corrections.
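
The sorting and filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the job itself: the field names mis_string_len, edit_dist, and corCount_misCount_ratio come from the job output, while the `misspelling` and `correction` column names, the sample rows, and the `min_ratio` threshold are assumptions made up for the example.

```python
import csv
import io

# Made-up sample rows; only mis_string_len, edit_dist, and
# corCount_misCount_ratio are field names taken from the job output.
SAMPLE = """\
misspelling,correction,mis_string_len,edit_dist,corCount_misCount_ratio
labtop,laptop,6,1,50.0
mointer,monitor,7,3,0.8
keybord,keyboard,7,1,30.0
hedphones,headphones,9,1,12.5
"""


def rank_corrections(csv_text, min_ratio=None):
    """Return rows sorted by mis_string_len (descending), then edit_dist (ascending).

    If min_ratio is given (a hypothetical threshold), rows whose
    corCount_misCount_ratio falls below it are dropped first.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if min_ratio is not None:
        rows = [r for r in rows if float(r["corCount_misCount_ratio"]) >= min_ratio]
    rows.sort(key=lambda r: (-int(r["mis_string_len"]), int(r["edit_dist"])))
    return rows


ranked = rank_corrections(SAMPLE)
high_traffic = rank_corrections(SAMPLE, min_ratio=1.0)
```

Here `ranked` lists the longest misspellings first and breaks ties by the smaller edit distance, and `high_traffic` additionally drops the pair whose correction traffic is lower than its misspelling traffic.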
For phrase misspellings, the misspelled tokens are separated out and put in the token_wise_correction field. If the associated token correction is already included in the one-word correction list, then the collation_check field is labeled as “token correction included.” You can choose to drop those phrase misspellings to reduce duplication.
Fusion counts how many phrase corrections can be solved by the same token correction and puts the number into the token_corr_for_phrase_cnt field. For example, if both “outdoor surveilance” and “surveilance camera” can be solved by correcting “surveilance” to “surveillance”, then this number is 2, which provides some confidence for dropping such phrase corrections and further confirms that correcting “surveilance” to “surveillance” is legitimate.
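
To make the counting concrete, here is a rough sketch of how such a number could be derived from a list of phrase corrections. It is an illustration only and not Fusion's implementation: the function names are hypothetical, the sample phrases are invented, and it assumes that a misspelled phrase and its correction have the same number of whitespace-separated tokens.

```python
from collections import Counter


def token_corrections(misspelled_phrase, corrected_phrase):
    """Return the (misspelled_token, corrected_token) pairs that differ.

    Assumption: both phrases split into the same number of tokens;
    otherwise no token-wise correction is reported.
    """
    mis_tokens = misspelled_phrase.split()
    cor_tokens = corrected_phrase.split()
    if len(mis_tokens) != len(cor_tokens):
        return []
    return [(m, c) for m, c in zip(mis_tokens, cor_tokens) if m != c]


def count_phrases_per_token_correction(phrase_pairs):
    """Count how many phrase corrections each token correction would solve."""
    counts = Counter()
    for misspelled, corrected in phrase_pairs:
        # set() so one phrase counts at most once per token correction
        for pair in set(token_corrections(misspelled, corrected)):
            counts[pair] += 1
    return counts


phrase_pairs = [
    ("outdoor surveilance", "outdoor surveillance"),
    ("surveilance camera", "surveillance camera"),
    ("xbow controller", "xbox controller"),
]
counts = count_phrases_per_token_correction(phrase_pairs)
```

With this sample input, the pair ("surveilance", "surveillance") gets a count of 2 and ("xbow", "xbox") gets a count of 1.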
You might also see cases where the token-wise correction is not included in the list. For example, “xbow” to “xbox” is not included in the list because it can be dangerous to allow an edit distance of 1 in a word of length 4. But if multiple phrase corrections can be made by changing this token, then you can add this token correction to the list.
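
The risk in the “xbow” example comes from how large one edit is relative to a short word. The sketch below uses plain Levenshtein distance as a stand-in, since the exact distance measure behind edit_dist is not specified here; `edit_distance` and `changed_fraction` are hypothetical helper names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def changed_fraction(misspelling: str, correction: str) -> float:
    """Share of the misspelling's length that the correction changes."""
    return edit_distance(misspelling, correction) / len(misspelling)


dist = edit_distance("xbow", "xbox")         # 1
fraction = changed_fraction("xbow", "xbox")  # 0.25
```

A single edit changes a quarter of a four-letter word, so many unrelated short words sit within the same distance of each other; the same single edit in a long word is much stronger evidence of a genuine misspelling.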
Phrase corrections with a low token_corr_for_phrase_cnt and with collation_check labeled as “token correction not included” are potentially problematic.

Corrections that combine or break words are identified in the correction_types field. If there is a user-provided dictionary to check against, and both spellings are in the dictionary with and without whitespace in the middle, these pairs can be treated as bi-directional synonyms (“combine/break words (bi-direction)” in the correction_types field).
The sound_match and lastChar_match fields also provide useful information.

| Parameter | Description |
|---|---|
| trainingDataFilterQuery/Data filter query | See Event types above, then adjust this value to reflect the secondary event for your search application. To query all data, set this to *:*. |
| minCountFilter/Minimum Filtering Event Count | Lower this value to include less-frequent misspellings based on the data filter query. |
| maxDistance/Maximum Edit Distance | Raise this value to increase the number of potentially-related tokens and phrases detected. |
| minMispellingLen/Minimum Length of Misspelling | Lower this value to include shorter misspellings (which are harder to correct accurately). |

Query Rewrite Jobs Post-processing Cleanup
delete_lowConf_synonyms.json file.
<your query_rewrite_staging collection name/update> in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
id field if applicable.
<your query_rewrite_staging collection name/update> in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<root><delete><query>type:synonym AND confidence: [0 TO 0.0005]</query></delete><commit/></root>
<root><delete><query>type:synonym</query></delete><commit/></root>
delete_lowConf_phrases.json file.
<your query_rewrite_staging collection name/update> in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<your query_rewrite_staging collection name/update> in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<root><delete><query>type:phrase AND confidence: [0 TO <insert value>]</query></delete><commit/></root>
<root><delete><query>type:phrase</query></delete><commit/></root>
delete_lowConf_misspellings.json file.
<your query_rewrite_staging collection name/update> in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<your query_rewrite_staging collection name/update> in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<root><delete><query>type:spell AND confidence: [0 TO 0.5]</query></delete><commit/></root>
<root><delete><query>type:spell</query></delete><commit/></root>
delete_lowConf_headTail.json file.
<your query_rewrite_staging collection name/update> in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<your query_rewrite_staging collection name/update> in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
<root><delete><query>type:tail</query></delete><commit/></root>
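
The cleanup payloads above all follow one pattern: a delete-by-query on the type field, optionally narrowed by a confidence range. The following sketch generates the URI value and the XML body from that pattern so the threshold can be varied per rewrite type. The helper names `update_uri` and `build_delete_payload` are hypothetical, and the sketch only builds strings; it does not send anything to Fusion.

```python
from xml.sax.saxutils import escape


def update_uri(app_name: str) -> str:
    """URI value for the app's query_rewrite_staging collection."""
    return f"{app_name}_query_rewrite_staging/update"


def build_delete_payload(rewrite_type: str, max_confidence=None) -> str:
    """Build the XML delete-by-query body for one rewrite type.

    rewrite_type: one of the types used above (synonym, phrase, spell, tail).
    max_confidence: upper bound of the confidence range to delete; when
    omitted, every document of that type is deleted.
    """
    query = f"type:{rewrite_type}"
    if max_confidence is not None:
        query += f" AND confidence: [0 TO {max_confidence}]"
    # escape() guards against XML-special characters in the query text
    return f"<root><delete><query>{escape(query)}</query></delete><commit/></root>"


uri = update_uri("DC_Large")
low_conf_spell = build_delete_payload("spell", 0.5)
all_tail = build_delete_payload("tail")
```

Keeping the threshold as a parameter makes it easier to start with a conservative value, inspect what remains in the staging collection, and then tighten the range.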