Query rewrite jobs post-processing cleanup
The Synonym Detection job uses the output of the Misspelling Detection job and Phrase Extraction job. Therefore, post-processing must occur in the order specified in this topic: Synonym detection job cleanup, then Phrase extraction job cleanup, then Misspelling detection job cleanup. The Head-Tail Analysis job cleanup can occur at any point in that sequence.
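The ordering requirement can be pictured as a small dependency graph. The following Python sketch is only an illustration: the step names are hypothetical labels for the cleanup procedures in this topic, not Managed Fusion job IDs, and it requires Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each key is a cleanup step; each value is the set of steps that must run first.
# The names are illustrative labels, not Managed Fusion job IDs.
CLEANUP_DEPENDENCIES = {
    "synonym_cleanup": set(),
    "phrase_cleanup": {"synonym_cleanup"},
    "misspelling_cleanup": {"synonym_cleanup", "phrase_cleanup"},
    "head_tail_cleanup": set(),  # no ordering requirement
}


def cleanup_order(dependencies=CLEANUP_DEPENDENCIES):
    """Return one valid execution order for the cleanup steps."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(cleanup_order())
```

Any order the function returns places the synonym cleanup before the phrase cleanup and the phrase cleanup before the misspelling cleanup; the position of the head-tail cleanup is not constrained.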
Synonym detection job cleanup
Use this job to remove low confidence synonyms.
Prerequisites
Complete this cleanup:
-
AFTER the Misspelling Detection and Phrase Extraction jobs have successfully completed.
-
BEFORE you run the phrase extraction cleanup and misspelling detection cleanup procedures detailed later in this topic.
Remove low confidence synonym suggestions
Use either Synonym cleanup method 1 - API call or Synonym cleanup method 2 - Managed Fusion Admin UI to remove low confidence synonym suggestions.
Synonym cleanup method 1 - API call
-
Open the delete_lowConf_synonyms.json file.
{
  "type" : "rest-call",
  "id" : "DC_Large_QR_DELETE_LOW_CONFIDENCE_SYNONYMS",
  "callParams" : {
    "uri" : "solr://DC_Large_query_rewrite_staging/update",
    "method" : "post",
    "queryParams" : { "wt" : "json" },
    "headers" : { },
    "entity" : "<root><delete><query>type:synonym AND confidence:[0 TO 0.0005]</query></delete><commit/></root>"
  }
}
The entity field specifies the threshold for low confidence synonyms. This example uses an upper range of 0.0005.
-
Enter <your query_rewrite_staging collection name>/update in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Change the id field if applicable.
-
Specify the upper confidence level in the entity field.
The entity field specifies the threshold for low confidence synonyms. Edit the upper range to increase or decrease the threshold based on your data.
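As a rough illustration of how the fields in the JSON file fit together, the following Python sketch assembles the same job definition from an app name, a job ID, a suggestion type, and an upper confidence value. The function name and its parameters are hypothetical helpers, not part of Managed Fusion; the sketch only produces JSON text and does not submit anything.

```python
import json


def build_cleanup_job(app_name, job_id, doc_type, upper_confidence):
    """Build a rest-call job definition that deletes low confidence suggestions.

    upper_confidence is passed as a string (for example "0.0005") so that it
    appears in the query exactly as written.
    """
    if float(upper_confidence) < 0:
        raise ValueError("upper_confidence must be 0 or greater")
    query = f"type:{doc_type} AND confidence:[0 TO {upper_confidence}]"
    entity = f"<root><delete><query>{query}</query></delete><commit/></root>"
    return {
        "type": "rest-call",
        "id": job_id,
        "callParams": {
            # Collection naming follows the example in this topic.
            "uri": f"solr://{app_name}_query_rewrite_staging/update",
            "method": "post",
            "queryParams": {"wt": "json"},
            "headers": {},
            "entity": entity,
        },
    }


if __name__ == "__main__":
    job = build_cleanup_job(
        "DC_Large",
        "DC_Large_QR_DELETE_LOW_CONFIDENCE_SYNONYMS",
        "synonym",
        "0.0005",
    )
    print(json.dumps(job, indent=2))
```

Passing the threshold as a string avoids surprises from floating-point formatting when the value is very small.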
Synonym cleanup method 2 - Managed Fusion Admin UI
-
Log in to Managed Fusion and select Collections > Jobs.
-
Select Add+ > Custom and Other Jobs > REST Call.
-
Enter delete-low-confidence-synonyms in the ID field.
-
Enter <your query_rewrite_staging collection name>/update in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Enter POST in the CALL METHOD field.
-
In the QUERY PARAMETERS section, select + to add a property.
-
Enter wt in the Property Name field.
-
Enter json in the Property Value field.
-
In the REQUEST PROTOCOL HEADERS section, select + to add a property.
-
Enter the following as a REQUEST ENTITY (AS STRING):
<root><delete><query>type:synonym AND confidence:[0 TO 0.0005]</query></delete><commit/></root>
REQUEST ENTITY specifies the threshold for low confidence synonyms. Edit the upper range from 0.0005 to increase or decrease the threshold based on your data.
Delete all synonym suggestions
To delete all of the synonym suggestions, enter the following in the REQUEST ENTITY section:
<root><delete><query>type:synonym</query></delete><commit/></root>
This entry may be helpful when tuning the synonym detection job and testing different configuration parameters.
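Because the REQUEST ENTITY is entered as a plain string, a missing closing tag is easy to overlook. The following Python sketch shows one way to check that an entity string is well-formed XML and to read back its delete query before pasting it into the job. The function name is a hypothetical helper, and the check runs locally; it does not contact Managed Fusion or Solr.

```python
import xml.etree.ElementTree as ET


def check_request_entity(entity):
    """Parse a REQUEST ENTITY string and return its delete query text.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed and
    ValueError if no <delete><query> text is present.
    """
    root = ET.fromstring(entity)
    query = root.find("./delete/query")
    if query is None or not (query.text or "").strip():
        raise ValueError("entity has no <delete><query> text")
    return query.text.strip()


if __name__ == "__main__":
    low_confidence = (
        "<root><delete><query>type:synonym AND confidence:[0 TO 0.0005]"
        "</query></delete><commit/></root>"
    )
    delete_all = "<root><delete><query>type:synonym</query></delete><commit/></root>"
    print(check_request_entity(low_confidence))
    print(check_request_entity(delete_all))
```

Reading the query back also makes it obvious whether the entity deletes only low confidence synonyms or every synonym suggestion.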
Phrase extraction job cleanup
Use this job to remove low confidence phrase suggestions.
Remove low confidence phrase suggestions
Use either Phrase cleanup method 1 - API call or Phrase cleanup method 2 - Managed Fusion Admin UI to remove low confidence phrase suggestions.
Phrase cleanup method 1 - API call
-
Open the delete_lowConf_phrases.json file.
{
  "type" : "rest-call",
  "id" : "DC_Large_QR_DELETE_LOW_CONFIDENCE_PHRASES",
  "callParams" : {
    "uri" : "solr://DC_Large_query_rewrite_staging/update",
    "method" : "post",
    "queryParams" : { "wt" : "json" },
    "headers" : { },
    "entity" : "<root><delete><query>type:phrase AND confidence:[0 TO <INSERT VALUE HERE>]</query></delete><commit/></root>"
  }
}
-
Enter <your query_rewrite_staging collection name>/update in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Change the id field if applicable.
-
Specify the upper confidence level in the entity field.
The entity field specifies the threshold for low confidence phrases. Edit the upper range to increase or decrease the threshold based on your data.
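The phrase file ships with a placeholder instead of a number, so the upper confidence value must be filled in before the file is used. The following Python sketch shows one way to do that substitution and confirm the result is still valid JSON. The function name is hypothetical, the TEMPLATE string is a stand-in for the contents of delete_lowConf_phrases.json, and the value "0.1" is an arbitrary illustration, not a recommended threshold.

```python
import json

PLACEHOLDER = "<INSERT VALUE HERE>"

# Stand-in for the contents of delete_lowConf_phrases.json.
TEMPLATE = """
{
  "type": "rest-call",
  "id": "DC_Large_QR_DELETE_LOW_CONFIDENCE_PHRASES",
  "callParams": {
    "uri": "solr://DC_Large_query_rewrite_staging/update",
    "method": "post",
    "queryParams": {"wt": "json"},
    "headers": {},
    "entity": "<root><delete><query>type:phrase AND confidence:[0 TO <INSERT VALUE HERE>]</query></delete><commit/></root>"
  }
}
"""


def fill_threshold(template_text, upper_confidence):
    """Insert the upper confidence value and return the parsed job definition."""
    float(upper_confidence)  # raises ValueError if the value is not numeric
    if PLACEHOLDER not in template_text:
        raise ValueError("template has no threshold placeholder")
    return json.loads(template_text.replace(PLACEHOLDER, upper_confidence))


if __name__ == "__main__":
    # "0.1" is an arbitrary illustration, not a recommended threshold.
    job = fill_threshold(TEMPLATE, "0.1")
    print(json.dumps(job, indent=2))
```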
Phrase cleanup method 2 - Managed Fusion Admin UI
-
Log in to Managed Fusion and select Collections > Jobs.
-
Select Add+ > Custom and Other Jobs > REST Call.
-
Enter remove-low-confidence-phrases in the ID field.
-
Enter <your query_rewrite_staging collection name>/update in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Enter POST in the CALL METHOD field.
-
In the QUERY PARAMETERS section, select + to add a property.
-
Enter wt in the Property Name field.
-
Enter json in the Property Value field.
-
In the REQUEST PROTOCOL HEADERS section, select + to add a property.
-
Enter the following as a REQUEST ENTITY (AS STRING):
<root><delete><query>type:phrase AND confidence:[0 TO <INSERT VALUE HERE>]</query></delete><commit/></root>
REQUEST ENTITY specifies the threshold for low confidence phrases. Edit the upper range to increase or decrease the threshold based on your data.
Delete all phrase suggestions
To delete all of the phrase suggestions, enter the following in the REQUEST ENTITY section:
<root><delete><query>type:phrase</query></delete><commit/></root>
This entry may be helpful when tuning the phrase extraction job and testing different configuration parameters.
Misspelling detection job cleanup
Use this job to remove low confidence spellings (also referred to as misspellings).
Prerequisites
Complete this cleanup:
-
AFTER you complete the Synonym detection job cleanup and the Phrase extraction job cleanup.
Remove misspelling suggestions
Use either Misspelling cleanup method 1 - API call or Misspelling cleanup method 2 - Managed Fusion Admin UI to remove misspelling suggestions.
Misspelling cleanup method 1 - API call
-
Open the delete_lowConf_misspellings.json file.
{
  "type" : "rest-call",
  "id" : "DC_Large_QR_DELETE_LOW_CONFIDENCE_MISSPELLINGS",
  "callParams" : {
    "uri" : "solr://DC_Large_query_rewrite_staging/update",
    "method" : "post",
    "queryParams" : { "wt" : "json" },
    "headers" : { },
    "entity" : "<root><delete><query>type:spell AND confidence:[0 TO 0.5]</query></delete><commit/></root>"
  }
}
-
Enter <your query_rewrite_staging collection name>/update in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Change the id field if applicable.
-
Specify the upper confidence level in the entity field.
The entity field specifies the threshold for low confidence spellings. Edit the upper range to increase or decrease the threshold based on your data.
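Before choosing a threshold, it can help to know how many suggestions a given delete query would match. The following Python sketch builds a Solr select URL that returns only a match count for the same query. It assumes you can reach the Solr select handler for the staging collection directly, which may not be the case in every Managed Fusion deployment; the base URL shown is a hypothetical placeholder and the function name is an illustrative helper.

```python
from urllib.parse import urlencode


def preview_url(solr_base_url, collection, delete_query):
    """Build a Solr select URL that counts the documents a delete query matches.

    rows=0 asks Solr for the match count only, without returning documents.
    """
    params = urlencode({"q": delete_query, "rows": 0, "wt": "json"})
    return f"{solr_base_url.rstrip('/')}/{collection}/select?{params}"


if __name__ == "__main__":
    # The host below is a hypothetical placeholder, not a Managed Fusion endpoint.
    print(
        preview_url(
            "http://localhost:8983/solr",
            "DC_Large_query_rewrite_staging",
            "type:spell AND confidence:[0 TO 0.5]",
        )
    )
```

In a Solr JSON response, the numFound value under response reports how many documents match the query.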
Misspelling cleanup method 2 - Managed Fusion Admin UI
-
Log in to Managed Fusion and select Collections > Jobs.
-
Select Add+ > Custom and Other Jobs > REST Call.
-
Enter remove-low-confidence-spellings in the ID field.
-
Enter <your query_rewrite_staging collection name>/update in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Enter POST in the CALL METHOD field.
-
In the QUERY PARAMETERS section, select + to add a property.
-
Enter wt in the Property Name field.
-
Enter json in the Property Value field.
-
In the REQUEST PROTOCOL HEADERS section, select + to add a property.
-
Enter the following as a REQUEST ENTITY (AS STRING):
<root><delete><query>type:spell AND confidence:[0 TO 0.5]</query></delete><commit/></root>
REQUEST ENTITY specifies the threshold for low confidence spellings. Edit the upper range from 0.5 to increase or decrease the threshold based on your data.
Delete all misspelling suggestions
To delete all of the misspelling suggestions, enter the following in the REQUEST ENTITY section:
<root><delete><query>type:spell</query></delete><commit/></root>
This entry may be helpful when tuning the misspelling detection job and testing different configuration parameters.
Head-tail analysis job cleanup
The head-tail analysis job puts tail queries into one of multiple reason categories. For example, a tail query that includes a number might be assigned to the 'numbers' reason category. If the output in a particular category is not useful, you can remove it from the results. The examples in this section remove the numbers category.
Prerequisites
The head-tail analysis job cleanup does not have to occur in a specific order.
Remove head-tail analysis query suggestions
Use either Head-tail analysis cleanup method 1 - API call or Head-tail analysis cleanup method 2 - Managed Fusion Admin UI to remove query category suggestions.
Head-tail analysis cleanup method 1 - API call
-
Open the delete_lowConf_headTail.json file.
{
  "type" : "rest-call",
  "id" : "DC_Large_QR_HEAD_TAIL_CLEANUP",
  "callParams" : {
    "uri" : "solr://DC_Large_query_rewrite_staging/update",
    "method" : "post",
    "queryParams" : { "wt" : "json" },
    "headers" : { },
    "entity" : "<root><delete><query>reason_code_s:(\"number\" \"number spelling\" \"number rare-term\" \"question number other-specific\" \"number others\" \"number other-specific\" \"number other-extra\" \"product number other-specific\" \"product number other-extra\" \"product number spelling\" \"product number others\" \"product number rare-term\" \"product question number\" \"product number re-wording\" \"question number other-extra\" \"number re-wording\")</query></delete><commit/></root>"
  }
}
-
Enter <your query_rewrite_staging collection name>/update in the uri field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Change the id field if applicable.
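The entity in the head-tail file is a long list of quoted reason codes, which is tedious to edit by hand when you want to remove a different category. The following Python sketch shows one way to build that entity from a list of reason codes and to see how the quotes are escaped once the entity is stored as a JSON string. The function name is a hypothetical helper, and the sketch assumes the reason codes contain no double quotes or XML special characters.

```python
import json


def build_reason_code_entity(reason_codes):
    """Build the delete entity that removes the given head-tail reason codes."""
    if not reason_codes:
        raise ValueError("at least one reason code is required")
    # Each code is quoted so that multi-word codes match as a whole phrase.
    quoted = " ".join(f'"{code}"' for code in reason_codes)
    return (
        f"<root><delete><query>reason_code_s:({quoted})</query></delete>"
        "<commit/></root>"
    )


if __name__ == "__main__":
    entity = build_reason_code_entity(["number", "number spelling"])
    print(entity)
    # json.dumps shows the escaped form used in the "entity" field of the file.
    print(json.dumps(entity))
```

The first printed line is the form to enter in the Admin UI; the second is the escaped form that belongs in the JSON file.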
Head-tail analysis cleanup method 2 - Managed Fusion Admin UI
-
Log in to Managed Fusion and select Collections > Jobs.
-
Select Add+ > Custom and Other Jobs > REST Call.
-
Enter remove-low-confidence-head-tail in the ID field.
-
Enter <your query_rewrite_staging collection name>/update in the ENDPOINT URI field. An example URI value for an app called DC_Large would be DC_Large_query_rewrite_staging/update.
-
Enter POST in the CALL METHOD field.
-
In the QUERY PARAMETERS section, select + to add a property.
-
Enter wt in the Property Name field.
-
Enter json in the Property Value field.
-
In the REQUEST PROTOCOL HEADERS section, select + to add a property.
-
Enter the following as a REQUEST ENTITY (AS STRING):
<root><delete><query>reason_code_s:("number" "number spelling" "number rare-term" "question number other-specific" "number others" "number other-specific" "number other-extra" "product number other-specific" "product number other-extra" "product number spelling" "product number others" "product number rare-term" "product question number" "product number re-wording" "question number other-extra" "number re-wording")</query></delete><commit/></root>
Delete all head-tail suggestions
To delete all of the head-tail suggestions, enter the following in the REQUEST ENTITY section:
<root><delete><query>type:tail</query></delete><commit/></root>
This entry may be helpful when tuning the head-tail job and testing different configuration parameters.