Use Language Features
- Default language
- Supported languages
- Language features for supported languages
- Specifying and detecting languages
- Example of localizing a search app to use two languages
Site Search lets you develop search apps that use specific language features for one or more supported languages.
Default language
The default language is English. If your goal is an English language app, then you can develop a Site Search app, use embedded modules or APIs to develop a search app, and not give language a second thought.
In more detail, the following applies to the English-language case:
- Document content language – The contents of documents are in English (at least mostly).
- Language specification – You can specify that document contents are in English, but doing so isn’t required.
- Language detection – Site Search runs language detection and detects English.
- Boosting – Documents in English are boosted toward the top of results relative to documents in other languages. If there are no documents in other languages, boosting has no effect.
- UI language of embeddable modules – Language strings in embeddable modules are in English.
Supported languages
You can localize a specific Site Search app for searches in any of these languages:
- English (en), the default
- Dutch (nl)
- German (de)
- Spanish (es)
- French (fr)
- Italian (it)
- Polish (pl)
- Portuguese (pt)
- Russian (ru)
- Swedish (sv)
Site Search supports one-language-per-search-app development, as clarified in the sections that follow.
A single Site Search app can be the foundation for a search app or apps in one or more of the supported languages:
- Each embedded Search Box module can specify one UI language (or use the default, English) and one search language (or use the default search language set in the configuration of the Search Box).
- Each API call specifies a single search language, or relies upon the default search language for the Site Search app.
Note: It isn’t currently possible to localize the Lucidworks Cloud dashboard or the Admin UI for Site Search apps in a language other than English.
Language features for supported languages
Site Search has these language features.
- Index language-specific parts of websites – Although not a language feature per se, in some cases you can separate documents by language by creating different data sources for different parts of websites. For example, one Web Crawler data source in a Site Search app might index https://my.company.com/products/en for the English pages, and another data source might index https://my.company.com/products/fr for the French pages. Documents from different data sources are available on different tabs in a Topic Tabs module.
- Specify and detect the content languages of documents – Site Search uses a combination of language specification and language detection to determine the content languages of documents.
  - Specify document content languages – In CSV, JSON, and Push Endpoint data sources, use a field named language to specify the language of document contents. In Web Crawler data sources, map a meta_tag_name_fieldname field that specifies document languages to the language field. Only supported languages can be specified.
  - Detect document content languages – For documents that don’t specify a language, Site Search attempts to detect the language. Only supported languages are detected.
- Specify the search language – Specify the search language so that Site Search can boost documents in that language, which moves the documents closer to the beginning of search results. Boosting occurs in search results in a Results module and in search results returned by the Search API.
  Tip: Specifying a search language doesn’t restrict results to that language. It only boosts documents in that language.
  - Default search language – Specify the default search language for Search Box modules. In the Page Builder, hover over the Search Box module, and then click the icon that opens its configuration. Select the language from the Default Search Language dropdown list. English is selected by default.
  - Per-module search language – Override the default search language for a specific Search Box module. In the snippet for the Search Box, include a language attribute in the <cloud-search-box> element, for example, <cloud-search-box language="fr"></cloud-search-box>.
  - Search language for a Search API request – Include the language URL parameter in the Search API request.
- Language-specific stemming and lemmatization – During indexing and searches, language-specific stemming and lemmatization broaden the search results to include other linguistic forms of the search term. For example, a search for "searching" would also match "search" and "searched".
- UI language for modules – Specify the UI language of embedded modules.
  - The default UI language for all embedded modules is English.
  - To specify a different UI language for an embedded module (or to explicitly specify English), add a language URL parameter to the loader.js URL in the embed snippet that you copy to the <head> element. For example, for German:

        <!-- Lucidworks.cloud embed script -->
        <script async="false" src="https://{subdomain}.lucidworks.cloud/{pathname}/embed/v1/ui/loader.js?language=de"></script>

    You can add the URL parameter to the script before copying it: in the dialog box for embedding the module, select the UI language from the UI language dropdown list, and then copy the snippet.
- UI strings in modules – You can override strings in embedded modules by adding a <script> element that specifies the strings to the HTML page in which the module is embedded. For example:

      <script>
        window.AppkitTranslations['components.breadcrumbs.clear-all'] = 'Clear breadcrumbs';
        window.AppkitTranslations['components.facet.show-less'] = 'Show fewer';
        window.AppkitTranslations['components.more-like-this.subtitle'] = 'More like <em>{result}</em>';
        window.AppkitTranslations['components.more-like-this.title'] = 'Similar Results';
        window.AppkitTranslations['components.no-results.title'] = 'Sorry, no results match your search criteria.';
        window.AppkitTranslations['components.pagination.next'] = 'Next';
        window.AppkitTranslations['components.pagination.previous'] = 'Previous';
        window.AppkitTranslations['components.spelling-suggestions.did-you-mean'] = 'Did you mean {query}?';
        window.AppkitTranslations['components.spelling-suggestions.no-results'] = 'Search query {query} gave no results.';
        window.AppkitTranslations['schema.name.default'] = '(Missing Name)';
      </script>
- Boosting – In search results, including the results for all documents on the All tab of a Topic Tabs module, documents whose specified or detected language matches the default search language are boosted (moved closer to the top of search results). Promoted documents (if any) are at the very top. Boosting doesn’t guarantee that there won’t be documents in other languages near the top of the search results.
  Boosting by this language match is also applied to suggested documents that are displayed while a user enters a search query in the Search Box (if the Search Box is configured for this).
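When a site serves more than one language, the UI string overrides described in the list above can be grouped per language and applied in one step. The following is only a sketch of that idea: the helper name applyTranslations and the French strings are illustrative assumptions, and creating the AppkitTranslations object when it is absent is also an assumption (the original example assigns to an object that already exists on window).

```javascript
// Illustrative sketch only. The keys come from the override example above;
// the French strings are sample translations, not strings shipped by Site Search.
const FRENCH_OVERRIDES = {
  "components.pagination.next": "Suivant",
  "components.pagination.previous": "Précédent",
  "components.spelling-suggestions.did-you-mean": "Vouliez-vous dire {query} ?",
};

// Hypothetical helper: copies each override onto root.AppkitTranslations.
// Creating the object when it is missing is an assumption of this sketch.
function applyTranslations(root, overrides) {
  root.AppkitTranslations = root.AppkitTranslations || {};
  for (const [key, text] of Object.entries(overrides)) {
    root.AppkitTranslations[key] = text;
  }
  return root.AppkitTranslations;
}

// In a browser page, inside a <script> element:
//   applyTranslations(window, FRENCH_OVERRIDES);
```

Keeping one overrides object per language makes it easier to see which strings still fall back to the module defaults.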
Specifying and detecting languages
Site Search lets the data supplier specify the languages of documents. For documents that don’t have a specified language, Site Search attempts to detect the language. Specification and detection of languages apply to all data source types.
For CSV, JSON, and Push Endpoint data sources, you probably have direct control over the fields present.
For Web Crawler data sources, you might or might not have control over whether and how documents specify their language.
For all data source types, Site Search:
- Uses specified languages if they are supported languages, possibly converting their format, for example, converting en-US and EN to en.
- Discards specified languages that aren’t supported, leaving an empty field.
- Discards specified languages that don’t conform to any of the formats exemplified by en, EN, or en-US. The last format is language-locale.
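The rules above can be sketched in a few lines of code. This is an illustration of the described behavior, not Site Search’s implementation: the function names, the doc.body field, and the caller-supplied detect function are hypothetical, and accepting any letter case (for example, en-us) and trimming whitespace are assumptions that this page doesn’t confirm.

```javascript
// Illustrative sketch only; not Site Search's actual implementation.
const SUPPORTED_LANGUAGES = new Set([
  "en", "nl", "de", "es", "fr", "it", "pl", "pt", "ru", "sv",
]);

// Accepts values shaped like "en", "EN", or language-locale ("en-US").
// Returns the lowercase two-letter code, or "" (an empty field) when the
// value is malformed or names an unsupported language.
function normalizeLanguage(value) {
  if (typeof value !== "string") return "";
  const match = /^([A-Za-z]{2})(?:-[A-Za-z]{2})?$/.exec(value.trim());
  if (!match) return "";
  const code = match[1].toLowerCase();
  return SUPPORTED_LANGUAGES.has(code) ? code : "";
}

// Hypothetical helper: prefer the specified language and fall back to
// detection. `detect` stands in for language detection and is supplied by
// the caller; `doc.body` is an assumed name for the document text.
function resolveLanguage(doc, detect) {
  const specified = normalizeLanguage(doc.language);
  if (specified) return specified;
  return normalizeLanguage(detect(doc.body));
}
```

For example, normalizeLanguage("en-US") and normalizeLanguage("EN") both return "en", while an unsupported or malformed value returns an empty string and leaves the decision to detection.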
Site Search makes these transformations in place:
- CSV, JSON, and Push Endpoint data sources – In a field named language (or possibly Language or LANGUAGE) that you supply.
- Web Crawler data sources – In a field named language, after you map a meta_tag_name_fieldname field that specifies document languages to the language field.
Specify languages
Websites can specify the content language(s) of their web pages in several ways. Among these are:
- In the <html> tag – An <html> tag whose lang attribute has the language as its value. For example: <html lang="fr">
  Site Search places languages specified in <html> tags in the meta_tag_http_lang field.
- In <meta> tags – A <meta> tag whose name attribute has the value language and whose content attribute has the language as its value. For example: <meta content="fr" name="language">
  Site Search places languages specified in <meta> tags in the meta_tag_name_fieldname field, for example, meta_tag_name_language (or possibly meta_tag_name_lang or other names; this depends on which attribute is used to specify the language).
For Site Search to be able to use the specified language information, you must map the meta_tag_http_lang or meta_tag_name_fieldname field to the language field.
- If you map one of these fields to the language field, then, before language detection, the language field contains the specified languages (only the supported ones). In documents for which the language wasn’t specified, or was specified but isn’t supported, the language field is empty.
- If you don’t map one of these fields to the language field, then, before language detection, the language field is empty for all documents.
Detect languages
During indexing, Site Search tries to detect the content language for all documents that don’t specify a language in the language field. The more text a document contains, the higher the probability of accurate detection.
Example of localizing a search app to use two languages
To build a search app that searches in two languages, you could build it such that:
- It indexes websites in English and French, using separate Web Crawler data sources for the English and French content.
- Topics associated with the data sources permit categorization of the results by language in the tabs in the Topic Tabs module. You can also specify topics in search queries.
- If embedded modules are used, the app has separate Search Box modules for English and French. The first module specifies the UI language and search language as English, and the second as French.
- If Search Box modules suggest documents, then suggested documents with contents in the search language are boosted.
- If the Search API is used, then the API calls specify the search language.
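As a rough sketch of the embedded-module part of this example, the helper below produces the two pieces of markup for one language: the loader script with the language URL parameter (UI language) and the Search Box element with the language attribute (search language). The function name buildEmbedSnippet and the values example and my-app, which stand in for the {subdomain} and {pathname} placeholders, are hypothetical. Search API requests aren’t shown because this page doesn’t give the request URL.

```javascript
// Hypothetical helper: returns the embed markup for one language.
// `subdomain` and `pathname` fill the {subdomain} and {pathname}
// placeholders of the loader.js URL shown earlier on this page.
function buildEmbedSnippet(subdomain, pathname, language) {
  const loader = new URL(
    `https://${subdomain}.lucidworks.cloud/${pathname}/embed/v1/ui/loader.js`
  );
  // The language URL parameter sets the UI language of the embedded modules.
  loader.searchParams.set("language", language);
  return {
    // Goes in the <head> element of the page.
    head: `<script async="false" src="${loader.href}"></script>`,
    // The language attribute sets the search language of this Search Box.
    body: `<cloud-search-box language="${language}"></cloud-search-box>`,
  };
}

// One snippet per language, for example for the English and French pages.
const english = buildEmbedSnippet("example", "my-app", "en");
const french = buildEmbedSnippet("example", "my-app", "fr");
```

Generating both snippets from the same helper keeps the UI language and the search language in step for each page.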