Use Language Features

Site Search lets you develop search apps that use specific language features for one or more supported languages.

Default language

The default language is English. If you are building an English-language search app, you can develop a Site Search app with embedded modules or the APIs without configuring any language settings.

In more detail, the following apply to the English-language case:

  • Document content language – The contents of documents are in English (at least mostly).

  • Language specification – You can specify that document contents are in English, but doing so is optional.

  • Language detection – Language detection happens, detecting English.

  • Boosting – Documents in English are boosted toward the top of results relative to documents in other languages. If there are no documents in other languages, boosting has no effect.

  • UI language of embeddable modules – Language strings in embeddable modules are in English.

Supported languages

You can localize a specific Site Search app for searches in any of these languages:

  • English (en), the default

  • Dutch (nl)

  • German (de)

  • Spanish (es)

  • French (fr)

  • Italian (it)

  • Polish (pl)

  • Portuguese (pt)

  • Russian (ru)

  • Swedish (sv)

Site Search supports one-language-per-search-app development, as clarified in the sections that follow.

A single Site Search app can be the foundation for a search app or apps in one or more of the supported languages:

  • Each embedded Search Box module can specify one UI language (or use the default, English) and one search language (or use the default search language set in the configuration of the Search Box).

  • Each API call specifies a single search language, or relies upon the default search language for the Site Search app.

Note
It isn’t currently possible to localize the Lucidworks Cloud dashboard or the Admin UI for Site Search apps in a language other than English.

Language features for supported languages

Site Search has these language features.

  • Index language-specific parts of websites – Although not a language feature per se, in some cases you can separate documents by language by creating different data sources for different parts of websites. For example, one Web Crawler data source in a Site Search app might index https://my.company.com/products/en for the English pages and another data source would index https://my.company.com/products/fr for the French pages.

    Documents from different data sources are available on different tabs in a Topic Tabs module.

  • Specify and detect the content languages of documents – Site Search uses a combination of language specification and language detection to determine the content languages of documents.

    • Specify document content languages – In CSV, JSON, and Push Endpoint data sources, use a field language to specify the language of document contents. In Web Crawler data sources, map a meta_tag_name_fieldname field that specifies document languages to the field language. Only supported languages can be specified.

    • Detect document content languages – For documents that don’t specify a language, Site Search attempts to detect the language. Only supported languages are detected.

  • Specify the search language – Specify the search language so that Site Search can boost documents in that language, which moves the documents closer to the beginning of search results. Boosting occurs in search results in a Results module and in search results returned by the Search API.

    Tip
    Specifying a search language doesn’t restrict results to that language. It does result in documents in that language being boosted.

    • Default search language – Specify the default search language for Search Box modules. In the Page Builder, hover over the Search Box module, and then click Edit module. Select the language from the Default Search Language dropdown list. The language English is selected by default.

    • Per-module search language – Override the default search language for a specific Search Box module. In the snippet for the Search Box, include a language attribute in the <cloud-search-box> element, for example, <cloud-search-box language="fr"></cloud-search-box>.

    • Search language for a Search API request – Include the language URL parameter in the Search API request.

  • Language-specific stemming and lemmatization – During indexing and searches, language-specific stemming and lemmatization broadens the search results to include other linguistic forms of the search term. For example, a search for "searching" would also match "search" and "searched".

  • UI language for modules – Specify the UI language of embedded modules.

    • The default UI language for all embedded modules is English.

    • To specify a different UI language for an embedded module (or to explicitly specify English), add a language URL parameter in the loader.js URL in the embed snippet that you copy to the <head> element. For example, for German:

      <!-- Lucidworks.cloud embed script -->
      <script async="false" src="https://{subdomain}.lucidworks.cloud/{pathname}/embed/v1/ui/loader.js?language=de"></script>

      You can have the URL parameter added to the script before you copy it: in the dialog box for embedding the module, select the UI language from the UI language dropdown list, and then copy the snippet.

  • UI strings in modules – You can override strings in embedded modules by adding a <script> element that specifies the strings to the HTML page in which the module is embedded; for example:

    <script>
     window.AppkitTranslations['components.breadcrumbs.clear-all'] = 'Clear breadcrumbs';
     window.AppkitTranslations['components.facet.show-less'] = 'Show fewer';
     window.AppkitTranslations['components.more-like-this.subtitle'] = 'More like <em>{result}</em>';
     window.AppkitTranslations['components.more-like-this.title'] = 'Similar Results';
     window.AppkitTranslations['components.no-results.title'] = 'Sorry, no results match your search criteria.';
     window.AppkitTranslations['components.pagination.next'] = 'Next';
     window.AppkitTranslations['components.pagination.previous'] = 'Previous';
     window.AppkitTranslations['components.spelling-suggestions.did-you-mean'] = 'Did you mean {query}?';
     window.AppkitTranslations['components.spelling-suggestions.no-results'] = 'Search query {query} gave no results.';
     window.AppkitTranslations['schema.name.default'] = '(Missing Name)';
    </script>

  • Boosting – In search results, including the results for all documents on the All tab of a Topic Tabs module, documents whose specified or detected language matches the default search language are boosted, that is, moved toward the top of the search results. Promoted documents (if any) remain at the very top. Boosting doesn’t guarantee that documents in other languages won’t appear near the top of the search results.

    Boosting by this language match is also applied to suggested documents that are displayed while a user enters a search query in the Search Box (if the Search Box is configured for this).
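
As a sketch of the Search API case described in the list above, the following JavaScript builds a request URL that carries the language URL parameter. Only the parameter name language comes from this page. The endpoint URL and the q parameter name are hypothetical placeholders, so substitute the values documented for your own Site Search app.

```javascript
// Hypothetical endpoint: the real Search API URL for your app will differ.
const SEARCH_API_URL = "https://example.invalid/search";

// Builds a search request URL. The "language" parameter name is taken from
// the documentation; the "q" parameter name is an assumption for illustration.
function buildSearchUrl(query, language) {
  const url = new URL(SEARCH_API_URL);
  url.searchParams.set("q", query);
  if (language) {
    // Omitting the parameter relies on the app's default search language.
    url.searchParams.set("language", language);
  }
  return url.toString();
}

console.log(buildSearchUrl("chaussures", "fr"));
```

Because specifying a search language boosts rather than filters, a request built this way can still return documents in other languages.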

Specifying and detecting languages

Site Search lets the data supplier specify the languages of documents. For documents that don’t have a specified language, Site Search attempts to detect the language. Specification and detection of languages apply to all data source types.

For CSV, JSON, and Push Endpoint data sources, you probably have direct control over the fields present.

For Web Crawler data sources, you might or might not have control over whether and how documents specify their language.

For all data source types, Site Search:

  • Uses specified languages if they are supported languages, possibly converting their format, for example, converting en-US and EN to en.

  • Discards specified languages that aren’t supported, leaving an empty field.

  • Discards specified languages that don’t conform to any of the formats exemplified by en, EN, or en-US. The last format is language-locale.

These transformations are applied to the following fields:

  • CSV, JSON, and Push Endpoint data sources – In a field named language (or possibly Language or LANGUAGE) that you supply.

  • Web Crawler data sources – In a field named language after you map a meta_tag_name_fieldname field that specifies document languages to the field language.
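
The normalization rules above can be approximated in a few lines of JavaScript. This is a minimal sketch of the described behavior, not Site Search's actual implementation: the function name normalizeLanguage is hypothetical, and the regular expression assumes that only two-letter codes with an optional two-letter locale are accepted, which is inferred from the three example formats.

```javascript
// Supported language codes, as listed under "Supported languages".
const SUPPORTED = new Set(["en", "nl", "de", "es", "fr", "it", "pl", "pt", "ru", "sv"]);

// Hypothetical helper: returns the normalized two-letter code, or an empty
// string when the value is unsupported or not in a recognized format.
function normalizeLanguage(value) {
  if (typeof value !== "string") return "";
  // Accepts "en", "EN", or language-locale such as "en-US" (assumed pattern).
  const match = /^([A-Za-z]{2})(?:-[A-Za-z]{2})?$/.exec(value);
  if (!match) return "";
  const code = match[1].toLowerCase();
  return SUPPORTED.has(code) ? code : "";
}

for (const value of ["en-US", "EN", "fr", "ja", "english"]) {
  console.log(JSON.stringify(value), "->", JSON.stringify(normalizeLanguage(value)));
}
```

Running a check like this on your own data before indexing can show which documents would end up with an empty language field and therefore rely on detection.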

Specify languages

Websites can specify the content language(s) of their web pages in several ways. Among these are:

  • In the <html> tag – The <html> tag with the attribute lang having the language as the value. For example:

    <html lang="fr">

    Site Search places languages specified in <html> tags in the meta_tag_http_lang field.

  • In <meta> tags – A <meta> tag with the attribute name having the value language and the attribute content having the language as the value. For example:

    <meta content="fr" name="language">

    Site Search places languages specified in <meta> tags in the meta_tag_name_fieldname field, for example, meta_tag_name_language (or possibly meta_tag_name_lang or other names; this depends on which attribute is used to specify the language).

For Site Search to be able to use the specified language information, you must map the meta_tag_http_lang or meta_tag_name_fieldname field to the language field.

  • If you map one of these fields to the language field then, before language detection, the language field contains the specified languages (only the supported ones). In documents for which the language wasn’t specified, or it was specified but isn’t supported, the language field is empty.

  • If you don’t map one of these fields to the language field then, before language detection, the language field is empty for all documents.

Detect languages

During indexing, Site Search tries to detect the content language for all documents that don’t specify a language in the language field. The more text a document contains, the higher the probability of accurate detection.
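
The order of operations, mapping first and detection only for documents whose language field is still empty, can be illustrated with the following JavaScript sketch. The function names mapLanguageField and resolveLanguage, the body field, and the detect callback are all hypothetical; Site Search performs these steps internally and its detector is not exposed in this form.

```javascript
const SUPPORTED = new Set(["en", "nl", "de", "es", "fr", "it", "pl", "pt", "ru", "sv"]);

// Mirrors the mapping step: copy a crawler field such as meta_tag_http_lang
// into the language field, keeping only supported codes (assumed formats:
// "fr", "FR", or "fr-FR"). Unsupported or missing values leave it empty.
function mapLanguageField(doc, sourceField) {
  const raw = String(doc[sourceField] ?? "");
  const code = raw.slice(0, 2).toLowerCase();
  const valid = /^[A-Za-z]{2}(-[A-Za-z]{2})?$/.test(raw) && SUPPORTED.has(code);
  return { ...doc, language: valid ? code : "" };
}

// Detection is consulted only when no supported language was specified.
// "detect" is a caller-supplied placeholder for a language detector.
function resolveLanguage(doc, detect) {
  return doc.language || detect(doc.body) || "";
}

const page = mapLanguageField({ meta_tag_http_lang: "fr-FR", body: "Bonjour" }, "meta_tag_http_lang");
console.log(resolveLanguage(page, () => "en"));
```

In this example the specified language wins, so the placeholder detector is never called.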

Example of localizing a search app to use two languages

To build a search app that searches in two languages, you could build it such that:

  • It indexes websites in English and French, using separate Web Crawler data sources for the English and French content.

  • Topics associated with the data sources permit categorization of the results by language in the tabs in the Topic Tabs module. You can also specify topics in search queries.

  • If embedded modules are used, the app has separate Search Box modules for English and French. The first module specifies the UI language and search language as English, and the second as French.

  • If Search Box modules suggest documents, then suggested documents with contents in the search language are boosted.

  • If the Search API is used, then the API calls specify the search language.
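
For the embedded-module case, the per-language snippets could be generated as in the following JavaScript sketch. It reuses the loader.js URL and the <cloud-search-box> element shown earlier on this page, with the {subdomain} and {pathname} placeholders left for you to fill in. The function name embedSnippets and the idea of one page per language are assumptions made for illustration.

```javascript
// Placeholders {subdomain} and {pathname} are kept from the embed snippet
// shown earlier; replace them with the values from your own app.
const LOADER_URL = "https://{subdomain}.lucidworks.cloud/{pathname}/embed/v1/ui/loader.js";

// Hypothetical helper: returns the <head> script tag (UI language) and the
// Search Box element (search language) for one language.
function embedSnippets(language) {
  return {
    head: `<script async="false" src="${LOADER_URL}?language=${language}"></script>`,
    searchBox: `<cloud-search-box language="${language}"></cloud-search-box>`,
  };
}

for (const language of ["en", "fr"]) {
  const { head, searchBox } = embedSnippets(language);
  console.log(`${language} page <head>: ${head}`);
  console.log(`${language} page <body>: ${searchBox}`);
}
```

The English page and the French page then differ only in the language value, which sets both the UI language of the modules and the search language used for boosting.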