Index and Search

In this Site Search tutorial, you will:

  1. Open a trial Site Search app if one already exists, or add one if it doesn't.

  2. Add a Web Crawler data source and index data.

  3. Add a More Like This module.

  4. Configure facets.

  5. Exclude directories from the index.

  6. Search.

1. Open a trial Site Search app or add one

Open a trial Site Search app, if one already exists. If a trial app doesn’t exist, add one.

For this tutorial, we assume that this is the first time the app is opened.

1.1. Open a trial app

When Lucidworks creates your Lucidworks Cloud account, the account includes one app out of the box, named "Site Search". You can use that app for this tutorial.

If your account has one or more licensed apps, you can add a trial app to use for this tutorial.

To open a trial app

In the dashboard, click Open for an app that mentions the trial period.

1.2. Add a trial app

Add a trial Site Search app to use for this tutorial.

To create a new app
  1. Open your dashboard at https://lucidworks.cloud/dashboard.

  2. Click Add Search App.

    Add Site Search app

  3. Customize the path to the new app.

    This path is appended to your domain name, as in https://subdomain.lucidworks.cloud/pathname.

    For example, you could choose https://subdomain.lucidworks.cloud/intranet, https://subdomain.lucidworks.cloud/customers, https://subdomain.lucidworks.cloud/archive, and so on.

  4. Click Create New App.

    This returns you to the dashboard, where your app is shown as "Deploying":

    Deploying

    Deployment takes a few minutes. When the new app is ready, an Open button appears.

  5. Open the app you just added. In the Lucidworks Cloud dashboard, click Open for that app.

2. Add a Web Crawler data source

The first time you open the Admin UI for a Site Search app, the Page Builder opens with a slide-out panel for adding a data source:

Add a data source

Add a Web Crawler data source. Site Search crawls the website and builds a searchable index of its documents.

To add a data source to index a website
  1. In the Site Search menu, click Add new data source, and then click Web crawler. If you are adding the first data source, just click Web crawler.

  2. On the Configuration tab, specify which website to crawl and which documents to index. For this tutorial, you will index documents on the Lucidworks Documentation website. You could also repeat these tasks with a website of your own choosing.

    1. Edit the name of the data source. At the top of the page, click the large heading Web Crawler and replace it with Documentation.

    2. Enter the Start URL https://doc.lucidworks.com/.

    3. In Data Source Topics, enter the topic documentation. Topic Tabs modules use topics to separate search results by data source.

      Add documentation data source

    4. Click Save and Index. Site Search saves the data source definition, connects to the data source, and indexes documents.

      At the top of the page, you’ll see "Connecting…" followed by "Connected." In the bottom left corner, you’ll see status messages about the number of documents indexed.

      Activity

      Tip
      You might need to refresh the page in the browser to see all of the documents that were found in the crawl.
    5. When the status messages stop and the Activity section disappears, click Close to close the slide-out panel for configuring a data source.

      You are now in the Page Builder.

      Page Builder

3. Add a More Like This module

The page already contains the Search Box, Topic Tabs, Results, and Facets modules.

Add a More Like This module to the page:

To add a More Like This module
  1. In the Page Builder, scroll to the bottom of the page.

  2. At the bottom of the page, click Add module.

  3. Drag the More Like This tile to the bottom of the page. When a gray box appears, drop the module into it.

    More Like This module

4. Configure facets

Configure facets so that users can navigate categories of search results more easily.

To configure facets
  1. In the Page Builder, scroll to the top of the page.

  2. Hover over the Facets module, and then click Edit module. Alternatively, click the following image, which appears only when facets haven’t been configured yet:

    Configure facets

  3. For the first facet, click the Facet fields dropdown menu, and then select path_1. Under Display Name, enter Product.

    Configure facets

  4. Click Apply.

    The facets for the field path_1 appear.

    Facets present
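The facet values for path_1 appear to be the top-level directories of the crawled URLs, such as the fusion-pipeline-javadocs and assets directories that show up in these facets. The Python sketch below illustrates that reading under a stated assumption: a hypothetical path_N field holds the Nth non-empty segment of the URL path. The helper names and the sample URLs are hypothetical, and this is not how Site Search is documented to compute the field.

```python
from urllib.parse import urlparse


def path_fields(url: str) -> dict[str, str]:
    """Split a URL path into hypothetical path_1, path_2, ... fields.

    Assumption: path_N holds the Nth non-empty segment of the URL path.
    This matches the facet values seen in this tutorial, but it is not
    documented Site Search behavior.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return {f"path_{i}": seg for i, seg in enumerate(segments, start=1)}


def facet_counts(urls: list[str], field: str = "path_1") -> dict[str, int]:
    """Count documents per value of one path field, as a facet would."""
    counts: dict[str, int] = {}
    for url in urls:
        value = path_fields(url).get(field)
        if value is not None:  # URLs with too few path segments have no value
            counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = [
        "https://doc.lucidworks.com/fusion/overview.html",
        "https://doc.lucidworks.com/fusion/install.html",
        "https://doc.lucidworks.com/assets/logo.png",
    ]
    print(facet_counts(urls))
```

Under this assumption, every page in a directory contributes one count to that directory's facet value, which is why unwanted directories such as assets show up next to the product directories.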

5. Exclude directories

Among the facets are directories that this search app doesn’t need, such as fusion-pipeline-javadocs and assets. We’ll exclude them from the index now.

To exclude directories
  1. In the menu, click the data source Documentation.

  2. Expand Show more options.

  3. Under Exclude Documents, enter strings for the directories and files you want to exclude. Use these strings, which exclude the named top-level directories and all PDF files:

    fusion-pipeline-javadocs/*
    assets/*
    lucidworks-hdpsearch/*
    products/*
    *.pdf
  4. Click Save and Index.

  5. Monitor the status messages under Activity. When they stop, all documents have been indexed.

  6. Click Close to close the slide-out panel for configuring a data source.

  7. Notice that the facets no longer include the directories that you excluded.

    Directories excluded
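Before you click Save and Index, it can help to check which URLs a set of exclude strings would remove. The Python sketch below assumes the strings behave like shell-style glob patterns matched against the URL path relative to the start URL, using the standard fnmatch module. That matching rule is an assumption, not documented Site Search behavior, and the helper names and sample URLs are hypothetical.

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Exclude strings from this tutorial.
EXCLUDE_PATTERNS = (
    "fusion-pipeline-javadocs/*",
    "assets/*",
    "lucidworks-hdpsearch/*",
    "products/*",
    "*.pdf",
)

START_URL = "https://doc.lucidworks.com/"


def relative_path(url: str, start_url: str = START_URL) -> str:
    """Return the URL path relative to the start URL's path, without a leading slash.

    Assumption: exclude patterns are matched against this relative path.
    """
    path = urlparse(url).path
    base = urlparse(start_url).path
    if path.startswith(base):
        path = path[len(base):]
    return path.lstrip("/")


def is_excluded(url: str, patterns: Iterable[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if any glob pattern matches the URL's relative path.

    Patterns are anchored at the start, so "assets/*" only matches the
    top-level assets directory. In fnmatch, "*" also matches "/", so
    "assets/*" covers nested files and "*.pdf" covers PDFs at any depth.
    """
    rel = relative_path(url)
    return any(fnmatchcase(rel, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "https://doc.lucidworks.com/assets/img/logo.png",
        "https://doc.lucidworks.com/fusion/guide/manual.pdf",
        "https://doc.lucidworks.com/fusion/overview.html",
        "https://doc.lucidworks.com/fusion/assets/diagram.png",
    ):
        print(url, "excluded" if is_excluded(url) else "kept")
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every operating system. If Site Search treats these strings differently (for example, as regular expressions or as unanchored substrings), the results of this sketch won't match the crawl.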

6. Search

Now you can search for documents, either from the Page Builder or from the Search interface after you close the Page Builder.

To open Search

At the top of the menu, click Close.

Open Search

Enter a query in the search box to search the indexed documents.