Index and Search
- 1. Open a trial Site Search app or add one
- 2. Add a Web Crawler data source
- 3. Add a More Like This module
- 4. Configure facets
- 5. Exclude directories
- 6. Search
- 7. Additional reading
In this Site Search tutorial, you will:
Open a trial Site Search app, if one already exists. If no trial app exists, create one.
Add a Web Crawler data source and index data.
Add a More Like This module.
Configure facets.
Exclude directories.
Search.
1. Open a trial Site Search app or add one
Open a trial Site Search app, if one already exists. If a trial app doesn’t exist, add one.
For this tutorial, we assume that this is the first time the app is opened.
1.1. Open a trial app
When Lucidworks first creates your Lucidworks Cloud account, you get one app out of the box, named "Site Search". You can use that app for this tutorial.
If your account has one or more licensed apps, you can add a trial app to use for this tutorial.
In the dashboard, click Open for an app that mentions the trial period.
1.2. Add a trial app
Add a trial Site Search app to use for this tutorial.
Open your dashboard at https://lucidworks.cloud/dashboard.
Click Add Search App.
Customize the path to the new app.
This path is appended to your domain name. For example, the path archive gives the URL https://subdomain.lucidworks.cloud/archive.
Click Create New App.
This returns you to the dashboard, where your app is shown as "Deploying".
Deployment takes a few minutes. When the new app is ready, an Open button appears.
Open the app you just added. In the Lucidworks Cloud dashboard, click Open for that app.
2. Add a Web Crawler data source
The first time you open the Admin UI for a Site Search app, the Page Builder opens with a slide-out panel for adding a data source.
Add a Web Crawler data source. Site Search will crawl the documents on the website and produce a searchable index for the website.
In the Site Search menu, click Add new data source, and then click Web crawler. If you are adding the first data source, just click Web crawler.
On the Configuration tab, specify which website to crawl and which documents to index. For this tutorial, you will index documents on the Lucidworks Documentation website. You could also repeat these tasks with a website of your own choosing.
Edit the name of the data source. At the top of the page, click the large words Web Crawler and enter a name for the data source.
Enter the Start URL of the Lucidworks Documentation website.
In Data Source Topics, enter the topic "documentation". In Topic Tabs modules, topics separate search results by data source.
Click Save and Index. Site Search saves the data source definition, connects to the data source, and indexes documents.
At the top of the page, you’ll see "Connecting…" followed by "Connected." In the bottom left corner, you’ll see status messages about the number of documents indexed.
Tip: You might need to refresh the page in the browser to see all of the documents that were found in the crawl.
When the status messages stop and the Activity section disappears, click Close to close the slide-out panel for configuring a data source.
You are now in the Page Builder.
3. Add a More Like This module
The Page Builder already has the modules Search Box, Topic Tabs, Results, and Facets.
Add a More Like This module to the page:
In the Page Builder, scroll to the bottom of the page.
At the bottom of the page, click the icon for adding a module.
Click the More Like This tile and drag it to the bottom of the page. A gray box appears. Drop the module in the box.
4. Configure facets
Configure facets to let the user more easily navigate categories of search results.
In the Page Builder, scroll to the top of the page.
Hover over the Facets module, and then click its configuration icon. Alternatively, click the placeholder image that the module shows when no facets have been configured.
For the first facet, open the field list, and then select path_1. Under Display Name, enter a name for the facet.
The facets for the field path_1 now appear in the Facets module.
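To see why this facet lists directories, here is a minimal Python sketch that counts documents per path_1 value. It assumes, as the directory names in the next step suggest, that path_1 holds the first segment of each document's URL path. The function names and the example.com URLs are hypothetical; Site Search computes facet values from its own index, not with this code.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse


def path_1(url: str) -> Optional[str]:
    """Hypothetical path_1 value: the first segment of the URL path, or None for the site root."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else None


def facet_counts(urls: Iterable[str]) -> Counter:
    """Count documents per path_1 value, the way a facet displays value (count) pairs."""
    return Counter(v for v in map(path_1, urls) if v is not None)


if __name__ == "__main__":
    # Hypothetical crawled URLs; a real crawl would use the documentation site's URLs.
    crawled = [
        "https://example.com/assets/img/logo.png",
        "https://example.com/assets/css/site.css",
        "https://example.com/fusion/index.html",
        "https://example.com/",
    ]
    for value, count in facet_counts(crawled).most_common():
        print(f"{value} ({count})")
```

Under this assumption, any top-level directory the crawler reaches, including asset folders, becomes a facet value, which is why the next step excludes some of them.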
5. Exclude directories
Among the facets, you see directories that aren’t needed in this search app, such as assets. We’ll exclude them now.
Reopen the slide-out panel for configuring the Web Crawler data source, and then expand Show more options.
Under Exclude Documents, enter strings for directories and files you want to exclude. Use these strings, which exclude the named directories at the top level and PDF files:
fusion-pipeline-javadocs/*
assets/*
lucidworks-hdpsearch/*
products/*
*.pdf
Click Save and Index.
Monitor the status messages under Activity. When they cease, all documents have been indexed.
Click Close to close the slide-out panel for configuring a data source.
Notice that the facets no longer include the directories that you excluded.
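To check which pages a set of Exclude Documents strings would drop, you can test them against sample URLs. The Python sketch below assumes shell-style glob matching over the URL path relative to the site root, where * also matches /. That is an assumption, not documented Site Search behavior, and the example.com URLs and function names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# The Exclude Documents strings from this step.
EXCLUDE_PATTERNS = (
    "fusion-pipeline-javadocs/*",
    "assets/*",
    "lucidworks-hdpsearch/*",
    "products/*",
    "*.pdf",
)


def relative_path(url: str) -> str:
    """Path of the URL relative to the site root, e.g. 'assets/img/logo.png'."""
    return urlparse(url).path.lstrip("/")


def is_excluded(url: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """True if the relative path matches any pattern.

    Assumed semantics: case-sensitive glob over the whole relative path, so
    'assets/*' matches only a top-level assets directory. Site Search's own
    matching rules may differ.
    """
    path = relative_path(url)
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for url in [
        "https://example.com/assets/img/logo.png",    # matches assets/*
        "https://example.com/guides/manual.pdf",      # matches *.pdf
        "https://example.com/docs/assets/page.html",  # not top level, kept
        "https://example.com/fusion/index.html",      # no pattern matches, kept
    ]:
        print(url, "excluded" if is_excluded(url) else "kept")
```

Because each pattern must match the whole relative path, assets/* leaves a nested docs/assets/ directory alone, which matches the tutorial's description of these strings as excluding directories at the top level.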
6. Search
Now you can search for documents. You can search from the Page Builder, or you can close the Page Builder and search from the Search interface.
At the top of the menu, click Close.
Search for some documents.