Configure a SharePoint V1 Optimized Datasource

The SharePoint connector retrieves content and metadata from an on-premises SharePoint repository.

1. Decide what you need to crawl

First, and most importantly, determine what you need to crawl, then choose your “Start Links” accordingly.

Choose one of the following:

How to crawl an entire SharePoint Web application

  1. Leave the Limit Documents > Fetch all site collections option checked (as it is by default).

  2. Specify the Web application URL as a site.

    For example: https://lucidworks.sharepoint.local/

Crawling an entire SharePoint Web application requires administrative access to SharePoint.

How to crawl a subset of SharePoint site collections

  1. Uncheck the Limit Documents > Fetch all site collections option.

  2. Specify a "Start Link" for each site collection that you want to crawl.

    Examples: https://lucidworks.sharepoint.local/sites/site1, https://lucidworks.sharepoint.local/sites/site2, https://lucidworks.sharepoint.local/sites/site3
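The first two scenarios differ only in the state of the Fetch all site collections option and in what you list as Start Links. A minimal sketch of that difference follows. The CrawlScope class and its field names are hypothetical and purely illustrative; they are not Fusion configuration properties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrawlScope:
    """Hypothetical container for the two settings discussed above."""
    fetch_all_site_collections: bool
    start_links: List[str] = field(default_factory=list)


def whole_web_application(web_app_url: str) -> CrawlScope:
    # Entire Web application: leave the option checked and
    # give the Web application URL as the only start link.
    return CrawlScope(True, [web_app_url])


def selected_site_collections(urls: List[str]) -> CrawlScope:
    # Subset: uncheck the option and give one start link
    # per site collection that should be crawled.
    if not urls:
        raise ValueError("at least one site collection URL is required")
    return CrawlScope(False, list(urls))
```

For instance, selected_site_collections(["https://lucidworks.sharepoint.local/sites/site1"]) describes a crawl limited to a single site collection.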

How to crawl a specific sub-site, list, or list item

  1. Uncheck the Limit Documents > Fetch all site collections option.

  2. Specify a "Start Link" for each site collection that contains the item you want to fetch.

  3. Specify a non-wildcard Inclusive Regular Expression for each parent.

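    For example, to crawl https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite/somesite, you must include an inclusive regex for every parent along the way, in this case https://lucidworks.sharepoint.local/sites/mysitecol and https://lucidworks.sharepoint.local/sites/mysitecol/myparentsite.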

    If you exclude a parent item of the site, the connector will not crawl the site because it will never spider down to it during the crawl process.
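Writing one expression per parent by hand is error-prone for deeply nested targets. The sketch below derives them from the target URL. It rests on several assumptions: that the site collection occupies the first two path segments (as in /sites/<name>), that the target itself also needs an expression, and that escaped full URLs are an acceptable form for the Inclusive Regular Expression field. The function name is made up for this example.

```python
import re
from typing import List
from urllib.parse import urlsplit


def inclusive_regexes(target_url: str, site_collection_depth: int = 2) -> List[str]:
    """Return one escaped, non-wildcard regex per level, from the
    site collection down to the target URL (inclusive).

    site_collection_depth is the number of path segments that form
    the site collection URL; 2 matches /sites/<name> (an assumption).
    """
    parts = urlsplit(target_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < site_collection_depth:
        raise ValueError("target URL is shallower than a site collection")
    base = f"{parts.scheme}://{parts.netloc}"
    patterns = []
    for depth in range(site_collection_depth, len(segments) + 1):
        url = base + "/" + "/".join(segments[:depth])
        # re.escape keeps the dots in the host name literal.
        patterns.append(re.escape(url))
    return patterns


if __name__ == "__main__":
    target = ("https://lucidworks.sharepoint.local"
              "/sites/mysitecol/myparentsite/somesite")
    for pattern in inclusive_regexes(target):
        print(pattern)
```

For the example target above, this yields three expressions: one for the site collection, one for the parent site, and one for the sub-site itself.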

2. Set up permissions for the crawl

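You have two options: set up a dedicated crawl account with limited permissions, or provide administrative access for the crawl.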

How to set up a crawl account

1. Create a Lucidworks Fusion crawl permission

  1. Start creating the new crawl permission level by going to Site Settings.

  2. Click Site Permissions.

  3. Click Permission Levels.

  4. Click Add a Permission Level.

  5. Name the new permission level “Lucidworks Fusion Service Permission” and assign these permissions:

    • View Items - View items in lists and documents in document libraries.

    • Open Items - View the source of documents with server-side file handlers.

    • View Versions - View past versions of a list item or document.

    • View Application Pages - View forms, views, and application pages. Enumerate lists.

    Site Permissions:

    • View Web Analytics Data - View reports on Web site usage.

    • Browse Directories - Enumerate files and folders in a Web site using SharePoint Designer and Web DAV interfaces.

    • View Pages - View pages in a Web site.

    • Enumerate Permissions - Enumerate permissions on the Web site, list, folder, document, or list item.

    • Browse User Information - View information about users of the Web site.

    • Use Remote Interfaces - Use SOAP, Web DAV, the Client Object Model or SharePoint Designer interfaces to access the Web site.

    • Open - Allows users to open a Web site, list, or folder in order to access items inside that container.

    • Edit Personal User Information - Allows a user to change his or her own user information, such as adding a picture.
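Because the permission level must include every item in the list above, it can help to keep the list as a checklist. The sketch below compares a set of granted permission names against it. The names are the display names from this page; the helper is hypothetical and does not query SharePoint, so the granted names must be collected by hand from the Permission Levels page.

```python
# Display names of the permissions listed above.
REQUIRED_PERMISSIONS = frozenset({
    "View Items",
    "Open Items",
    "View Versions",
    "View Application Pages",
    "View Web Analytics Data",
    "Browse Directories",
    "View Pages",
    "Enumerate Permissions",
    "Browse User Information",
    "Use Remote Interfaces",
    "Open",
    "Edit Personal User Information",
})


def missing_permissions(granted):
    """Return the required permission names absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example: a permission level that only has two of the twelve entries.
    for name in missing_permissions(["View Items", "Open"]):
        print("missing:", name)
```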

2. Create a Fusion crawl group and assign the crawl service account to it

  1. For each top-level site you want to be able to crawl, go to Site Settings.

  2. Click Site Permissions.

  3. Click Create Group.

  4. Give the group the “Lucidworks Fusion Service Permission” permission level you created earlier.

  5. Add the crawl service account to the new group.

At this point, the crawl service account should be able to crawl without administrator rights.

Limitations of crawling SharePoint Online with a non-administrative account

There is one important drawback of crawling SharePoint Online with a non-administrative account: Only SharePoint Online Administrators are allowed to list site collections from SharePoint Online.

So if you want to crawl multiple site collections from your SharePoint Online tenant, you must either

  • list them in the Start Links explicitly, or

  • provide a SharePoint administrator account when crawling SharePoint Online.
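The choice between these two options can be expressed as a small decision rule. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes that an administrator account would use the tenant URL as its single Start Link.

```python
from typing import List, Optional


def online_start_links(
    tenant_url: str,
    is_admin: bool,
    site_collections: Optional[List[str]] = None,
) -> List[str]:
    """Pick Start Links for a SharePoint Online crawl.

    Explicitly listed site collections work for any account.
    Without them, only an administrator can discover the site
    collections, so a non-admin account is rejected.
    """
    if site_collections:
        return list(site_collections)
    if is_admin:
        # Assumption: an admin starts from the tenant URL.
        return [tenant_url]
    raise ValueError(
        "a non-administrative account cannot list site collections; "
        "list each site collection as a Start Link"
    )
```

Calling the function with is_admin=False and no explicit site collections raises an error, which mirrors the limitation described above.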

The diagram below illustrates in red what a non-administrator user can crawl:

[Diagram: Non-admin crawling permissions]

A non-administrator can be configured to list sub-sites in a site collection. But a non-administrative user cannot list the site collections given the tenant URL.

For example, a non-admin user can list the sub-sites within a site collection that the account can access, but only an admin can list the site collections under the tenant URL.

How to provide admin access to crawl

See the SharePoint documentation for instructions.