SharePoint Connector and Datasource Configuration

The SharePoint connector retrieves content and metadata from a SharePoint repository.

It can be configured to work with Active Directory (AD) or LDAP to retrieve the ACLs for each object, which can then be used at query time to verify whether the querying user can access the object. This connector can access a SharePoint repository running on the following platforms:

  • Microsoft SharePoint 2010

  • Microsoft SharePoint 2013

  • Microsoft SharePoint 2016

When crawling, the connector discovers SharePoint contents in the following order: sites, then sub-sites (children). A site may contain:

  • Sub-sites

  • Generic Lists

    • List Items

      • Attachments

  • Document Libraries

    • Folders

    • Documents

When the connector re-crawls a SharePoint repository, it accesses each previously crawled URL before any newly discovered objects; beyond that, no order is guaranteed. The connector caches retrieved parent objects to avoid unnecessary requests.

The last modified date of each object is retrieved to determine if it has changed since the last crawl. If it has changed, a new request is made to retrieve the changes. If it has not changed, the object is skipped and no additional request is made.
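The change check described above amounts to comparing two timestamps per object. The sketch below illustrates that comparison under stated assumptions: the crawl_state mapping, the needs_refetch function, and the example URLs are hypothetical and do not reflect how the connector stores its crawl state.

```python
from datetime import datetime, timezone
from typing import Dict


def needs_refetch(url: str, last_modified: datetime, crawl_state: Dict[str, datetime]) -> bool:
    """Return True when the object is new or has changed since the last crawl."""
    previous = crawl_state.get(url)
    return previous is None or last_modified > previous


# Hypothetical state from the previous crawl: URL -> last modified date seen then.
crawl_state = {
    "http://sp.example.com/docs/a.docx": datetime(2017, 1, 10, tzinfo=timezone.utc),
    "http://sp.example.com/docs/b.docx": datetime(2017, 1, 10, tzinfo=timezone.utc),
}

# Last modified dates reported by the repository during the re-crawl.
current = {
    "http://sp.example.com/docs/a.docx": datetime(2017, 1, 10, tzinfo=timezone.utc),  # unchanged
    "http://sp.example.com/docs/b.docx": datetime(2017, 2, 1, tzinfo=timezone.utc),   # changed
    "http://sp.example.com/docs/c.docx": datetime(2017, 2, 2, tzinfo=timezone.utc),   # new
}

# Only changed or new objects trigger an additional request; the rest are skipped.
to_fetch = [url for url, modified in current.items() if needs_refetch(url, modified, crawl_state)]
print(to_fetch)
```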

The SharePoint connector uses SOAP to connect to SharePoint and retrieve documents, lists, and other objects for indexing. It does not access a SharePoint site in the same way that a regular user does, and it needs additional privileges to use the SOAP interface to SharePoint. To use security trimming to restrict user access to SharePoint objects, the SharePoint connector must run as a user who has sufficient privileges to read every document in the system and determine which users can access them.

See this tutorial about configuring a SharePoint datasource and enabling security trimming:

SharePoint Permissions

SharePoint security trimming restricts access to documents based on user permissions. There are two types of permissions in SharePoint:

  • Site permissions, which are:

    • managed by SharePoint.

    • customizable for each site or subsite.

    • inherited by subsites as the default permissions.

    • grantable to users and groups.

  • User permissions, which are:

    • assigned by group membership, when groups have been configured and provided permissions.

    • assigned directly to the user.

These permissions are stored as ACLs. When the SharePoint server is configured with security trimming set to TRUE, documents retrieved from SharePoint have the set of all their ACLs stored in an 'acl_ss' field on each document.

At search time, the ACLs are used to verify if a user has access to a document. This is configured in a query pipeline with a Security Trimming Query Stage.
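Conceptually, the search-time check is a match between the ACL entries stored on a document and the identities of the querying user. The sketch below shows only that idea and is not the Security Trimming Query Stage itself: the can_access function, the format of the ACL values, and the sample users and groups are assumptions made for illustration.

```python
from typing import Iterable, Set


def can_access(doc_acls: Iterable[str], user_principals: Set[str]) -> bool:
    """A document is visible if any of its ACL entries matches the user or one of the user's groups."""
    return any(acl in user_principals for acl in doc_acls)


# Hypothetical indexed documents; 'acl_ss' is the field named in the text above.
docs = [
    {"id": "doc1", "acl_ss": ["DOMAIN\\alice", "DOMAIN\\hr-team"]},
    {"id": "doc2", "acl_ss": ["DOMAIN\\finance"]},
]

# Hypothetical identities for the querying user: the user name plus group memberships.
principals = {"DOMAIN\\bob", "DOMAIN\\hr-team"}

visible = [doc["id"] for doc in docs if can_access(doc["acl_ss"], principals)]
print(visible)  # ['doc1']
```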

Permissions Required by the SharePoint Connector

SharePoint allows you to create Permission Policies which grant administration-level access to site functions without granting full access to all users or to a group of users. Depending on your local security policy, you may need to create a custom Permission Policy, or grant a specific user the permissions required to access objects and retrieve the ACLs and metadata needed for indexing.

The SharePoint datasource must be configured with the name of a user who has sufficient permissions to crawl the entire site. These permissions are not available through the default user permissions or group assignments, and require a custom Permission Policy. The permissions required correspond to the concept of Site Collection Auditor, a permission type which is not the same as a Site Administrator, but requires almost all of the Site Administrator privileges.

You will need to work with your SharePoint administrator to ensure that the account used by the SharePoint connector and datasource has all of the permissions listed in the following table:

Permission Type         | Permission              | Description
Site Collection Auditor |                         | Full Read access for the entire site collection, including reading permissions and configuration data.
List                    | View Items              | View items in lists and documents in document libraries.
List                    | Open Items              | View the source of documents with server-side file handlers.
List                    | View Versions           | View past versions of a list item or document.
Site                    | Browse Directories      | Enumerate files and folders in a Web site using SharePoint Designer and WebDAV interfaces.
Site                    | View Pages              | View pages in a Web site.
Site                    | Enumerate Permissions   | Enumerate permissions on the Web site, list, folder, document, or list item.
Site                    | Browse User Information | View information about users of the Web site.
Site                    | Use Remote Interfaces   | Use SOAP, WebDAV, Client Object Model, or SharePoint Designer interfaces to access the Web site.
Site                    | Open                    | Open a Web site, list, or folder in order to access items inside that container.
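When reviewing an account with your SharePoint administrator, it can help to compare the permissions the account actually holds against the List and Site permissions in the table above. The sketch below is a plain set comparison under stated assumptions: the granted list is a made-up example, the missing_permissions function is hypothetical, and obtaining the real list of granted permissions from SharePoint is outside its scope. The Site Collection Auditor entry is not covered here.

```python
from typing import Iterable, List

# List and Site permissions named in the table above.
REQUIRED_PERMISSIONS = {
    "View Items",
    "Open Items",
    "View Versions",
    "Browse Directories",
    "View Pages",
    "Enumerate Permissions",
    "Browse User Information",
    "Use Remote Interfaces",
    "Open",
}


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions that are absent from the granted ones, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Hypothetical set of permissions reported for the crawl account.
granted = ["View Items", "Open Items", "View Pages", "Open"]

print(missing_permissions(granted))
```

An empty result means every List and Site permission from the table is present in the granted list.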

Default User and Group Permissions

Note: For the purposes of the SharePoint connector in Fusion, the default user and group permission levels are not sufficient to get the ACLs and metadata of SharePoint objects.

SharePoint provides the following default permission levels:

  • Full Control: Contains all permissions. Assigned by default to the "<Site name> Owners" SharePoint group. This permission level cannot be customized or deleted.

  • Design: Can create lists and document libraries, edit pages, and apply themes, borders, and style sheets in the Web site. Not assigned to any SharePoint group by default.

  • Contribute: Can add, edit, and delete items in existing lists and document libraries. Assigned by default to the "<Site name> Members" SharePoint group.

  • Read: Read-only access to the Web site. Users and SharePoint groups with this permission level can view items and pages, and open items and documents. Assigned by default to the "<Site name> Visitors" SharePoint group.

  • View Only: Can view pages, items, and documents. Any document that has a server-side file handler can be viewed in the browser but not downloaded.

SharePoint groups have the following default permissions:

  • Owners: Full Control permission level

  • Members: Contribute permission level

  • Visitors: Read permission level

Troubleshooting Permission Issues

When the connector is configured using a SharePoint username of a user without sufficient privileges, the Fusion connectors logfile $FUSION/var/log/connectors/connectors.log will contain an error like the following:

crawler.common.sharepoint.exception.SharepointException: Server was unable to process request. ---> Attempted to perform an unauthorized operation.
    at crawler.common.sharepoint.service.BaseService.analyzeResponse(BaseService.java:194) ~[classes/:?]
    at crawler.common.sharepoint.service.SiteDataService.getContentBySiteOrList(SiteDataService.java:169) ~[classes/:?]
    at com.lucidworks.permissions.Main.test1(Main.java:50) [classes/:?]
    at com.lucidworks.permissions.Main.main(Main.java:32) [classes/:?]

This user may have sufficient privileges to connect via SOAP and read the documents, but not enough to get the ACLs and other associated metadata. This may result in a complete lack of access to documents, or in access to unauthorized documents. It is therefore extremely important to confirm that the configured SharePoint user has the requisite privileges.
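To check a log for this condition, a small script can look for the exception line quoted above. This is a minimal sketch, assuming the error always appears on a single line containing both "SharepointException" and "unauthorized operation"; the function name and the sample lines are illustrative only.

```python
import re
from typing import Iterable, List

# Matches the exception line shown above; assumes both phrases occur on the same line.
UNAUTHORIZED = re.compile(r"SharepointException:.*unauthorized operation", re.IGNORECASE)


def find_permission_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report an unauthorized SharePoint operation."""
    return [line.rstrip("\n") for line in lines if UNAUTHORIZED.search(line)]


# Sample input modeled on the log excerpt above.
sample = [
    "crawler.common.sharepoint.exception.SharepointException: Server was unable to process request. "
    "---> Attempted to perform an unauthorized operation.\n",
    "    at crawler.common.sharepoint.service.BaseService.analyzeResponse(BaseService.java:194) ~[classes/:?]\n",
    "an unrelated log line\n",
]

hits = find_permission_errors(sample)
print(len(hits))  # 1
```

To scan a real log, pass an open file object for the connectors log to find_permission_errors instead of the sample list.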

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
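One reading of this tip is that the API accepts JSON, where a literal backslash must itself be escaped. The sketch below shows a standard JSON serializer producing the escaped form from the value as typed in the UI. The property name example_delimiter is hypothetical and is not a real datasource property.

```python
import json

# The value as typed in the UI: a backslash followed by the letter t (two characters).
ui_value = r"\t"

# Serializing to JSON escapes the backslash, giving the form to send in an API request body.
payload = json.dumps({"example_delimiter": ui_value})
print(payload)  # {"example_delimiter": "\\t"}

# Parsing the payload recovers the original two-character value.
assert json.loads(payload)["example_delimiter"] == ui_value
```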