SharePoint Connector and Datasource Configuration

The SharePoint connector retrieves content and metadata from an on-premises SharePoint repository.

As of Fusion 4.2.4, the V1 platform version of this connector is replaced by a new platform version, V1 Optimized. See the SharePoint and SharePoint Online connector platform versions for details about the differences between the V1 and V1 Optimized platform versions of this connector.

To retrieve content from cloud-based SharePoint repositories, see the SharePoint Online V1 connector.

This connector can access a SharePoint repository running on the following platforms:

  • Microsoft SharePoint 2010

  • Microsoft SharePoint 2013

  • Microsoft SharePoint 2016

See this tutorial about configuring a SharePoint datasource and enabling security trimming:

When crawling, the connector discovers SharePoint contents in the following order: sites, then sub-sites (children). A site may contain:

  • Sub-sites

  • Generic Lists

    • List Items

      • Attachments

  • Document Libraries

    • Folders

    • Documents
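The hierarchy above can be modeled as a small tree. The following is a minimal sketch, not the connector's implementation: the `Node` class, the sample names, and the choice to queue sub-sites ahead of lists and libraries are all assumptions made for illustration. The only property taken from the text is that parents are discovered before their children.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "site", "list", "item", "attachment", "library", "folder", "document"
    name: str
    children: list["Node"] = field(default_factory=list)


def discovery_order(root: Node) -> list[tuple[str, str]]:
    """Breadth-first walk: every object is visited after its parent."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append((node.kind, node.name))
        # Assumption: sub-sites are queued ahead of lists and libraries.
        queue.extend(sorted(node.children, key=lambda child: child.kind != "site"))
    return order


# Hypothetical site used only to exercise the walk.
root = Node("site", "Intranet", [
    Node("list", "Tasks", [Node("item", "Task 1", [Node("attachment", "notes.txt")])]),
    Node("site", "HR"),
    Node("library", "Docs", [Node("folder", "Policies"), Node("document", "handbook.docx")]),
])
order = discovery_order(root)
```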

When the connector re-crawls a SharePoint repository, previously crawled URLs are accessed before any newly discovered objects, but no other ordering is guaranteed. The connector caches retrieved parent objects to avoid unnecessary requests. The last modified date of each object is retrieved to determine whether it has changed since the last crawl. If it has changed, a new request is made to retrieve the changes. If it has not changed, the object is skipped and no additional request is made.
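The re-crawl decision can be sketched as a comparison of last modified dates. This is an illustrative sketch under stated assumptions: `plan_recrawl` and its two dictionaries (URL mapped to last modified date) are hypothetical, and the handling of deleted objects is deliberately left out because the text does not describe it.

```python
from datetime import datetime


def plan_recrawl(previous: dict[str, datetime], current: dict[str, datetime]):
    """Return (urls_to_fetch, urls_skipped) for a re-crawl.

    previous: last modified dates recorded during the last crawl
    current:  last modified dates reported by the repository now
    """
    fetch, skip = [], []
    # Previously crawled URLs are checked first.
    for url, seen in previous.items():
        modified = current.get(url)
        if modified is None:
            continue  # no longer listed; deletion handling is outside this sketch
        (fetch if modified > seen else skip).append(url)
    # Newly discovered objects come after the known ones.
    fetch.extend(url for url in current if url not in previous)
    return fetch, skip


old = datetime(2019, 1, 1)
new = datetime(2019, 2, 1)
fetch, skip = plan_recrawl(
    {"/sites/a": old, "/sites/b": old},
    {"/sites/a": old, "/sites/b": new, "/sites/c": new},
)
```

Here `/sites/b` is fetched because its date moved forward, `/sites/a` is skipped, and the new `/sites/c` is fetched last.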

The connector uses SOAP to connect to and retrieve documents, lists, and other objects for indexing. It does not access a SharePoint site in the same way that a regular user does, and it needs additional privileges to use the SOAP interface to SharePoint.

The connector can be configured to work with Active Directory (AD) or LDAP to retrieve the ACLs for each object, which can then be used for security trimming at query time. To use security trimming to restrict user access to SharePoint objects, the authenticated user must have sufficient privileges to read every document in the system and determine which users can access it. The permission requirements are explained below.

SharePoint Permissions

SharePoint security trimming restricts access to documents based on user permissions. There are two types of permissions in SharePoint:

  • Site permissions, which are:

    • managed by SharePoint

    • customizable for each site or subsite

    • inherited by subsites as the default permissions

    • grantable to users and groups

  • User permissions, which are:

    • assigned by group membership, when groups have been configured and provided permissions

    • assigned directly to the user

These permissions are stored as ACLs. When the SharePoint server is configured with security trimming set to "true", documents retrieved from SharePoint have the set of all ACLs stored in an acl_ss field on each document.

At search time, the ACLs are used to verify if a user has access to a document. This is configured in a query pipeline with a Security Trimming Query Stage.

To crawl all the sites and subsites, the authenticated user must belong to the site administrators group. If not, Fusion can still crawl and complete the job, but the crawled data will be limited by the user’s privileges. In addition, a WARNING message will appear in connectors.log indicating that the user is not a site administrator and is therefore unable to get sites from site collections. The message starts with Authorization Error (401).

Required Permissions

The SharePoint datasource must be configured with the name of a user who has sufficient permissions to crawl the entire site. These permissions require use of a custom Permission Policy. The required permissions correspond to the concept of Site Collection Auditor, a permission type which is not the same as a Site Administrator, but requires almost all of the Site Administrator privileges.

You will need to work with your SharePoint administrator to ensure that the account used by Fusion has all of the permissions listed in the following table:

Permission Type         | Permission Description
------------------------+--------------------------------------------------------------------------------------------------------
Site Collection Auditor | Full Read access for the entire site collection, including reading permissions and configuration data.
View Items              | View items in lists and documents in document libraries.
Open Items              | View the source of documents with server-side file handlers.
View Versions           | View past versions of a list item or document.
Browse Directories      | Enumerate files and folders in a Web site using SharePoint Designer and WebDAV interfaces.
View Pages              | View pages in a Web site.
Enumerate Permissions   | Enumerate permissions on the Web site, list, folder, document, or list item.
Browse User Information | View information about users of the Web site.
Use Remote Interfaces   | Use SOAP, WebDAV, Client Object Model, or SharePoint Designer interfaces to access the Web site.
                        | Open a Web site, list or folder in order to access items inside that container.

Troubleshooting Permission Issues

When the connector is configured using a SharePoint username without sufficient privileges, the Fusion connectors log file fusion/4.1.x/var/log/connectors/connectors.log contains an error like the following:

crawler.common.sharepoint.exception.SharePointException: Server was unable to process request. ---> Attempted to perform an unauthorized operation.
    at crawler.common.sharepoint.service.BaseService.analyzeResponse( ~[classes/:?]
    at crawler.common.sharepoint.service.SiteDataService.getContentBySiteOrList( ~[classes/:?]
    at com.lucidworks.permissions.Main.test1( [classes/:?]
    at com.lucidworks.permissions.Main.main( [classes/:?]

This user’s permissions may be sufficient to connect via SOAP and read the documents, but not sufficient to get the ACLs and other associated metadata. This may result in a complete lack of access to documents, or in access to unauthorized documents. Confirm that the configured SharePoint user has the required privileges.
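One way to check for these symptoms is to scan the connectors log for the two messages quoted in this document. The sketch below is illustrative only: `find_permission_errors` is a hypothetical helper, and it assumes the messages appear verbatim on a single log line.

```python
# Messages quoted in this document that point to insufficient SharePoint privileges.
PERMISSION_ERROR_MARKERS = (
    "Attempted to perform an unauthorized operation",
    "Authorization Error (401)",
)


def find_permission_errors(lines):
    """Return (line_number, text) for every line containing a known marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if any(marker in line for marker in PERMISSION_ERROR_MARKERS)
    ]


sample_log = [
    "INFO crawl started\n",
    "WARN Authorization Error (401) while getting sites from site collection\n",
    "ERROR SharePointException: ... Attempted to perform an unauthorized operation.\n",
]
hits = find_permission_errors(sample_log)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_permission_errors(open(path_to_connectors_log))`, where the path is the connectors log location for your installation.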

Cache User Groups to Improve Search Performance

When a user performs a search query with SharePoint security trimming enabled, the security trimming process starts by fetching the user’s groups. SharePoint document ACLs reference two types of groups:

  • SharePoint groups

  • Active Directory LDAP security groups

Fusion creates a security filter from the user’s loginName and the groups that the user belongs to. The security filter is a Solr fq filter. Once the security filter is created, the user’s queries return only the documents that the user is permitted to see.
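To make the idea concrete, here is a minimal sketch of how a filter of this kind could be assembled. It is not the filter that Fusion actually generates: the `build_security_fq` and `solr_quote` helpers are hypothetical, and matching principals as quoted terms against the acl_ss field is an assumption.

```python
def solr_quote(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_security_fq(login_name: str, groups, field: str = "acl_ss") -> str:
    """Build an fq that matches documents whose ACL field lists the user or a group."""
    principals = [login_name, *sorted(set(groups))]
    return f"{field}:(" + " OR ".join(solr_quote(p) for p in principals) + ")"


# Hypothetical user and groups.
fq = build_security_fq("CORP\\alice", ["Site Members", "Domain Users"])
print(fq)
```

A filter like this would be sent as an `fq` parameter alongside the user's query, so that only documents with a matching ACL entry are returned.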

By default, Fusion looks up a user’s SharePoint groups and LDAP groups every time a search query is performed. But fetching groups for a user is expensive and can hurt query time.

Note, too, that each SharePoint site collection that is part of a datasource has its own unique SharePoint groups. This means that if Fusion has crawled multiple SharePoint site collections, it must look up a user’s groups from each site collection. This is done in parallel for speed, but it dominates query times if there are many site collections to query.
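The parallel lookup described above can be sketched with a thread pool. This is an illustration, not Fusion's code: `lookup_groups_everywhere` is hypothetical, and `fetch_groups` stands in for whatever call retrieves a user's groups from one site collection.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_groups_everywhere(user, site_collections, fetch_groups, max_workers=8):
    """Fetch the user's groups from every site collection in parallel and merge them."""
    if not site_collections:
        return set()
    workers = min(max_workers, len(site_collections))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One lookup per site collection; total time is bounded by the slowest lookup
        # only while there are enough workers to run them all at once.
        results = pool.map(lambda sc: fetch_groups(sc, user), site_collections)
        return set().union(*results)


# Stand-in data source used only to exercise the function.
fake_groups = {
    "https://sp.example/sites/a": {"A Members"},
    "https://sp.example/sites/b": {"B Owners", "B Visitors"},
}
groups = lookup_groups_everywhere("alice", list(fake_groups), lambda sc, user: fake_groups[sc])
```

Even in parallel, the number of round trips grows with the number of site collections, which is why the caching options below matter for large deployments.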

Another consideration is unplanned SharePoint or LDAP downtime. If group lookups are performed in real time during an outage, the security filter cannot be built, and the user’s queries return results with documents missing.

To help alleviate these issues, Fusion offers a few different caching options. Consider using these caching options if you have many site collections, need extremely fast search, or cannot tolerate SharePoint or LDAP outages.

Security Filter Cache

With a security filter cache, once the query filter for a user has been generated, Fusion reuses the filter for this user for subsequent queries. The cache_expiration_time parameter dictates how long Fusion reuses the filter until generating it again. The cache_max_size parameter dictates the maximum number of items to hold in the security filter cache.
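The interaction of the two parameters can be sketched as a small cache with an expiry and a size bound. This is an assumption-laden illustration, not Fusion's implementation: the `SecurityFilterCache` class is hypothetical, the expiration time is assumed to be in seconds, and evicting the least recently used entry is an assumed policy.

```python
import time
from collections import OrderedDict


class SecurityFilterCache:
    def __init__(self, cache_expiration_time, cache_max_size, clock=time.monotonic):
        self._ttl = cache_expiration_time   # assumed to be seconds
        self._max_size = cache_max_size
        self._clock = clock
        self._entries = OrderedDict()       # user -> (created_at, filter)

    def get_or_build(self, user, build_filter):
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(user)  # mark as recently used
            return entry[1]
        # Missing or expired: generate the filter again and store it.
        value = build_filter(user)
        self._entries[user] = (now, value)
        self._entries.move_to_end(user)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


cache = SecurityFilterCache(cache_expiration_time=300, cache_max_size=1000)
first = cache.get_or_build("alice", lambda user: f"filter-for-{user}")
second = cache.get_or_build("alice", lambda user: "never built, cached value is reused")
```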

There are two flavors of security filter caches.

  • Local - This security filter is used only locally for this datasource. All other SharePoint datasources in your SharePoint security trimming query pipeline do not have access to this security filter.

  • Global - This security filter is shared among multiple SharePoint datasources. If groups have already been looked up for LDAP or a SharePoint site collection by another datasource, they do not have to be looked up again.

To enable security filter caches:

  1. In the Fusion UI, navigate to the SharePoint datasource configuration.

  2. Check the boxes labeled Enable local security filter cache and Enable global security filter cache.

User Group Cache

To make queries significantly faster and to prevent security trimming from failing when SharePoint or LDAP is down, you can enable user group caching.

The biggest bottleneck in security trimming for the SharePoint connector is looking up each user’s groups. Caching user groups means that when a security trimming query is performed, a single query to a Solr collection looks up the user’s LDAP and SharePoint groups, instead of going to the LDAP and SharePoint services to get them.

Every time Fusion crawls the SharePoint datasource, it updates the user group cache.

There are some costs of user group caching:

  • Increased indexing time - Fusion needs to build a user group cache while indexing.

  • Stale user groups - If Fusion does not recrawl the SharePoint datasource often enough, the user group cache can get out of date. The more often Fusion recrawls, the closer it is to a real-time user group lookup.

To enable user group caching:

  1. In the Fusion UI, navigate to the SharePoint datasource configuration.

  2. Check the box labeled Enable User Group Caching in Solr. At crawl time, each user’s LDAP and SharePoint groups are fetched and stored in a Solr collection.

  3. (Optional) Set User Group Cache Solr Collection Name for all of your SharePoint datasources to the same name (for example, sp_usr_grp). The default is sp_usr_grp_<datasource>, where <datasource> is the ID of your datasource. Several SharePoint datasources can share the same Solr collection, so performing this step prevents multiple collections from being created.
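The effect of the optional step can be illustrated with the naming rule alone. In this sketch, `user_group_collection` and the datasource IDs are hypothetical; only the default pattern sp_usr_grp_<datasource> and the example name sp_usr_grp come from the text above.

```python
def user_group_collection(datasource_id, configured_name=None):
    """Return the Solr collection that holds the user group cache for a datasource."""
    return configured_name or f"sp_usr_grp_{datasource_id}"


datasources = ["hr_sharepoint", "eng_sharepoint"]  # hypothetical datasource IDs

# With the default, every datasource gets its own collection.
default_collections = {user_group_collection(ds) for ds in datasources}

# With a shared name, all datasources write to one collection.
shared_collections = {user_group_collection(ds, "sp_usr_grp") for ds in datasources}
```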


When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
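The difference follows from JSON string escaping, assuming the API request body is JSON. The sketch below uses a hypothetical property name, `delimiter`, purely for illustration: the two characters typed in the UI (a backslash followed by t) become `\\t` once serialized.

```python
import json

# What you would type into a UI field: a backslash followed by the letter t.
ui_value = r"\t"

# Serializing the same value for an API request doubles the backslash.
payload = json.dumps({"delimiter": ui_value})  # "delimiter" is a hypothetical property name
print(payload)

# Parsing the payload gives back exactly what was typed in the UI.
round_trip = json.loads(payload)["delimiter"]
```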