SharePoint V2 Connector Configuration Reference
The SharePoint connector retrieves content and metadata from an on-premises SharePoint repository.
Deprecation and removal notice
This connector is deprecated as of June 19, 2023 and is removed or expected to be removed as of January 31, 2024. The SharePoint V2 connector is not compatible with Fusion 5.6 and later, regardless of the removal date. Use the SharePoint Optimized V2 connector instead. For more information about deprecations and removals, including possible alternatives, see Deprecations and Removals.
This connector supports the following SharePoint server versions:
- Microsoft SharePoint 2013
- Microsoft SharePoint 2016
- Microsoft SharePoint 2019
- Microsoft SharePoint Online
Configuration
This section specifies the configuration properties for the SharePoint V2 connector.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
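The difference between the two forms can be illustrated with JSON string escaping. The following is a minimal Python sketch, assuming the API request body is JSON; the property name `delimiter` is hypothetical and used only for illustration.

```python
import json

# Value exactly as it would be typed in the UI: a backslash followed by "t".
ui_value = r"\t"

# Serializing to JSON escapes the backslash, which gives the form used in an API body.
# "delimiter" is a hypothetical property name, not a documented connector property.
api_body = json.dumps({"delimiter": ui_value})
print(api_body)  # prints {"delimiter": "\\t"}

# Parsing the JSON body recovers the original two-character value.
assert json.loads(api_body)["delimiter"] == ui_value
```

Building the request body with a JSON library, rather than by hand, avoids having to track the extra backslash manually.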
Web applications
At least one web application must be defined in the configuration. Each entry represents a SharePoint web application to crawl.
Property | Description |
---|---|
Web Application name | Unique name of the web application in the specific configuration. Required field. Type: string. For example, |
Web Application URL | URL of the web application. Required field. For example, |
Site Collection List | List of site collection paths. For example, if the site collection URL is |
SharePoint List or libraries in the site collection | A set of list or library names within the site collection to crawl. For example, Documents. |
SharePoint webs | List of web names to crawl within the parent site collection. |
SharePoint List or library name | Name of a list or library under the SharePoint web context. For example, Documents. |
SharePoint Folders | Folders within the list to crawl. |
Excluded Site Collections | List of site collections to exclude from the crawl. |
Included file extensions | Attachments with a file extension from this list are included (and indexed) when filtering occurs. For example, |
Excluded file extensions | Attachments with a file extension from this list are excluded (and discarded) when filtering occurs. For example, |
Inclusive regexes | Regular expressions (regex) defined to index SharePoint objects including sites, lists, items, and attachments. The SharePoint object URL is used to match the regular expressions. |
Exclusive regexes | Regular expressions (regex) defined to discard SharePoint objects including sites, lists, items, and attachments. The SharePoint object URL is used to match the regular expressions. |
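The inclusive and exclusive regexes can be pictured as a filter over object URLs. The following Python sketch is illustrative only and is not the connector's implementation. It assumes that an empty inclusive list keeps everything, that exclusions take precedence over inclusions, and that a pattern may match anywhere in the URL; none of these behaviors is confirmed by this reference. The URLs are hypothetical.

```python
import re
from typing import Iterable, List


def filter_urls(urls: Iterable[str], inclusive: List[str], exclusive: List[str]) -> List[str]:
    """Keep URLs that match any inclusive regex and no exclusive regex.

    Assumptions (not confirmed by the connector documentation): an empty
    inclusive list keeps everything, exclusions take precedence, and a
    pattern may match anywhere in the URL.
    """
    include = [re.compile(p) for p in inclusive]
    exclude = [re.compile(p) for p in exclusive]
    kept = []
    for url in urls:
        if include and not any(p.search(url) for p in include):
            continue  # no inclusive regex matched this URL
        if any(p.search(url) for p in exclude):
            continue  # an exclusive regex matched this URL
        kept.append(url)
    return kept


# Hypothetical SharePoint object URLs, for illustration only.
urls = [
    "https://sharepoint.example.com/sites/hr/Documents/policy.docx",
    "https://sharepoint.example.com/sites/hr/Archive/old.docx",
    "https://sharepoint.example.com/sites/it/Documents/runbook.pdf",
]
kept = filter_urls(urls, inclusive=[r"/sites/hr/"], exclusive=[r"/Archive/"])
print(kept)
```

Testing patterns against a handful of known URLs in this way, before starting a crawl, can catch a regex that is broader or narrower than intended.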
Authentication
Select only one authentication method for the configuration.
Windows NT LAN Manager (NTLM) authentication
Property | Description |
---|---|
User | User name of the authenticating account |
Password | Password of the authenticating account |
Domain | Domain in which the client workstation has membership |
Workstation | Client workstation name |
Forms-based authentication (FBA)
Property | Description |
---|---|
Username | User name created in the membership database |
Password | Password of the user name created in the membership database |
SharePoint online authentication
Property | Description |
---|---|
SharePoint online account | Valid SharePoint account |
Password | SharePoint online account password |
Microsoft login URL | URL of the Microsoft login server |
App-only authentication (OAuth)
Property | Description |
---|---|
Azure AD (Active Directory) client ID | Azure client ID of the application |
Azure AD tenant | Office365 tenant name |
Azure AD client secret | Azure client secret of the client ID |
Azure AD login endpoint | Login URL for authentication |
App-only authentication (OAuth) with private key
Property | Description |
---|---|
Azure AD (Active Directory) client ID | Azure client ID of the application |
Azure AD tenant | Office365 tenant name |
Azure AD login endpoint | Login URL for authentication |
Azure AD PKCS12 key | The base64 string of the PKCS12 keystore loaded with the PFX (Personal Information Exchange) certificate file supplied by Azure AD |
Azure AD PKCS12 keystore password | Password of the Azure AD PKCS12 keystore |
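The Azure AD PKCS12 key property takes the keystore as a base64 string. One way to produce such a string is sketched below in Python. It assumes standard base64 with no line breaks is what the property expects, which this reference does not state. The file name `keystore.p12` and the placeholder bytes are hypothetical.

```python
import base64
import tempfile
from pathlib import Path


def keystore_to_base64(path: str) -> str:
    """Return the bytes of a keystore file as one base64 string with no line breaks."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Demonstration with placeholder bytes; a real keystore would be a .p12 or .pfx file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "keystore.p12"  # hypothetical file name
    sample.write_bytes(b"not a real keystore")
    encoded = keystore_to_base64(str(sample))

# Decoding the string gives back the original bytes.
assert base64.b64decode(encoded) == b"not a real keystore"
```

The resulting string contains key material, so treat it with the same care as the keystore password.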
Requirements to index all site collections
The following conditions must be met to index all site collections:
- The authentication method must be one of the following:
  - Windows NT LAN Manager (NTLM)
  - SharePoint online
  - App-only (OAuth)
- The credentials must have permission to list all site collections:
  - NTLM. Credentials must be an administrative account in the configuration.
  - SharePoint online. Credentials must be a SharePoint admin account in the configuration, not a site collection admin account.
  - App-only (OAuth). The application registered in the SharePoint instance must have a tenant scope.
Crawl searchable content
For detailed information about enabling and crawling searchable content, see Enable content on a site to be searchable.
Limit documents
These properties limit the documents and how they are processed.
Property | Description |
---|---|
Fetch lists | If enabled: |
Fetch list items | If enabled, retrieves and indexes list items. |
Fetch attachments | If enabled, retrieves and indexes item attachments. |
Index sites | If enabled, indexes sites. |
Index lists | If enabled, indexes lists. |
Index empty lists | |
Index folders | |
Index taxonomy terms (Experimental) | If enabled, indexes taxonomy terms from the default term store and places those terms in the content collection. |
Index Document Metadata | Indexes metadata for files and attachments that do not meet maximum or minimum size limits. |
Included List Base Types | If the Fetch Lists property is set to true and base type is: Base list types are Document Library, Generic List, Issue, and Survey. |
Request settings
Property | Description |
---|---|
API query row limit | Number of items to retrieve per page. Default value is 500. The connector paginates requests to retrieve list items. |
Changes API query row limit | Number of events to retrieve per page. Default value is 200. The connector paginates requests to retrieve changes per site collection. |
User agent | Value of the |
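The row limit properties control how many rows each paginated request returns. The following Python sketch shows the general idea of paging with a row limit. It uses a plain offset and a hypothetical `fetch_page` callable as a simplification; the connector's actual paging mechanism against SharePoint is not described in this reference.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], row_limit: int = 500) -> Iterator[dict]:
    """Yield items page by page until a page comes back shorter than row_limit."""
    offset = 0
    while True:
        page = fetch_page(offset, row_limit)
        yield from page
        if len(page) < row_limit:
            break  # a short (or empty) page means there is nothing left
        offset += len(page)


# Fake data source standing in for a list with 1234 items (illustration only).
items = [{"id": i} for i in range(1234)]
calls: List[int] = []


def fake_fetch(offset: int, limit: int) -> List[dict]:
    calls.append(offset)  # record each request so the page count is visible
    return items[offset:offset + limit]


result = list(paginate(fake_fetch, row_limit=500))
print(len(result), calls)  # prints 1234 [0, 500, 1000]
```

A lower row limit means more requests of smaller size; a higher limit means fewer, larger requests.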
Security trimming configuration
Property | Description |
---|---|
Enable security trimming | If enabled, the connector indexes SharePoint groups and the role assignments of each object type. Object types are sites, lists, items, and attachments. |
ACL collection name | Access Control List (ACL) collection name. Role assignments and SharePoint groups are indexed in this collection. |
Security filtering
Security filtering in the SharePoint connector requires the ACL (LDAP) connector to function correctly.
For more information, see Active Directory Connector for ACLs V2 Configuration Reference.
For the content collection, the SharePoint connector indexes documents. The acl_ss field in each document contains roleAssignment IDs, which identify the role assignments defined for that object.
For the access control collection, the SharePoint connector indexes:
- SharePoint groups that contain Active Directory (AD) users and groups
- Role assignments
The LDAP ACL connector indexes the AD users and AD groups to the same access control collection.
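The relationship between the two collections can be sketched as a lookup from a user to the role assignments that user holds, followed by a match against each document's acl_ss values. The Python sketch below is a conceptual illustration only. The data shapes, names, and resolution order are assumptions, nested AD groups are ignored, and it does not describe how Fusion performs security trimming at query time.

```python
from typing import Dict, List, Set


def visible_role_assignments(
    user: str,
    ad_groups: Dict[str, Set[str]],         # AD group -> member users (LDAP ACL connector data)
    sp_groups: Dict[str, Set[str]],         # SharePoint group -> AD users and AD groups it contains
    role_assignments: Dict[str, Set[str]],  # role assignment ID -> principals it applies to
) -> Set[str]:
    """Return the role assignment IDs that apply to the user (assumed model)."""
    principals = {user}
    # Add the AD groups the user belongs to directly.
    principals |= {g for g, members in ad_groups.items() if user in members}
    # Add the SharePoint groups that contain the user or one of those AD groups.
    principals |= {g for g, members in sp_groups.items() if members & principals}
    return {ra for ra, members in role_assignments.items() if members & principals}


# Hypothetical access control data.
ad_groups = {"engineering": {"alice"}}
sp_groups = {"Site Members": {"engineering"}, "Site Owners": {"bob"}}
role_assignments = {"ra-1": {"Site Members"}, "ra-2": {"Site Owners"}, "ra-3": {"alice"}}

# Hypothetical content documents, each carrying roleAssignment IDs in acl_ss.
docs: List[dict] = [
    {"id": "doc-a", "acl_ss": ["ra-1"]},
    {"id": "doc-b", "acl_ss": ["ra-2"]},
    {"id": "doc-c", "acl_ss": ["ra-2", "ra-3"]},
]

allowed = visible_role_assignments("alice", ad_groups, sp_groups, role_assignments)
visible = [d["id"] for d in docs if allowed & set(d["acl_ss"])]
print(sorted(allowed), visible)
```

The sketch shows why both connectors must write to the same access control collection: without the AD membership data, the link from a user to a SharePoint group through an AD group cannot be resolved.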
Common properties
Proxy options
Property | Description |
---|---|
Proxy URL | URL of proxy server |
Proxy username | User name to log in to the proxy server |
Proxy password | Password of the proxy username |
Item count limit
Property | Description |
---|---|
Maximum output limit | Maximum number of indexed documents. Default value is -1, which specifies no maximum limit. |
Item size limit
Property | Description |
---|---|
Maximum | Maximum byte size of an attachment |
Minimum | Minimum byte size of an attachment |
Item retry options
Property | Description |
---|---|
Max retry attempts | Maximum number of attempts to retry if an item fails. |
Retry delay | Number of seconds (delay) between retries if an item fails. |
Other retry options are deprecated.
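The two retry properties can be read as a fixed-delay retry loop. The following Python sketch assumes that the first attempt does not count against Max retry attempts and that any exception counts as a failure; neither detail is confirmed by this reference. The function and parameter names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def process_with_retry(action: Callable[[], T], max_retry_attempts: int, retry_delay_seconds: float) -> T:
    """Run action, retrying up to max_retry_attempts times with a fixed delay between tries."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= max_retry_attempts:
                raise  # retries exhausted, surface the last failure
            attempt += 1
            time.sleep(retry_delay_seconds)


# A stand-in for an item that fails twice and then succeeds (illustration only).
calls = {"count": 0}


def flaky_item() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "indexed"


result = process_with_retry(flaky_item, max_retry_attempts=3, retry_delay_seconds=0)
print(result, calls["count"])  # prints indexed 3
```

With a delay of several seconds and a high attempt count, a single persistently failing item can hold up processing for a long time, so the two values are best chosen together.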
HTTP timeout options
Property | Description |
---|---|
Read timeout | Number of milliseconds before timeout occurs. Value is passed to the |
Connection timeout | Number of milliseconds before a connection attempt times out. Value is passed to the |
HTTP connection options
Property | Description |
---|---|
Maximum connections | Maximum number of connections available in the pool. Default value is 1000. |
Maximum per route | Maximum number of connections per route in the same target URL. Default value is 200. |
Ignore SSL (Secure Sockets Layer) validation exceptions | If enabled, the |
Test NTLM permissions to successfully crawl a site collection
This is only applicable to SharePoint on-premises deployments.
To verify the NTLM account has appropriate permissions to crawl a site collection using the SharePoint V2 connector:
- Copy the check-ntlm-account-can-crawl-sharepoint-site-collection.ps1 PowerShell script below to a folder on your computer.
$site_col_url="https://your.sharepoint-site.com/sites/mysitecol"
# Prompt for the credentials of the NTLM account to test.
$cred = (Get-Credential)
# Define a helper type that accepts any server certificate, unless it is already loaded.
if (-not ([System.Management.Automation.PSTypeName]'ServerCertificateValidationCallback').Type)
{
    $certCallback = @"
using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;
public class ServerCertificateValidationCallback
{
    public static void Ignore()
    {
        if(ServicePointManager.ServerCertificateValidationCallback ==null)
        {
            ServicePointManager.ServerCertificateValidationCallback +=
                delegate
                (
                    Object obj,
                    X509Certificate certificate,
                    X509Chain chain,
                    SslPolicyErrors errors
                )
                {
                    return true;
                };
        }
    }
}
"@
    Add-Type $certCallback
}
# Use TLS 1.2 and skip SSL certificate validation for this session.
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12;
[ServerCertificateValidationCallback]::Ignore()
# Request a form digest value from the Sites SOAP web service.
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "text/xml")
$headers.Add("SOAPAction", "http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigestInformation")
$headers.Add("X-RequestForceAuthentication", "true")
$headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
$body = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`n<soap:Envelope xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`" xmlns:xsd=`"http://www.w3.org/2001/XMLSchema`" xmlns:soap=`"http://schemas.xmlsoap.org/soap/envelope/`">`n <soap:Body>`n <GetUpdatedFormDigestInformation xmlns=`"http://schemas.microsoft.com/sharepoint/soap/`" />`n </soap:Body>`n</soap:Envelope>"
$response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" -Method 'POST' -Headers $headers -Body $body -Credential $cred
$digest_value = $response.Envelope.Body.GetUpdatedFormDigestInformationResponse.FirstChild.DigestValue
# Use the digest value to query the web's metadata through the ProcessQuery endpoint.
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "text/xml")
$headers.Add("X-RequestForceAuthentication", "true")
$headers.Add("X-RequestDigest", $digest_value)
$headers.Add("Accept", "application/json")
$headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
$body = @'
<Request AddExpandoFieldTypeSuffix="true" SchemaVersion="14.0.0.0" LibraryVersion="16.0.0.0"
         ApplicationName=".NET Library" xmlns="http://schemas.microsoft.com/sharepoint/clientquery/2009">
    <Actions>
        <ObjectPath Id="2" ObjectPathId="1"/>
        <ObjectPath Id="4" ObjectPathId="3"/>
        <Query Id="5" ObjectPathId="3">
            <Query SelectAllProperties="false">
                <Properties>
                    <Property Name="Webs" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="Title" ScalarProperty="true"/>
                    <Property Name="ServerRelativeUrl" ScalarProperty="true"/>
                    <Property Name="RoleDefinitions" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="RoleAssignments" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="HasUniqueRoleAssignments" ScalarProperty="true"/>
                    <Property Name="Description" ScalarProperty="true"/>
                    <Property Name="Id" ScalarProperty="true"/>
                    <Property Name="LastItemModifiedDate" ScalarProperty="true"/>
                </Properties>
            </Query>
        </Query>
    </Actions>
    <ObjectPaths>
        <StaticProperty Id="1" TypeId="{3747adcd-a3c3-41b9-bfab-4a64dd2f1e0a}" Name="Current"/>
        <Property Id="3" ParentId="1" Name="Web"/>
    </ObjectPaths>
</Request>
'@
$response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/ProcessQuery" -Method 'POST' -Headers $headers -Body $body -Credential $cred
# Print the full response as JSON.
$response | ConvertTo-Json -Depth 100
- Change the value in the first line:
  $site_col_url="https://your.sharepoint-site.com/sites/mysitecol"
  to the URL of your site collection.
- Execute the script. If the result is:
  - A JSON output of your site’s metadata, the account permissions are set correctly.
  - An error such as a 403, 401, or other error, the account permissions are not set correctly. Set permissions correctly and run the script again to verify it executes successfully.