SharePoint Optimized V2 Connector Configuration Reference
The SharePoint Optimized V2 connector retrieves content and metadata from on-premises and cloud-based SharePoint repositories.
Verify your connector version
This connector depends on specific Fusion versions. See the following table for the required versions:
Fusion version | Connector version | Notes
Fusion 5.6.1 and later | v1.1.0 through v1.6.0 |
Fusion 5.9.0 | v1.6.0 and later. Lucidworks recommends using the latest supported connector version. | Fusion 5.9.0 supports the v2.0.0 connector, but does not support LDAP ACLs integrations or security trimming.
Fusion 5.9.1 and later | v2.0.0 and later | Supports LDAP ACLs integrations and security trimming.
Note the following guidelines for using the SharePoint Optimized V2 connector:
- There is a pod limit. The SharePoint Optimized V2 connector does not support running multiple instances. Don’t run the connector on more than one pod.
- Watch for connector compatibility. Use the LDAP ACLs V2 connector with this connector.
To change the number of items to retrieve per page, set the value of apiQueryRowLimit. The default value is 5000.
To change the number of change events to retrieve per page, set the value of changeApiQueryRowLimit. The default value is 2000.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
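The difference between the two forms can be seen by serializing a value to JSON, which is the format of a request body sent to the API. The following is a minimal sketch using Python's standard json module; the field name delimiter is hypothetical and is used only to illustrate the escaping.

```python
import json

# Value exactly as it would be typed in the UI: a backslash followed by "t".
ui_value = r"\t"

# "delimiter" is a hypothetical field name, not a documented connector property.
payload = json.dumps({"delimiter": ui_value})

# The serialized body carries the escaped form with a doubled backslash.
print(payload)  # {"delimiter": "\\t"}
```

Parsing the payload with json.loads returns the original two-character value, so the escaped form in the API and the unescaped form in the UI describe the same setting.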
An Optimized Connector for SharePoint 2010, 2013, 2016, 2019 and SharePoint Online
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string, required
The Parser to use in the associated IndexPipeline.
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will fetch. This is useful for distributing load between different instances.
>= 1
<= 1
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1
Multiple of: 1
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, improve overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
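The numeric limits listed for the fetch settings can be checked before a configuration is submitted. The following is a minimal sketch, not part of the connector: the function name check_fetch_settings and the flat dictionary layout are assumptions made for illustration, while the bounds and defaults are copied from the entries above.

```python
# Bounds copied from the reference entries above: (minimum, maximum, default).
FETCH_SETTING_BOUNDS = {
    "pluginInstances": (1, 1, 1),
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
}


def check_fetch_settings(settings: dict) -> list[str]:
    """Return readable problems; an empty list means every value is in range."""
    problems = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_BOUNDS:
            continue  # Not a numeric setting covered by this sketch.
        low, high, _default = FETCH_SETTING_BOUNDS[name]
        # "Multiple of: 1" means whole numbers only; bool is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}..{high}")
    return problems


# Hypothetical settings; numFetchThreads exceeds the documented maximum of 500.
print(check_fetch_settings({"numFetchThreads": 600, "indexingThreads": 4}))
```

Because pluginInstances has a minimum and a maximum of 1, any other value is reported, which matches the guideline not to run the connector on more than one pod.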
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
properties - SharePoint properties
Plugin specific properties.
webApplication - Web application config
The SharePoint Web application to crawl.
webApplicationUrl - string
>= 1 characters
forceFullCrawl - boolean
Enable this to force a full crawl each time this datasource runs.
Default: false
siteCollections - array[string]
A list of site collections to crawl. Because only site collection administrators or site collection auditors can list the site collections in a SharePoint web application, you can use this when you are crawling as a user that is not an admin/auditor. This allows you to explicitly list site collections you want to crawl. Specify paths relative to the web application URL, such as /sites/site1.
Default:
includedFileExtensions - array[string]
Set of file extensions to be fetched. If specified, all non-matching files will be skipped.
Default:
excludedFileExtensions - array[string]
A set of all file extensions to be skipped from the fetch.
Default:
inclusiveRegexes - array[string]
Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
Default:
exclusiveRegexes - array[string]
Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.
Default:
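The combined effect of inclusiveRegexes and exclusiveRegexes can be sketched as a single filter. This reference does not state whether the connector requires a full or partial match, so the sketch below assumes a partial match (re.search) and assumes that an empty inclusive list includes everything. The function name uri_allowed and the example URIs are hypothetical.

```python
import re


def uri_allowed(uri: str, inclusive: list[str], exclusive: list[str]) -> bool:
    """Keep a URI that matches an include pattern (if any) and no exclude pattern."""
    # Assumption: an empty inclusive list means "include everything".
    if inclusive and not any(re.search(p, uri) for p in inclusive):
        return False
    # Any matching exclude pattern removes the URI.
    return not any(re.search(p, uri) for p in exclusive)


include = [r"/sites/hr/"]
exclude = [r"\.tmp$"]

print(uri_allowed("https://sp.example.com/sites/hr/policy.docx", include, exclude))  # True
print(uri_allowed("https://sp.example.com/sites/hr/draft.tmp", include, exclude))    # False
print(uri_allowed("https://sp.example.com/sites/it/guide.docx", include, exclude))   # False
```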
includeContentsExtensions - array[string]
Only files with these file extensions will have their contents downloaded when indexing this item. Files without these file extensions will not have their contents downloaded. The comparison is not case sensitive, and you do not have to specify the '.', but it still works if you do. For example, "zip" and ".zip" are both acceptable. Whitespace is also trimmed.
Default:
excludeContentsExtensions - array[string]
File extensions of files that will not have their contents downloaded when indexing this item. The list item metadata will still be indexed but the file contents will not. The comparison is not case sensitive, and you do not have to specify the '.', but it still works if you do. For example, "zip" and ".zip" are both acceptable. Whitespace is also trimmed.
Default:
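The matching rules described for includeContentsExtensions and excludeContentsExtensions (case-insensitive, optional leading '.', whitespace trimmed) can be illustrated as follows. This is a sketch of the documented behavior, not the connector's implementation; the function names normalize_extension and extension_listed are hypothetical.

```python
from pathlib import PurePosixPath


def normalize_extension(ext: str) -> str:
    """Trim whitespace, drop a leading '.', and lowercase."""
    return ext.strip().lstrip(".").lower()


def extension_listed(filename: str, configured: list[str]) -> bool:
    """Return True when the file's extension matches a configured entry."""
    wanted = {normalize_extension(e) for e in configured}
    return normalize_extension(PurePosixPath(filename).suffix) in wanted


print(extension_listed("Archive.ZIP", [" .zip "]))  # True
print(extension_listed("report.pdf", ["zip"]))      # False
```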
restrictToSpecificItems - array[string]
Instead of specifying regular expressions to restrict the SharePoint items that are crawled, this allows you to specify the URLs of the SharePoint items to crawl. The crawl is then restricted to only these SharePoint item URLs. You can specify list, sub-site, folder, and list item URLs.
Default:
apiQueryRowLimit - number
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 5000
Multiple of: 1
changeApiQueryRowLimit - number
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 2000
Multiple of: 1
siteCollectionDeletionThreshold - number
Site collections will be removed from the index after they have been unavailable for this many hours. Set to 0 for immediate deletion. The default is 2 weeks (336 hours).
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 336
Multiple of: 1
moderationStatusFilter - array[number]
If specified, only items with one of the listed moderation statuses are indexed. Valid values are: 0 = The list item is approved, 1 = The list item has been denied approval, 2 = The list item is pending approval, 3 = The list item is in the draft or checked out state, 4 = The list item is scheduled for automatic approval at a future date.
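Naming the numeric codes can make a moderationStatusFilter value easier to read and review. The sketch below is illustrative only: the enum and its member names are hypothetical, while the numeric values are those listed above.

```python
from enum import IntEnum


class ModerationStatus(IntEnum):
    """Hypothetical names for the documented moderation status codes."""
    APPROVED = 0
    DENIED = 1
    PENDING = 2
    DRAFT = 3
    SCHEDULED = 4


# Example choice: index only approved items and items scheduled for approval.
moderation_status_filter = [
    int(ModerationStatus.APPROVED),
    int(ModerationStatus.SCHEDULED),
]

print(moderation_status_filter)  # [0, 4]
```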
fetchTaxonomies - boolean
Fetch taxonomy data from SharePoint.
Default: false
siteCollectionTaxonomyCacheSize - number
To make the connector faster, the taxonomy terms for a site collection are cached when they are first needed, so they are not looked up from disk again. This setting is the size of that cache.
>= 1
<= 10000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10
Multiple of: 1
includedListBaseTypes - array[string]
If specified, the only SharePoint lists that will be fetched are the ones that match one of these base types. Accepts values (not case sensitive): [None, GenericList, DocumentLibrary, Unused, DiscussionBoard, Survey, Issue]
includedObjectTypes - array[string]
If specified, only fetch specific SharePoint objects. SharePoint object types that can be specified (not case sensitive): [Site, List, List_Item, Folder, Attachment]
proxyProperties - Proxy options
A set of options for configuring the proxy.
username - string
Proxy username
>= 1 characters
password - string
Proxy password
>= 1 characters
url - string
The proxy URL
>= 1 characters
ntlmProperties - NTLM Authentication settings
user - string
User
>= 1 characters
password - string
Password
>= 1 characters
domain - string
Domain
>= 1 characters
workstation - string
Workstation
>= 1 characters
sharepointOnlineAuthProperties - SharePoint Online Authentication
Settings relevant only when crawling SharePoint Online.
account - string
Your Microsoft SharePoint Online account name, which takes the form username@domain.com.
>= 1 characters
password - string
Password for your Microsoft SharePoint Online Account.
>= 1 characters
sessionExpirationMs - number
How long, in milliseconds, before new SharePoint Online authentication cookies are fetched.
>= 1
<= 172800000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 7200000
Multiple of: 1
userAgent - string
The user agent header decorates the HTTP traffic. This is important for preventing hard rate limiting by SharePoint Online.
Default: ISV|Lucidworks|Fusion/5.x
capUserAgent - string
When the "O365 Conditional Access Policy (CAP) setting" is enabled, O365 STS authentication requires a compliant User-Agent that matches one of the supported devices. For example, if iOS is a supported platform, set this to 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) CriOS/60.0.3112.89 Mobile/14G60 Safari/602.1'
<= 4000 characters
>= 1 characters
appAuthClientId - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. The Azure client ID of your application.
<= 100 characters
>= 1 characters
appAuthPkcs12KeystoreBase64String - string
Applicable to SharePoint Online App-Auth only. This is the base64 string of your PKCS12 keystore loaded with the PFX certificate file supplied by Azure AD. To get this value, first take the yourcert.pfx file you received from Azure AD and convert it to PKCS12 keystore format (example "keytool -importkeystore -srckeystore yourcert.pfx -srcstoretype pkcs12 -destkeystore yourcert.p12 -deststoretype pkcs12"). Next, convert yourcert.p12 to a base64 string.
<= 10000 characters
>= 1 characters
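The final step, converting yourcert.p12 to a base64 string, can be done with any base64 tool. The following is a minimal sketch using Python's standard library. It assumes the connector expects standard base64 without line breaks, which this reference does not state; the function names are hypothetical.

```python
import base64
from pathlib import Path


def keystore_to_base64(data: bytes) -> str:
    """Encode raw keystore bytes as a single-line base64 string."""
    return base64.b64encode(data).decode("ascii")


def keystore_file_to_base64(path: str) -> str:
    """Read a keystore file (for example yourcert.p12) and encode its bytes."""
    return keystore_to_base64(Path(path).read_bytes())


# Small in-memory demonstration; a real keystore is read from disk instead.
print(keystore_to_base64(b"abc"))  # YWJj
```

For a real keystore, call keystore_file_to_base64("yourcert.p12") and check that the result fits within the 10000 character limit listed for this property.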
appAuthPkcs12KeystorePassword - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. Password of the PKCS12 keystore.
<= 100 characters
>= 1 characters
appAuthClientSecret - string
Applicable to SharePoint Online OAuth App-Auth only. The Azure client secret of your application.
<= 100 characters
>= 1 characters
appAuthRefreshToken - string
Applicable to SharePoint Online OAuth App-Auth only. This is a refresh token which is reusable for up to 12 hours. You must obtain a new token using the OAuth login process if the token expires.
<= 1000 characters
>= 1 characters
appAuthTenant - string
Applicable to SharePoint Online App-Auth only. The Office365 tenant name to use when authenticating with Azure AD.
<= 2083 characters
>= 1 characters
appAuthAzureLoginEndpoint - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. The Azure login endpoint to use when authenticating.
<= 2083 characters
>= 1 characters
Default: https://login.windows.net
jsAuthConfigJson - string
JS Auth config JSON file that contains a list of WebCredential entries used for the WebDriver login process.
jsAuthLoginUrl - string
JS Auth login URL to use during the login process.
jsAuthSeleniumUrl - string
URL of the Selenium Grid service to use when performing WebDriver authentication to SharePoint Online.
maximumItemLimitConfig - Item Count Limit
maxItems - number
Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
sizeLimitProperties -