
Fusion 5.12

    SharePoint V2 Connector Configuration Reference

    The SharePoint V2 connector retrieves content and metadata from on-premises SharePoint Server and SharePoint Online repositories.

    Deprecation and removal notice

    This connector is deprecated as of June 19, 2023 and is removed or expected to be removed as of January 31, 2024. The SharePoint V2 connector is not compatible with Fusion 5.6 and later, regardless of the removal date. Use the SharePoint Optimized V2 connector instead.

    For more information about deprecations and removals, including possible alternatives, see Deprecations and Removals.

    This connector supports the following SharePoint server versions:

    • Microsoft SharePoint 2013

    • Microsoft SharePoint 2016

    • Microsoft SharePoint 2019

    • Microsoft SharePoint Online

    Configuration

    This section specifies the configuration properties for the SharePoint V2 connector.

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
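    As an illustration, the following Python sketch serializes values for an API request. Each value is typed once in the UI with a single backslash, and json.dumps doubles every backslash in the JSON text, which is the escaped form the API expects. The key example is a placeholder, and the payload is a fragment, not a complete datasource configuration.

```python
import json

# Values as you would type them in the UI: a single backslash each.
regex_value = r"/sites/\d+"  # a regex using the \d character class
tab_value = r"\t"            # two characters: a backslash and the letter t

# json.dumps escapes each backslash, producing the doubled form used in API requests.
payload = json.dumps({"exclusiveRegexes": [regex_value], "example": tab_value})
print(payload)  # {"exclusiveRegexes": ["/sites/\\d+"], "example": "\\t"}

# Parsing the JSON text again recovers the single-backslash values.
assert json.loads(payload)["exclusiveRegexes"] == [regex_value]
```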

    Web applications

    The configuration must define at least one web application, which represents the SharePoint web application to crawl.
    Property Description

    Web Application name

    Unique name of the web application within this configuration. Required field. Type: string. For example, webApp1.

    Web Application URL

    URL of the web application. Required field. For example, https://myWebApplication1.

    Site Collection List

    List of site collection paths. For example, if the site collection URL is https://webApplication/sites/MySiteCollection, the site collection path is /sites/MySiteCollection (the part of the URL that follows the web application URL). Multiple paths can be entered.
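    A minimal Python sketch of deriving a site collection path from a full site collection URL with the standard urllib.parse module. The helper name site_collection_path is hypothetical, and removing a trailing slash (and returning "/" for the root site collection) is an assumption about how paths are normalized, not documented connector behavior.

```python
from urllib.parse import urlsplit

def site_collection_path(site_collection_url: str) -> str:
    """Return the path portion of a site collection URL (everything after host and port)."""
    path = urlsplit(site_collection_url).path
    # Assumption: a trailing slash is not significant, and an empty path is the root "/".
    return path.rstrip("/") or "/"

print(site_collection_path("https://webApplication/sites/MySiteCollection"))  # /sites/MySiteCollection
```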

    SharePoint List or libraries in the site collection

    A set of list or library names within the site collection to crawl. For example, Documents.

    SharePoint webs

    List of web names to crawl within the parent site collection.

    SharePoint List or library name

    Name of a list or library under the SharePoint web context. For example, Documents.

    SharePoint Folders

    Folders within the list to crawl.

    Excluded Site Collections

    List of site collections to exclude from the crawl.

    Excluding site collections that do not need to be indexed can improve crawl performance.

    Included file extensions

    Attachments with a file extension from this list are included (and indexed) when filtering occurs. For example, .txt.

    Attachments are the only object type with file extensions.

    Excluded file extensions

    Attachments with a file extension from this list are excluded (and discarded) when filtering occurs. For example, .txt.
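    The following Python sketch shows one way include and exclude extension lists can combine when filtering attachments. The function keep_attachment is hypothetical, and three behaviors are assumptions rather than documented rules: matching is case-insensitive, an exclusion wins over an inclusion, and an empty include list keeps every extension.

```python
from pathlib import PurePosixPath

def keep_attachment(file_name, included=(), excluded=()):
    """Hypothetical filter: decide whether an attachment survives extension filtering."""
    ext = PurePosixPath(file_name).suffix.lower()  # "NOTES.TXT" -> ".txt"
    if ext in {e.lower() for e in excluded}:
        return False                                 # assumption: exclusions take precedence
    if included:
        return ext in {e.lower() for e in included}  # only listed extensions are kept
    return True                                      # no include list: keep everything

print(keep_attachment("notes.txt", included=[".txt"]))   # True
print(keep_attachment("report.pdf", included=[".txt"]))  # False
print(keep_attachment("notes.txt", excluded=[".txt"]))   # False
```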

    Inclusive regexes

    Regular expressions (regex) that select SharePoint objects to index, including sites, lists, items, and attachments. Each regex is matched against the URL of the SharePoint object.

    Exclusive regexes

    Regular expressions (regex) that select SharePoint objects to discard, including sites, lists, items, and attachments. Each regex is matched against the URL of the SharePoint object.
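    A Python sketch of applying inclusive and exclusive regexes to an object URL. The function keep_object and the sample URLs are hypothetical. The sketch assumes that a pattern must match the whole URL (re.fullmatch), that exclusive regexes take precedence, and that an empty inclusive list keeps everything; the connector's actual matching rules may differ.

```python
import re

def keep_object(url, inclusive=(), exclusive=()):
    """Hypothetical sketch: apply inclusive/exclusive regexes to a SharePoint object URL."""
    # Assumption: patterns must match the entire URL.
    if any(re.fullmatch(p, url) for p in exclusive):
        return False                                   # discarded by an exclusive regex
    if inclusive:
        return any(re.fullmatch(p, url) for p in inclusive)
    return True                                        # no inclusive regexes: keep

base = "https://myWebApplication1/sites/MySiteCollection"
print(keep_object(base + "/Shared Documents/a.docx", inclusive=[r".*/Shared Documents/.*"]))  # True
print(keep_object(base + "/Lists/Tasks", exclusive=[r".*/Lists/.*"]))                        # False
```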

    Authentication

    Select only one authentication method for the configuration.

    Windows NT LAN Manager (NTLM) authentication

    Property Description

    User

    User name of the authenticating account

    Password

    Password of the authenticating account

    Domain

    Domain in which the client workstation has membership

    Workstation

    Client workstation name

    Forms-based authentication (FBA)

    Property Description

    Username

    User name created in the membership database

    Password

    Password of the user created in the membership database

    SharePoint online authentication

    Property Description

    SharePoint online account

    Valid SharePoint Online account name, in the form username@domain.com

    Password

    SharePoint online account password

    Microsoft login URL

    URL of the Microsoft login server. Default value is https://login.microsoftonline.com.

    App-only authentication (OAuth)

    Property Description

    Azure AD (Active Directory) client ID

    Azure client ID of the application

    Azure AD tenant

    Office365 tenant name. For example, exampleapp.onmicrosoft.com.

    Azure AD client secret

    Azure client secret associated with the client ID

    Azure AD login endpoint

    Login URL for authentication. Default value is https://login.windows.net.

    App-only authentication (OAuth) with private key

    Property Description

    Azure AD (Active Directory) client ID

    Azure client ID of the application

    Azure AD tenant

    Office365 tenant name. For example, exampleapp.onmicrosoft.com.

    Azure AD login endpoint

    Login URL for authentication. Default value is https://login.windows.net.

    Azure AD PKCS12 key

    The base64 string of the PKCS12 keystore loaded with the PFX (personal exchange format) certificate file supplied by Azure AD

    Azure AD PKCS12 keystore password

    Password of the Azure AD PKCS12 keystore

    Requirements to index all site collections

    The following conditions must be met to index all site collections:

    • The authentication method must be one of the following:

      • Windows NT LAN Manager (NTLM)

      • SharePoint online

      • App-only (OAuth)

    • The credentials must have permission to list all site collections:

      • NTLM. The configured credentials must belong to an administrative account.

      • SharePoint online. The configured credentials must belong to a SharePoint admin account, not a site collection admin account.

      • App-only (OAuth). The application registered in the SharePoint instance must have a tenant scope.
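    The configuration-side part of these requirements can be checked before a crawl. The following Python sketch assumes the authentication blocks appear side by side in one dictionary, using the key names from the authenticationProperties schema later in this reference; the function check_auth is hypothetical. It verifies only that exactly one method is configured and whether that method is one of those listed above; account privileges and tenant scope can only be verified against the server, for example with the PowerShell test in Test NTLM permissions to successfully crawl a site collection.

```python
# Key names of the five authentication blocks in the configuration schema.
AUTH_KEYS = ("ntlmProperties", "fbaProperties", "spoidcrlProperties",
             "appOnlyOauthProperties", "appOnlyPrivateKeyProperties")

# Methods listed above as able to index all site collections.
CAN_LIST_ALL = {"ntlmProperties", "spoidcrlProperties", "appOnlyOauthProperties"}

def check_auth(auth_properties: dict) -> str:
    """Return the single configured auth method; raise if none or several are set."""
    configured = [k for k in AUTH_KEYS if auth_properties.get(k)]
    if len(configured) != 1:
        raise ValueError(f"expected exactly one authentication method, found {configured}")
    return configured[0]

method = check_auth({"ntlmProperties": {"user": "svc-crawl", "domain": "CORP"}})
print(method, method in CAN_LIST_ALL)  # ntlmProperties True
```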

    Crawl searchable content

    For detailed information about enabling and crawling searchable content, see Enable content on a site to be searchable.

    Limit documents

    These properties limit the documents and how they are processed.

    Property Description

    Fetch lists

    If enabled:

    • Fetches and indexes the lists included in the site collection.

    • Discards lists, and their associated items, that are not included in the site collection.

    If disabled, ignores all lists and their list items.

    Fetch list items

    If enabled, retrieves and indexes list items.

    Fetch attachments

    If enabled, retrieves and indexes item attachments.

    Index sites

    If enabled, indexes sites.

    This option does not affect retrieval of lists or subsites.

    Index lists

    If enabled, indexes lists.

    This option does not affect retrieval of list items.

    Index empty lists

    • If enabled, indexes lists with no items (empty lists).

    • If disabled, discards empty lists.

    Index folders

    • If enabled, indexes folder items.

    • If disabled, discards folder items.

    Index taxonomy terms

    (Experimental)

    If enabled, indexes taxonomy terms from the default term store and places those terms in the content collection.

    Index Document Metadata

    Indexes metadata for files and attachments that fall outside the maximum or minimum size limits.

    Does not index the content of the documents.

    Included List Base Types

    If the Fetch lists property is enabled and a base type is:

    • Specified, fetches only SharePoint lists with that base type.

    • Not specified, fetches all SharePoint lists.

    Base list types are Document Library, Generic List, Issue, and Survey.
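    The interplay between Fetch lists and Included List Base Types can be sketched in Python as follows. The function lists_to_fetch, the dictionary shape of each list, and the base type strings (DocumentLibrary, GenericList, Survey) are hypothetical; the exact values the connector accepts are not documented here.

```python
def lists_to_fetch(site_lists, fetch_lists=True, include_base_types=()):
    """Hypothetical sketch of the Fetch lists / Included List Base Types logic."""
    if not fetch_lists:
        return []                          # lists and their items are skipped entirely
    if not include_base_types:
        return list(site_lists)            # no base type specified: fetch every list
    wanted = set(include_base_types)
    return [lst for lst in site_lists if lst["baseType"] in wanted]

site_lists = [
    {"title": "Documents", "baseType": "DocumentLibrary"},
    {"title": "Tasks", "baseType": "GenericList"},
    {"title": "Feedback", "baseType": "Survey"},
]
print([lst["title"] for lst in lists_to_fetch(site_lists, include_base_types=["DocumentLibrary"])])
# ['Documents']
```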

    Request settings

    Property Description

    API query row limit

    Number of items to retrieve per page. Default value is 500. The connector paginates requests to retrieve list items.

    Changes API query row limit

    Number of events to retrieve per page. Default value is 200. The connector paginates requests to retrieve changes per site collection.
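    The row limits determine how many paginated requests a crawl issues. This Python sketch does the arithmetic for the default values. The helper names are hypothetical, and the numeric offsets are only illustrative: SharePoint paging typically continues from a position token rather than a numeric offset.

```python
import math

def pages_needed(total_rows: int, row_limit: int) -> int:
    """Number of paginated requests needed to read total_rows at row_limit rows per page."""
    return math.ceil(total_rows / row_limit)

def page_ranges(total_rows: int, row_limit: int):
    """Yield (first, last_exclusive) row offsets for each page."""
    for start in range(0, total_rows, row_limit):
        yield start, min(start + row_limit, total_rows)

# A list with 1,250 items at the default API query row limit of 500:
print(pages_needed(1250, 500))       # 3
print(list(page_ranges(1250, 500)))  # [(0, 500), (500, 1000), (1000, 1250)]
# 1,250 change events at the default changes API query row limit of 200:
print(pages_needed(1250, 200))       # 7
```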

    User agent

    Value of the HTTP User-Agent header sent with each request. Default value is ISV|Lucidworks|Fusion/1.0.

    Security trimming configuration

    Property Description

    Enable security trimming

    If enabled, the connector indexes SharePoint groups and the role assignments of each object type. Object types are sites, lists, items, and attachments.

    ACL collection name

    Access Control List (ACL) collection name. Role assignments and SharePoint groups are indexed in this collection.

    Security filtering

    Security filtering in the SharePoint connector requires the ACL (LDAP) connector to function correctly.

    In the content collection, the SharePoint connector indexes documents. The acl_ss field of each document contains the IDs of the role assignments that define access to that object.

    For the access control collection, the SharePoint connector indexes:

    • SharePoint groups that contain Active Directory (AD) users and groups

    • Role assignments

    The LDAP ACL connector indexes the AD users and AD groups into the same access control collection.
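    To make the relationship between these pieces concrete, here is a simplified Python model of security trimming: a user can see a document when any role assignment ID in its acl_ss field grants access to the user or to one of the user's groups. The function can_see, the principal naming scheme, and the sample data are hypothetical; Fusion's actual query-time filtering is not shown here and may resolve memberships differently.

```python
def can_see(doc_acl, user_principals, role_assignments):
    """
    Hypothetical model of security trimming.
    doc_acl:          role assignment IDs from the document's acl_ss field
    user_principals:  the user plus every AD group and SharePoint group the user belongs to
    role_assignments: role assignment ID -> the principal the assignment grants access to
    """
    return any(role_assignments.get(ra_id) in user_principals for ra_id in doc_acl)

# Hypothetical data that the access control collection would hold.
role_assignments = {"ra-1": "sp-group:Site Members", "ra-2": "ad-user:CORP\\alice"}
alice = {"ad-user:CORP\\alice", "ad-group:CORP\\Engineering"}
bob = {"ad-user:CORP\\bob", "sp-group:Site Members"}

print(can_see(["ra-2"], alice, role_assignments))  # True
print(can_see(["ra-1"], alice, role_assignments))  # False
print(can_see(["ra-1"], bob, role_assignments))    # True
```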

    Common properties

    Proxy options

    Property Description

    Proxy URL

    URL of the proxy server

    Proxy username

    User name to log in to the proxy server

    Proxy password

    Password of the proxy username

    Item count limit

    Property Description

    Maximum output limit

    Maximum number of indexed documents. Default value is -1, which specifies no maximum limit.

    Item size limit

    Property Description

    Maximum

    Maximum byte size of an attachment

    Minimum

    Minimum byte size of an attachment
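    A Python sketch of applying the size limits, using the schema defaults listed later in this reference (minSizeBytes 1, maxSizeBytes -1). The function within_size_limits is hypothetical; treating a negative maximum as "no upper limit" and treating both bounds as inclusive are assumptions. When Index Document Metadata is enabled, files that fail this check still have their metadata indexed.

```python
def within_size_limits(size_bytes: int, min_size: int = 1, max_size: int = -1) -> bool:
    """Sketch: apply the minimum and maximum attachment size limits, in bytes."""
    if size_bytes < min_size:
        return False                     # smaller than the minimum (empty files by default)
    # Assumption: a negative maximum (the schema default is -1) means no upper limit.
    return max_size < 0 or size_bytes <= max_size

print(within_size_limits(0))                  # False
print(within_size_limits(10_000_000))         # True
print(within_size_limits(2048, max_size=1024))  # False
```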

    Item retry options

    Property Description

    Max retry attempts

    Maximum number of retry attempts for a failed item.

    Retry delay

    Number of seconds (delay) between retries if an item fails.

    Other retry options are deprecated.
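    The item retry options describe a fixed-delay retry. This Python sketch uses the schema defaults (maxRetries 3, retryDelayInSeconds 30). The function process_with_retry is hypothetical, and it assumes that Max retry attempts counts retries after the first attempt; the connector may count differently.

```python
import time

def process_with_retry(process_item, item, max_retries=3, retry_delay_s=30, sleep=time.sleep):
    """Sketch: one initial attempt plus up to max_retries retries, with a fixed delay."""
    for attempt in range(max_retries + 1):
        try:
            return process_item(item)
        except Exception:
            if attempt == max_retries:
                raise                 # retries exhausted: report the failure
            sleep(retry_delay_s)      # wait the retry delay before the next attempt
```

    Under these assumptions, an item that keeps failing is attempted 4 times with the defaults and spends 3 × 30 = 90 seconds waiting between attempts.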

    HTTP timeout options

    Property Description

    Read timeout

    Number of milliseconds to wait for data before a read times out. Value is passed to the HTTP client. Default value is 300 000 ms.

    Connection timeout

    Number of milliseconds before a connection attempt times out. Value is passed to the HTTP client. Default value is 60 000 ms.

    HTTP connection options

    Property Description

    Maximum connections

    Maximum number of connections available in the pool. Default value is 1000.

    Maximum per route

    Maximum number of connections per route, that is, to the same target URL. Default value is 200.

    Ignore SSL (Secure Sockets Layer) validation exceptions

    If enabled, the HTTP client does not fail when the server certificate cannot be validated. Default value is false.

    Test NTLM permissions to successfully crawl a site collection

    This test applies only to SharePoint on-premises deployments.

    To verify the NTLM account has appropriate permissions to crawl a site collection using the SharePoint V2 connector:

    1. Copy the check-ntlm-account-can-crawl-sharepoint-site-collection.ps1 PowerShell script below to a folder on your computer.

    $site_col_url="https://your.sharepoint-site.com/sites/mysitecol"
    
    $cred = (Get-Credential)
    
    if (-not ([System.Management.Automation.PSTypeName]'ServerCertificateValidationCallback').Type)
    {
    $certCallback = @"
        using System;
        using System.Net;
        using System.Net.Security;
        using System.Security.Cryptography.X509Certificates;
        public class ServerCertificateValidationCallback
        {
            public static void Ignore()
            {
                if(ServicePointManager.ServerCertificateValidationCallback ==null)
                {
                    ServicePointManager.ServerCertificateValidationCallback +=
                        delegate
                        (
                            Object obj,
                            X509Certificate certificate,
                            X509Chain chain,
                            SslPolicyErrors errors
                        )
                        {
                            return true;
                        };
                }
            }
        }
    "@
        Add-Type $certCallback
     }
    
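    # Use TLS 1.2 and accept any server certificate (acceptable for this one-off permissions test only)
    [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12;
    [ServerCertificateValidationCallback]::Ignore()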
    
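    # Request 1: get a form digest from the SOAP service sites.asmx, using the credentials entered at the Get-Credential prompt
    $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"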
    $headers.Add("Content-Type", "text/xml")
    $headers.Add("SOAPAction", "http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigestInformation")
    $headers.Add("X-RequestForceAuthentication", "true")
    $headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
    
    $body = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`n<soap:Envelope xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`" xmlns:xsd=`"http://www.w3.org/2001/XMLSchema`" xmlns:soap=`"http://schemas.xmlsoap.org/soap/envelope/`">`n  <soap:Body>`n    <GetUpdatedFormDigestInformation xmlns=`"http://schemas.microsoft.com/sharepoint/soap/`" />`n  </soap:Body>`n</soap:Envelope>"
    
    $response = Invoke-RestMethod "${site_col_url}/_vti_bin/sites.asmx" -Method 'POST' -Headers $headers -Body $body -Credential $cred
    
    $digest_value = $response.Envelope.Body.GetUpdatedFormDigestInformationResponse.FirstChild.DigestValue
    
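    # Request 2: send the digest to the client object model endpoint (client.svc/ProcessQuery) and read the web's metadata and role assignments
    $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"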
    $headers.Add("Content-Type", "text/xml")
    $headers.Add("X-RequestForceAuthentication", "true")
    $headers.Add("X-RequestDigest", $digest_value)
    $headers.Add("Accept", "application/json")
    $headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f")
    
    $body = @'
    <Request AddExpandoFieldTypeSuffix="true" SchemaVersion="14.0.0.0" LibraryVersion="16.0.0.0"
             ApplicationName=".NET Library" xmlns="http://schemas.microsoft.com/sharepoint/clientquery/2009">
        <Actions>
            <ObjectPath Id="2" ObjectPathId="1"/>
            <ObjectPath Id="4" ObjectPathId="3"/>
            <Query Id="5" ObjectPathId="3">
                <Query SelectAllProperties="false">
                    <Properties>
                        <Property Name="Webs" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="Title" ScalarProperty="true"/>
                        <Property Name="ServerRelativeUrl" ScalarProperty="true"/>
                        <Property Name="RoleDefinitions" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="RoleAssignments" SelectAll="true">
                            <Query SelectAllProperties="false">
                                <Properties/>
                            </Query>
                        </Property>
                        <Property Name="HasUniqueRoleAssignments" ScalarProperty="true"/>
                        <Property Name="Description" ScalarProperty="true"/>
                        <Property Name="Id" ScalarProperty="true"/>
                        <Property Name="LastItemModifiedDate" ScalarProperty="true"/>
                    </Properties>
                </Query>
            </Query>
        </Actions>
        <ObjectPaths>
            <StaticProperty Id="1" TypeId="{3747adcd-a3c3-41b9-bfab-4a64dd2f1e0a}" Name="Current"/>
            <Property Id="3" ParentId="1" Name="Web"/>
        </ObjectPaths>
    </Request>
    '@
    
    $response = Invoke-RestMethod "${site_col_url}/_vti_bin/client.svc/ProcessQuery" -Method 'POST' -Headers $headers -Body $body -Credential $cred
    $response | ConvertTo-Json -Depth 100

    2. In the first line of the script, $site_col_url="https://your.sharepoint-site.com/sites/mysitecol", replace the URL with the URL of your site collection.

    3. Run the script. If the result is:

      • A JSON output of your site’s metadata, the account permissions are set correctly.

      • An error, such as 401, 403, or another error, the account permissions are not set correctly. Correct the permissions and run the script again to verify that it completes successfully.

    Connector for SharePoint

    description - string

    Optional description

    <= 125 characters

    pipeline - string (required)

    Name of the IndexPipeline used for processing output.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$

    Default: lucidworks-sharepoint

    diagnosticLogging - boolean

    Enable diagnostic logging; disabled by default

    Default: false

    parserId - string (required)

    The Parser to use in the associated IndexPipeline.

    Default: lucidworks-sharepoint

    coreProperties - Core Properties

    Common behavior and performance settings.

    fetchSettings - Fetch Settings

    System level settings for controlling fetch behavior and performance.

    numFetchThreads - number

    Maximum number of fetch threads; defaults to 5. This setting controls the number of threads that call the connector's fetch method. Higher values can sometimes, but not always, improve overall fetch performance.

    >= 1

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 5

    Multiple of: 1

    indexingThreads - number

    Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used to process content documents emitted by this datasource. Higher values can sometimes improve overall fetch performance.

    >= 1

    <= 10

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4

    Multiple of: 1

    pluginInstances - number

    Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances perform fetching. This is useful for distributing load between different instances.

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 0

    Multiple of: 1

    fetchResponseScheduledTimeout - number

    The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

    >= 1000

    <= 500000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    indexingInactivityTimeout - number

    The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 86400

    Multiple of: 1

    pluginInactivityTimeout - number

    The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600

    Multiple of: 1

    indexMetadata - boolean

    When enabled, the metadata of skipped items is indexed to the content collection.

    Default: false

    indexContentFields - boolean

    When enabled, content fields will be indexed to the crawl-db collection.

    Default: false

    asyncParsing - boolean

    When enabled, content will be indexed asynchronously.

    Default: false

    id - string (required)

    A unique identifier for this Configuration.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
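    The top-level fields above can be assembled into a request body. This Python sketch builds a minimal datasource dictionary and validates id and pipeline against the documented match pattern. The function make_datasource and the id sp-intranet are hypothetical, and the sketch includes only fields documented in this reference; a real request may need additional fields.

```python
import json
import re

# Pattern documented for the id and pipeline fields.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")

def make_datasource(ds_id: str, web_app_url: str, pipeline: str = "lucidworks-sharepoint") -> dict:
    """Build a minimal, hypothetical datasource body from fields documented in this reference."""
    for field, value in (("id", ds_id), ("pipeline", pipeline)):
        # fullmatch also rejects a trailing newline, which re.match with $ would accept.
        if not NAME_PATTERN.fullmatch(value):
            raise ValueError(f"{field} {value!r} does not match {NAME_PATTERN.pattern}")
    return {
        "id": ds_id,
        "pipeline": pipeline,
        "parserId": "lucidworks-sharepoint",  # documented default parser
        "properties": {
            "webApplicationSettings": {"webApplicationUrl": web_app_url},
        },
    }

print(json.dumps(make_datasource("sp-intranet", "https://myWebApplication1"), indent=2))
```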

    properties - SharePoint properties

    Plugin specific properties.

    webApplicationSettings - Web application settings

    Web application settings to control which site collections, webs, and lists are included in or excluded from the fetching process.

    webApplicationUrl - string

    The URL must have the format https://<hostname>[:port].

    >= 1 characters

    includeSiteCollections - array[object]

    A list of site collections, with specific inner containers, to crawl. Because only site collection administrators or site collection auditors can list the site collections in a SharePoint web application, use this property when crawling as a user that is not an administrator or auditor; it lets you explicitly list the site collections to crawl. The paths specified in this property must be relative to the web application URL.

    object attributes:

      siteCollectionPath - string. Display name: Site collection path

      includeWebs - array. Display name: Include SharePoint Webs

      excludeWebs - array. Display name: Exclude SharePoint Webs

      includeLists - array. Display name: Include Sharepoint List or Libraries

      excludeLists - array. Display name: Exclude Sharepoint List or Libraries

    excludeSiteCollections - array[string]

    A list of site collection paths to exclude from the fetching process. The paths specified in this property must be relative to the web application URL.

    Default:

    inclusiveSiteCollectionRegexes - array[string]

    A list of regexes used to include matching site collection paths, for example, "/sites/a.*" or "/sites/[a-mA-M].*". All site collections are listed from the web application to carry out the filtering.

    Default:

    exclusiveSiteCollectionRegexes - array[string]

    A list of regexes used to exclude matching site collection paths, for example, "/sites/h.*" or "/sites/[h-zH-Z].*", exact matches such as "/sites/name", or "/" to exclude the default site collection. All site collections are listed from the web application to carry out the filtering.

    Default:
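    Two example values for webApplicationSettings, written as Python dictionaries and serialized to JSON. The first suits a crawl account that cannot list all site collections, so it names them explicitly; the second suits an account that can list them and filters with regexes. The site collection path and the web name Archive are hypothetical, and whether excludeWebs expects names or paths is an assumption.

```python
import json

# Account that cannot list site collections: name them explicitly
# (paths are relative to the web application URL).
explicit_settings = {
    "webApplicationUrl": "https://myWebApplication1",
    "includeSiteCollections": [
        {
            "siteCollectionPath": "/sites/MySiteCollection",
            "includeLists": ["Documents"],
            "excludeWebs": ["Archive"],  # hypothetical web name
        }
    ],
}

# Account that can list all site collections: filter them with regexes instead.
regex_settings = {
    "webApplicationUrl": "https://myWebApplication1",
    "inclusiveSiteCollectionRegexes": ["/sites/[a-mA-M].*"],
    "exclusiveSiteCollectionRegexes": ["/"],  # exclude the default site collection
}

print(json.dumps({"webApplicationSettings": explicit_settings}, indent=2))
```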

    webApplications - array[object]

    Web applications to crawl. This property accepts a single web application only. This property is deprecated; use Web application settings instead.

    object attributes:

      webApplicationName - string (required). Display name: Web application Name

      webApplicationUrl - string (required). Display name: Web application URL

      siteCollectionsWithChildren - array. Display name: Included Site Collections

      excludeSiteCollections - array. Display name: Excluded Site Collections

      includedFileExtensions - array. Display name: Included file extensions (Deprecated, use Limit documents > Limit by extension)

      excludedFileExtensions - array. Display name: Excluded file extensions (Deprecated, use Limit documents > Limit by extension)

      inclusiveRegexes - array. Display name: Inclusive regexes (Deprecated, use Limit documents > Limit by regular expression)

      exclusiveRegexes - array. Display name: Exclusive regexes (Deprecated, use Limit documents > Limit by regular expression)

      regexCacheSize - number

      siteCollections - array. Display name: Deprecated: Site collection list

    authenticationProperties - Authentication settings

    Select only one option

    ntlmProperties - NTLM Authentication settings

    Settings for on-premises NTLM authentication

    user - string

    User

    >= 1 characters

    password - string

    Password

    >= 1 characters

    domain - string

    Domain

    >= 1 characters

    workstation - string

    Workstation

    >= 1 characters

    fbaProperties - Form based authentication settings

    Settings for on-premises FBA authentication

    user - string

    User

    >= 1 characters

    password - string

    Password

    >= 1 characters

    spoidcrlProperties - Credential based Authentication (SharePoint online)

    Settings for SharePoint Online authentication based on account/password credentials - used for both Native and ADFS (Active Directory Federation Services) authorization.

    account - string

    Your Microsoft SharePoint Online Account name which takes the form of username@domain.com

    >= 1 characters

    password - string

    Password for your Microsoft SharePoint Online Account.

    >= 1 characters

    microsoftOnlineLoginServerUrl - string

    URL of the Microsoft login server.

    Default: https://login.microsoftonline.com

    appOnlyOauthProperties - App-Only Authentication (OAuth protocol)

    Settings for App-Only/OAuth authorization.

    appAuthClientId - string

    The Azure client ID of your application.

    <= 100 characters

    >= 1 characters

    tenant - string

    The Office365 tenant name to use when authenticating with Azure AD. E.g. exampleapp.onmicrosoft.com

    <= 2083 characters

    >= 1 characters

    clientSecret - string

    The Azure client secret associated with your client ID.

    <= 2083 characters

    >= 1 characters

    azureLoginEndpoint - string

    The Azure login endpoint to use when authenticating.

    <= 2083 characters

    >= 1 characters

    Default: https://login.windows.net

    appOnlyPrivateKeyProperties - App-Only Authentication with private key

    Settings for App-Only authorization with private key credentials.

    clientId - string

    The Azure client ID of your application.

    <= 100 characters

    >= 1 characters

    tenant - string

    The Office365 tenant name to use when authenticating with Azure AD. E.g. exampleapp.onmicrosoft.com

    <= 2083 characters

    >= 1 characters

    azureLoginEndpoint - string

    The Azure login endpoint to use when authenticating.

    <= 2083 characters

    >= 1 characters

    Default: https://login.windows.net

    pkcs12KeystoreBase64String - string

    This is the base64 string of your PKCS12 keystore loaded with the PFX certificate file supplied by Azure AD. To get this value, first convert the yourcert.pfx file you received from Azure AD to PKCS12 keystore format (for example, "keytool -importkeystore -srckeystore yourcert.pfx -srcstoretype pkcs12 -destkeystore yourcert.p12 -deststoretype pkcs12"). Next, convert yourcert.p12 to a base64 string.
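    For the second step, a Python sketch that reads the converted keystore and returns its base64 encoding, checking the documented 10000-character limit of this field. The function name keystore_to_base64 is hypothetical; after running the keytool command, call it as keystore_to_base64("yourcert.p12") and paste the result into pkcs12KeystoreBase64String.

```python
import base64
from pathlib import Path

MAX_KEYSTORE_CHARS = 10000  # documented maximum length of pkcs12KeystoreBase64String

def keystore_to_base64(p12_path: str) -> str:
    """Return the contents of a PKCS12 keystore file as a base64 string."""
    encoded = base64.b64encode(Path(p12_path).read_bytes()).decode("ascii")
    if len(encoded) > MAX_KEYSTORE_CHARS:
        raise ValueError(f"base64 keystore is {len(encoded)} characters; the limit is {MAX_KEYSTORE_CHARS}")
    return encoded
```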

    <= 10000 characters

    >= 1 characters

    pkcs12KeystorePassword - string

    Password of the PKCS12 keystore.

    <= 100 characters

    >= 1 characters

    sessionExpirationMs - number

    How long (in milliseconds) before new authentication cookies should be fetched.

    >= 1

    <= 86400000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 180000

    Multiple of: 1

    noIndexEvaluation - boolean

    Crawl searchable content. See https://docs.microsoft.com/en-us/sharepoint/make-site-content-searchable for more details.

    Default: false

    proxyProperties - Proxy options

    A set of options for configuring the proxy.

    url - string

    The proxy URL

    >= 1 characters

    username - string

    Proxy username

    >= 1 characters

    password - string

    Proxy password

    >= 1 characters

    maximumItemLimitConfig - Item Count Limit

    maxItems - number

    Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    sizeLimitProperties - Item Size Limits

    Options for including or excluding items based on size, in bytes.

    maxSizeBytes - number

    Used for excluding items when the item size is larger than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    minSizeBytes - number

    Used for excluding items when the item size is smaller than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1

    Multiple of: 1

    retryProperties - Retry Options (This property is deprecated and should not be used.)

    A set of options for configuring retry behavior.

    maxDelayTimeMs - number

    The maximum wait time between successive retries.

    >= 1

    <= 600000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 30000

    Multiple of: 1

    maxTimeLimitMs - number

    This setting is used to limit the maximum amount of time spent on retries. Note: this will be ignored if "Maximum Retries" is specified.

    >= 1

    <= 28800000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 60000

    Multiple of: 1

    errorExclusions - array[string]

    maxRetries - number

    The retryer will retry failed operations in the case that they might succeed if attempted again. This parameter states the number of attempts to retry until giving up. This parameter, if specified, will override the "Stop retrying after time (milliseconds)" parameter.

    <= 100

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 7

    Multiple of: 1

    delayFactor - number

    The retryer will retry failed operations in the case that they might succeed if attempted again. The retryer will sleep an exponential amount of time after the first failed attempt and retry in exponentially incrementing amounts after each failed attempt up to the maximumTime. nextWaitTime = exponentialIncrement * multiplier.

    >= 1

    <= 9999

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 2

    Multiple of: 1

    delayMs - number

    Sets the delay between retries, exponentially backing off to the maxDelayTimeMs and multiplying successive delays by the delayFactor

    >= 1

    <= 9223372036854775807

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 100

    Multiple of: 1
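    The deprecated retryer's delay rule is described only loosely above. This Python sketch is one interpretation, using the schema defaults: each wait starts at delayMs, is multiplied by delayFactor after every failed attempt, and is capped at maxDelayTimeMs. The function retry_delays is hypothetical and is not the connector's implementation.

```python
def retry_delays(max_retries=7, delay_ms=100, delay_factor=2, max_delay_ms=30000):
    """Sketch: successive wait times under exponential backoff capped at max_delay_ms."""
    delays = []
    delay = delay_ms
    for _ in range(max_retries):
        delays.append(min(delay, max_delay_ms))  # never wait longer than the cap
        delay *= delay_factor                    # grow the next wait exponentially
    return delays

print(retry_delays())  # [100, 200, 400, 800, 1600, 3200, 6400]
```

    With the defaults, the seven waits total 12 700 ms, well within the default maxTimeLimitMs of 60 000 ms.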

    timeoutsProperties - Http timeout options

    A set of options for configuring the http client.

    readTimeoutMs - number

    <= 600000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    connectTimeoutMs - number

    <= 300000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 60000

    Multiple of: 1

    connectionsProperties - Http connection options

    maxConnections - number

    The maximum number of connections

    >= 1

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 20

    Multiple of: 1

    ignoreSSLValidationExceptions - boolean

    Skip SSL certificate validation and hostname verification. Use this when accessing an HTTPS URL with a self-signed certificate or an enterprise certificate authority that you do not want to add to the Java keystore.

    Default: false

    maxIdleConnectionsTime - number

    Maximum time, in milliseconds, that a connection can stay idle in the connection pool while kept alive. Connections idle for longer than this value are closed and evicted from the pool.

    >= 1

    <= 9223372036854775807

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 5000

    Multiple of: 1

    itemRetryProperties - Item retry settings

    Options to configure the retry operation for items.

    maxRetries - number

    The maximum number of attempts for a failed item

    <= 20

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 3

    Multiple of: 1

    retryDelayInSeconds - number

    The amount of time, in seconds, before a failed item is processed again

    >= 1

    <= 600

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 30

    Multiple of: 1

    limitDocuments - Limit Documents

    A set of options for configuring indexing of Documents.

    fetchLists - boolean

    Check this to fetch SharePoint lists during the crawl. Set this to false to ignore all lists and their list items.

    Default: true

    fetchListItems - boolean

    Check this to fetch SharePoint list items during the crawl. Set this to false to ignore all list items and their children.

    Default: true

    fetchAttachments - boolean

    Set this to true if you want to fetch list item attachments, false otherwise.

    Default: true

    indexSites - boolean

    Set this to true if you want to index each SharePoint site as a document in the content collection. If set to false, the contents of sites will still be fetched, but there will not be a document in the content collection for each Site and its metadata.

    Default: true

    indexLists - boolean

    Set this to true if you want to index each SharePoint list as a document in the content collection. If set to false, the list content is still indexed, but there is no document in the content collection for the list and its metadata.

    Default: true

    indexEmptyLists - boolean

    Set this to true if you want to index each SharePoint list as a document in the content collection even when the list is empty (its List Item Count is 0).

    Default: true

    indexFolders - boolean

    Set this to true if you want to index each folder within SharePoint Lists as a document in the content collection. If set to false, the contents of folders will still be indexed, but there will not be a document in the content collection for this folder and its metadata.

    Default: true

    expandTaxonomyTerms - boolean

    When enabled, the connector indexes the taxonomy term path of each list item that has a taxonomy field. This feature does not support incremental crawls: if term labels change, you must re-index all documents to update the labels.

    Default: false

    indexListItems - boolean

    Set this to true if you want to index each SharePoint list item container. These list items contain only metadata and are not represented by physical files. If set to false, list item containers are skipped. In either case, their attachments are fetched as long as the fetchAttachments property is enabled.

    Default: true

    indexListItemDocuments - boolean

    Set this to true if you want to index each SharePoint list item document. These list items are represented by files, such as PDF files or Word documents, which are parsed and indexed along with their metadata. If set to false, list item documents are skipped.

    Default: true
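
    The descriptions of fetchListItems, fetchAttachments, indexListItems, and indexListItemDocuments can be summarized as a small decision function. This is a hypothetical Python model built only from those descriptions; `ListItemFlags` and `plan_list_item` are illustrative names, and the assumption that attachments are fetched independently of whether the item itself is indexed follows the indexListItems description.

```python
from dataclasses import dataclass


@dataclass
class ListItemFlags:
    fetch_list_items: bool = True           # fetchListItems
    fetch_attachments: bool = True          # fetchAttachments
    index_list_items: bool = True           # indexListItems (metadata-only containers)
    index_list_item_documents: bool = True  # indexListItemDocuments (file-backed items)


def plan_list_item(is_document: bool, has_attachments: bool, flags: ListItemFlags) -> dict:
    """Decide whether to index a list item and whether to fetch its attachments (sketch)."""
    if not flags.fetch_list_items:
        # fetchListItems=false ignores all list items and their children.
        return {"index_item": False, "fetch_attachments": False}
    index_item = flags.index_list_item_documents if is_document else flags.index_list_items
    # Attachments depend only on fetchAttachments, even when the item itself is skipped.
    return {"index_item": index_item,
            "fetch_attachments": has_attachments and flags.fetch_attachments}
```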

    includeListBaseTypes - array[string]

    If specified, only SharePoint lists whose base type matches one of these values are fetched; if none are specified, lists of all base types are fetched. This property takes effect only when the Fetch Lists property (fetchLists) is set to true.
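
    A Python sketch of the base-type filter, for illustration only. The list records and the `lists_to_fetch` function are hypothetical, and the base type names used in the test (DocumentLibrary, GenericList) are taken from SharePoint's list base types; which exact strings the connector accepts is an assumption.

```python
from typing import Iterable, List, Optional


def lists_to_fetch(lists: Iterable[dict], fetch_lists: bool = True,
                   include_list_base_types: Optional[List[str]] = None) -> List[str]:
    """Return the titles of the lists that would be fetched (hypothetical model)."""
    if not fetch_lists:
        return []  # fetchLists=false skips every list, so the base-type filter never applies
    allowed = set(include_list_base_types or [])
    # An empty filter means lists of every base type are fetched.
    return [lst["title"] for lst in lists if not allowed or lst["baseType"] in allowed]
```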

    limitByRegex - Limit documents by regular expressions

    inclusiveRegexes - array[string]

    Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.

    Default:

    exclusiveRegexes - array[string]

    Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.

    Default:
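
    The inclusive and exclusive patterns can be illustrated with a Python sketch. It assumes that a URI must fully match at least one inclusive pattern (when any are given) and must not fully match any exclusive pattern; whether the connector uses full or partial matching, and whether exclusions take precedence, are assumptions, and `uri_allowed` and the example URIs are hypothetical. Note that a pattern written as `\.` in the UI is entered as `\\.` in the API.

```python
import re
from typing import Iterable


def uri_allowed(uri: str, inclusive: Iterable[str] = (), exclusive: Iterable[str] = ()) -> bool:
    """Apply inclusive, then exclusive, regular expressions to a URI (hypothetical model)."""
    inclusive = list(inclusive)
    # With inclusive patterns present, the URI must fully match at least one of them.
    if inclusive and not any(re.fullmatch(p, uri) for p in inclusive):
        return False
    # Any exclusive match rejects the URI.
    return not any(re.fullmatch(p, uri) for p in exclusive)
```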

    limitByExtension - Limit documents by extension

    includedFileExtensions - array[string]

    Set of file extensions to be fetched. If specified, all non-matching files will be skipped.

    Default:

    excludedFileExtensions - array[string]

    A set of file extensions to skip during the fetch.

    Default:
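
    A Python sketch of extension filtering, assuming extensions are compared case-insensitively and may be written with or without a leading dot; both are assumptions, and `extension_allowed` is a hypothetical name.

```python
import os
from typing import Iterable


def extension_allowed(filename: str, included: Iterable[str] = (),
                      excluded: Iterable[str] = ()) -> bool:
    """Apply includedFileExtensions and excludedFileExtensions to a file name (sketch)."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    inc = {e.lstrip(".").lower() for e in included}
    exc = {e.lstrip(".").lower() for e in excluded}
    if inc and ext not in inc:
        return False  # an include list is set and this extension is not in it
    return ext not in exc
```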

    documentMetadata - Document metadata

    A set of options for indexing the metadata of files and attachments that are filtered out by the document limits.

    indexByRegex - boolean

    Index the document's metadata for those files/attachments that do not meet the regular expression limits. Content of the documents will not be indexed.

    Default: false

    indexByExtension - boolean

    Index the document's metadata for those files/attachments that do not meet the file extension limits. Content of the documents will not be indexed.

    Default: false

    indexBySize - boolean

    Index the document's metadata for those files/attachments that do not meet the maximum/minimum size limits. Content of the documents will not be indexed.

    Default: false
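
    The three documentMetadata flags can be read as a fallback from full indexing to metadata-only indexing. This Python sketch is a hypothetical model; in particular, it assumes that when an item fails several limits, metadata is kept only if the flag for every failed limit is enabled, which the descriptions above do not state.

```python
from typing import Set


def indexing_mode(failed_limits: Set[str], index_by_regex: bool = False,
                  index_by_extension: bool = False, index_by_size: bool = False) -> str:
    """Return "content", "metadata_only", or "skip" for an item (hypothetical model).

    failed_limits holds any of "regex", "extension", "size".
    """
    flags = {"regex": index_by_regex, "extension": index_by_extension, "size": index_by_size}
    if not failed_limits:
        return "content"        # passed every limit: content and metadata are indexed
    if all(flags[limit] for limit in failed_limits):
        return "metadata_only"  # content of a filtered item is never indexed
    return "skip"
```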

    requestsProperties - Requests settings

    A set of options for configuring the requests the connector sends to the SharePoint server instance.

    apiQueryRowLimit - number

    >= 1

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 500

    Multiple of: 1

    changeApiQueryRowLimit - number

    >= 1

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 200

    Multiple of: 1

    userAgent - string

    The User-Agent header decorates the HTTP traffic, which helps prevent SharePoint Online from throttling or blocking the connector. The naming conventions for the user agent are 'ISV|CompanyName|AppName/Version' and 'NONISV|CompanyName|AppName/Version'. For more details, see https://docs.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online#what-is-definition-of-undecorated-traffic

    Default: ISV|Lucidworks|Connector/2.0
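
    A small Python helper for building and checking a user agent string in the ISV/NONISV format described above. The helper names are hypothetical and the check is a simple structural test of the 'PREFIX|CompanyName|AppName/Version' shape, not Microsoft's own validation; "Contoso" and "Crawler" are example values.

```python
import re

# PREFIX|CompanyName|AppName/Version, where PREFIX is ISV or NONISV.
_UA_PATTERN = re.compile(r"(ISV|NONISV)\|[^|]+\|[^|/]+/[^|/]+")


def build_user_agent(company: str, app: str, version: str, isv: bool = True) -> str:
    """Build a decorated user agent string such as 'ISV|Contoso|Crawler/1.4'."""
    prefix = "ISV" if isv else "NONISV"
    return f"{prefix}|{company}|{app}/{version}"


def follows_convention(user_agent: str) -> bool:
    """Return True if the string has the ISV/NONISV decorated shape."""
    return _UA_PATTERN.fullmatch(user_agent) is not None
```

    With these helpers, `build_user_agent("Lucidworks", "Connector", "2.0")` reproduces the default value of this property.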

    capUserAgent - string

    When the O365 Conditional Access Policy (CAP) setting is enabled, you must use a compliant User-Agent that matches one of the supported devices for O365 STS authentication. For example, if iOS is a supported platform, set this to 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) CriOS/60.0.3112.89 Mobile/14G60 Safari/602.1'

    incrementalCrawlingProperties - Incremental crawling settings

    A set of options for configuring the incremental crawling behavior.

    jobExecutionsForSiteCollections - number

    Maximum number of job executions before removing a deleted site collection.

    >= 1

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 5

    Multiple of: 1
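
    One way to picture jobExecutionsForSiteCollections is as a per-site-collection counter, sketched below in Python. It assumes that a site collection is removed after it has been missing for that many consecutive job executions and that the counter resets when the site collection reappears; both are assumptions, and `update_missing_counts` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def update_missing_counts(missing_counts: Dict[str, int], configured: Iterable[str],
                          found: Iterable[str],
                          job_executions_for_site_collections: int = 5) -> List[str]:
    """Update per-site-collection miss counters after one job; return those to remove."""
    found_set = set(found)
    to_remove = []
    for site in configured:
        if site in found_set:
            missing_counts.pop(site, None)  # seen again: reset its counter
            continue
        missing_counts[site] = missing_counts.get(site, 0) + 1
        if missing_counts[site] >= job_executions_for_site_collections:
            to_remove.append(site)
    return to_remove
```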

    security - Security filtering configuration

    enabled - boolean

    Enable query-time security trimming.

    Default: true

    collectionId - string

    ID of the collection used to store ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
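
    The default naming rule can be expressed as a one-line Python helper. The function name is hypothetical; the pattern is the one given in the description above, and "sharepoint-ds" is an example datasource ID.

```python
from typing import Optional


def acl_collection_name(datasource_id: str, collection_id: Optional[str] = None) -> str:
    """Return the configured ACL collection, or the auto-generated default name."""
    return collection_id or f"{datasource_id}_access_control_hierarchy"
```

    For example, a datasource with the ID sharepoint-ds and no collectionId stores its ACL records in sharepoint-ds_access_control_hierarchy.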