Fusion 5.12
    OneDrive Datasource V2 Connector Configuration Reference

    OneDrive is a file hosting service that is part of the Microsoft Office Online services. The Fusion OneDrive connector crawls a OneDrive for Business instance and retrieves data from it for indexing within Fusion.

    To set up the OneDrive connector, first authenticate it with a new or existing Microsoft application. Then proceed to configuring the crawl.

    Remote connectors

    V2 connectors support running remotely in Fusion versions 5.7.1 and later. Refer to Configure Remote V2 Connectors.

    Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:

    additionalVolumes:
    - name: fusion-data1-pvc
      persistentVolumeClaim:
        claimName: fusion-data1-pvc
    - name: fusion-data2-pvc
      persistentVolumeClaim:
        claimName: fusion-data2-pvc
    additionalVolumeMounts:
    - name: fusion-data1-pvc
      mountPath: "/connector/data1"
    - name: fusion-data2-pvc
      mountPath: "/connector/data2"

    You may also need to specify the user that is authorized to access the file system, as in this example:

    securityContext:
        fsGroup: 1002100000
        runAsUser: 1002100000

    Connector for OneDrive file systems

    description - string

    Optional description

    <= 125 characters

    pipeline - string, required

    Name of the IndexPipeline used for processing output.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$

    diagnosticLogging - boolean

    Enable diagnostic logging; disabled by default

    Default: false

    parserId - string, required

    The Parser to use in the associated IndexPipeline.

    coreProperties - Core Properties

    Common behavior and performance settings.

    fetchSettings - Fetch Settings

    System level settings for controlling fetch behavior and performance.

    indexingInactivityTimeout - number

    The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 86400

    Multiple of: 1

    pluginInactivityTimeout - number

    The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600

    Multiple of: 1

    indexMetadata - boolean

    When enabled, the metadata of skipped items is indexed to the content collection.

    Default: false

    indexContentFields - boolean

    When enabled, content fields will be indexed to the crawl-db collection.

    Default: false

    asyncParsing - boolean

    When enabled, content will be indexed asynchronously.

    Default: false

    numFetchThreads - number

    Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, improve overall fetch performance.

    >= 1

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 20

    Multiple of: 1

    indexingThreads - number

    Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes improve overall fetch performance.

    >= 1

    <= 10

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4

    Multiple of: 1

    pluginInstances - number

    Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 0

    Multiple of: 1

    fetchResponseScheduledTimeout - number

    The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

    >= 1000

    <= 500000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1
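
The numeric limits listed for fetchSettings can be checked before a configuration is submitted. The following is a minimal sketch, not part of Fusion: the function name check_fetch_settings is hypothetical, and the bounds table is copied from the ranges above. pluginInstances is left out because only its upper bound is documented.

```python
# Bounds copied from the fetchSettings reference above: (minimum, maximum, default).
FETCH_SETTING_BOUNDS = {
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
}


def check_fetch_settings(settings: dict) -> list:
    """Return a list of human-readable range violations (empty if none).

    Hypothetical helper: keys not listed in FETCH_SETTING_BOUNDS are ignored.
    """
    errors = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_BOUNDS:
            continue
        minimum, maximum, _default = FETCH_SETTING_BOUNDS[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        if not is_integer or not minimum <= value <= maximum:
            errors.append(f"{name}={value!r} is outside [{minimum}, {maximum}]")
    return errors
```

For example, check_fetch_settings({"indexingThreads": 11}) reports one violation, because the documented maximum for indexingThreads is 10.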

    id - string, required

    A unique identifier for this Configuration.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
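
The id and pipeline values share the same constraints: at least one character, matching ^[a-zA-Z0-9_-]+$. A small sketch of that check follows; the function name is_valid_identifier is an assumption made for illustration, and only the pattern and minimum length come from the reference.

```python
import re

# Pattern copied from the "Match pattern" entries for `id` and `pipeline`.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_identifier(value: str) -> bool:
    """Return True if value has at least one character and matches the pattern."""
    return len(value) >= 1 and ID_PATTERN.fullmatch(value) is not None
```

Under this check, "onedrive-hr_2024" is accepted, while "one drive" (contains a space) and the empty string are rejected.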

    properties - OneDrive properties

    Plugin specific properties.

    clientId - string

    Client Id

    clientSecret - string

    Client secret

    tenantIdentifier - string

    Allowed values are common, organizations, consumers, or a tenant identifier (for example, 8eaef023-2b34-4da1-9baa-8bc8c9d6a490 or example.onmicrosoft.com). For more details see: https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-v2-protocols#endpoints
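
To illustrate the kinds of value tenantIdentifier accepts, the sketch below classifies a value and shows where it appears in a Microsoft identity platform v2.0 token endpoint URL. The function names describe_tenant and token_endpoint are hypothetical, and whether the connector builds exactly this URL internally is an assumption; the reference above does not say.

```python
import uuid

# The three aliases named in the reference above.
WELL_KNOWN_TENANTS = {"common", "organizations", "consumers"}


def describe_tenant(tenant_identifier: str) -> str:
    """Classify a tenantIdentifier value as 'alias', 'tenant-id', or 'domain'."""
    if tenant_identifier in WELL_KNOWN_TENANTS:
        return "alias"
    try:
        uuid.UUID(tenant_identifier)  # raises ValueError if not a GUID
        return "tenant-id"
    except ValueError:
        return "domain"


def token_endpoint(tenant_identifier: str) -> str:
    """Build the v2.0 token endpoint URL for the given tenant (illustrative)."""
    return f"https://login.microsoftonline.com/{tenant_identifier}/oauth2/v2.0/token"
```

The classification is deliberately loose: anything that is neither an alias nor a GUID is treated as a domain name without further validation.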

    usersFilter - array[string]

    When this property is set, only the files and folders belonging to the users listed here are retrieved. This property accepts user principal names (UPNs) only.

    security - Security filtering configuration

    enabled - boolean

    Enable query-time security-trimming

    Default: true

    collectionId - string

    Id of the collection to be used for storing ACL records. If not specified, ACL collection name will be generated automatically using pattern '<datasource_id>_access_control_hierarchy'.
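
The fallback naming rule for the ACL collection can be expressed in a few lines. This is a sketch of the documented pattern only; the function name acl_collection_name is hypothetical and the datasource ID in the usage note is made up.

```python
from typing import Optional


def acl_collection_name(datasource_id: str, collection_id: Optional[str] = None) -> str:
    """Return collection_id if set, else the documented auto-generated name."""
    if collection_id:
        return collection_id
    return f"{datasource_id}_access_control_hierarchy"
```

With a datasource ID of onedrive-hr and no collectionId, this yields onedrive-hr_access_control_hierarchy.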

    maximumItemLimitConfig - Item Count Limits

    maxItems - number

    Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    namePatternConfig - Name Pattern Rules

    inclusiveRegexes - array[string]

    Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.

    Default:

    exclusiveRegexes - array[string]

    Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.

    Default:

    regexCacheSize - number

    The number of regex match results to cache when evaluating regular expressions. For example, if you exclude files by filename, the regex result for each filename is cached, so that if the same filename appears again, the cached result is reused instead of re-evaluating the expression.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1
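
The interaction between inclusiveRegexes, exclusiveRegexes, and regexCacheSize can be sketched as a cached filter function. Several details here are assumptions, not documented connector behavior: patterns are applied with search semantics, exclusions take precedence over inclusions, and an empty inclusion list allows every URI. The patterns and URIs are invented for illustration.

```python
import re
from functools import lru_cache

# Hypothetical example patterns; real values come from the datasource configuration.
INCLUSIVE_REGEXES = [r".*/Documents/.*"]
EXCLUSIVE_REGEXES = [r".*\.tmp$"]


# maxsize mirrors the documented regexCacheSize default of 10000.
@lru_cache(maxsize=10000)
def is_uri_allowed(uri: str) -> bool:
    """Return True if the URI passes the include/exclude rules (assumed semantics)."""
    if any(re.search(pattern, uri) for pattern in EXCLUSIVE_REGEXES):
        return False
    if INCLUSIVE_REGEXES:
        return any(re.search(pattern, uri) for pattern in INCLUSIVE_REGEXES)
    return True
```

Calling is_uri_allowed twice with the same URI evaluates the regular expressions only once; the second call is served from the cache, which is the effect regexCacheSize describes.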

    excludedFileExtensions - array[string]

    A set of all file extensions to be skipped from the fetch.

    Default:

    includedFileExtensions - array[string]

    Set of file extensions to be fetched. If specified, all non-matching files will be skipped.

    Default:
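
A sketch of how excludedFileExtensions and includedFileExtensions might combine is shown below. The function name passes_extension_filter is hypothetical, and the matching rules (case-insensitive, leading dot optional, exclusions checked first) are assumptions rather than documented behavior.

```python
import os


def passes_extension_filter(filename: str, included=(), excluded=()) -> bool:
    """Return True if the file should be fetched under the given extension lists."""

    def normalize(extensions):
        # Assumed normalization: ignore case and an optional leading dot.
        return {ext.lower().lstrip(".") for ext in extensions}

    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension in normalize(excluded):
        return False
    allowed = normalize(included)
    # An empty inclusion list means no restriction.
    return not allowed or extension in allowed
```

For instance, with included=["pdf", "docx"], a file named report.PDF passes and notes.txt is skipped.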

    sizeLimitProperties - Item Size Limits

    Options for including or excluding items based on size, in bytes.

    maxSizeBytes - number

    Used for excluding items when the item size is larger than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    minSizeBytes - number

    Used for excluding items when the item size is smaller than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1

    Multiple of: 1
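
The size limits can be read as a simple range check. In the sketch below, the function name passes_size_filter is hypothetical, and treating a negative maxSizeBytes (the default is -1) as "no upper limit" is an assumption; the reference does not state it explicitly.

```python
def passes_size_filter(size_bytes: int, min_size_bytes: int = 1, max_size_bytes: int = -1) -> bool:
    """Return True if an item of size_bytes falls within the configured limits."""
    if size_bytes < min_size_bytes:
        return False
    # Assumption: a negative maximum disables the upper bound.
    if max_size_bytes >= 0 and size_bytes > max_size_bytes:
        return False
    return True
```

Under the default minSizeBytes of 1, this check skips zero-byte items.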

    fetchRetryProperties - Retry Options

    A set of options for configuring retry behavior.

    maxRetries - number

    The retryer will retry failed operations in the case that they might succeed if attempted again. This parameter states the number of attempts to retry until giving up. This parameter, if specified, will override the "Stop retrying after time (milliseconds)" parameter.

    <= 100

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 3

    Multiple of: 1

    delayFactor - number

    The retryer will retry failed operations in the case that they might succeed if attempted again. The retryer will sleep an exponential amount of time after the first failed attempt and retry in exponentially incrementing amounts after each failed attempt up to the maximumTime. nextWaitTime = exponentialIncrement * multiplier.

    >= 1

    <= 9999

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 2

    Multiple of: 1

    delayMs - number

    Sets the delay between retries. Successive delays are multiplied by the delayFactor, backing off exponentially up to maxDelayTimeMs.

    >= 1

    <= 9223372036854776000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1

    maxDelayTimeMs - number

    The maximum wait time between successive retries.

    >= 1

    <= 600000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    maxTimeLimitMs - number

    This setting is used to limit the maximum amount of time spent on retries. Note: this will be ignored if "Maximum Retries" is specified.

    >= 1

    <= 28800000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600000

    Multiple of: 1
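
To make the retry options concrete, the sketch below computes the sequence of wait times implied by delayMs, delayFactor, maxDelayTimeMs, and maxRetries. It assumes the first wait equals delayMs and each later wait is the previous one multiplied by delayFactor, capped at maxDelayTimeMs; the connector's exact formula may differ, and the function name retry_delays_ms is hypothetical.

```python
def retry_delays_ms(max_retries: int = 3, delay_ms: int = 1000,
                    delay_factor: int = 2, max_delay_time_ms: int = 300000) -> list:
    """Return the assumed wait (in ms) before each retry attempt."""
    delays = []
    delay = delay_ms
    for _ in range(max_retries):
        # Cap every wait at the configured maximum.
        delays.append(min(delay, max_delay_time_ms))
        delay *= delay_factor
    return delays


print(retry_delays_ms())  # [1000, 2000, 4000] with the documented defaults
```

With more retries the cap takes effect: the tenth wait would be 512000 ms uncapped, so it is limited to the maxDelayTimeMs default of 300000 ms.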

    errorExclusions - array[string]

    Optional list of regular expressions that are matched against the exception class and message of each failed attempt. If any regex matches, the request is not retried. This prevents the retryer from retrying non-recoverable errors that were not already ignored by the connector implementation.
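
The errorExclusions check can be pictured as follows. This is a Python analogue for illustration only: the connector's own exception classes and the exact string it matches against are not documented here, so the "class name: message" text format, the function name should_retry, and the sample patterns are all assumptions.

```python
import re


def should_retry(exc: Exception, error_exclusions) -> bool:
    """Return False if any exclusion regex matches the exception class or message."""
    # Assumed match target: fully qualified class name, then the message.
    exc_type = type(exc)
    text = f"{exc_type.__module__}.{exc_type.__qualname__}: {exc}"
    return not any(re.search(pattern, text) for pattern in error_exclusions)


# Hypothetical exclusion list: never retry permission errors or 401 responses.
exclusions = [r"PermissionError", r"\b401\b"]
```

With this list, should_retry(PermissionError("403 Forbidden"), exclusions) returns False, while a TimeoutError with the message "read timed out" would still be retried.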