Fusion 5.12

    Windows Share V1 Connector Configuration Reference

    The Windows Share connector can access content in a Windows Share or Server Message Block (SMB)/Common Internet File System (CIFS) filesystem.

    Deprecation and removal notice

    This connector is deprecated as of Fusion 4.0 and is removed or expected to be removed as of Fusion 5.0. Use the Windows Share SMB2/3 V1 or Windows Share SMB2/3 V2 connector instead.

    For more information about deprecations and removals, including possible alternatives, see Deprecations and Removals.

    Configuration

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
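    The difference between the two forms comes from JSON string escaping. The following is a minimal sketch, assuming the API payload is JSON and using a hypothetical property name (`example_property`) that is not part of this connector's schema:

```python
import json

# The value as typed in the UI: a backslash followed by the letter t (two characters).
ui_value = r"\t"

# "example_property" is a hypothetical name used only for illustration.
payload = json.dumps({"example_property": ui_value})

# The serialized JSON text carries the escaped form, with a doubled backslash.
print(payload)

# Parsing the JSON gives back the original two-character value.
round_tripped = json.loads(payload)["example_property"]
```

    A JSON library performs this escaping automatically; the doubled backslash only has to be typed by hand when the request body is written manually.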

    Connector for Windows Share (CIFS or SMB) file systems. This connector is deprecated and may be removed in a future release. It is superseded by the SMB2/3 connector.

    description - string

    Optional description for this datasource.

    id - string, required

    Unique name for this datasource.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
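
    A datasource ID can be checked against this pattern before it is submitted. The sketch below is a client-side check only; the function name `is_valid_datasource_id` is an illustrative assumption, not part of Fusion:

```python
import re

# Pattern copied from the "id" property above.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_datasource_id(candidate: str) -> bool:
    """Return True if the candidate uses only letters, digits, '_' and '-'."""
    # fullmatch also rejects a trailing newline, which '$' alone would tolerate.
    return ID_PATTERN.fullmatch(candidate) is not None


print(is_valid_datasource_id("windows-share_01"))
print(is_valid_datasource_id("my share"))
```

    The `+` quantifier also enforces the ">= 1 characters" constraint, so an empty string is rejected.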

    parserId - string

    Parser used when parsing raw content. The retry parsing setting is available under crawl performance (advanced setting).

    pipeline - string, required

    Name of an existing index pipeline for processing documents.

    >= 1 characters

    properties - Properties

    Datasource configuration properties

    add_failed_docs - boolean

    Set to true to add documents even if they partially fail processing. Failed documents will be added with as much metadata as available, but may not include all expected fields.

    Default: false

    bounds - string

    Limits the crawl to a specific directory sub-tree, hostname or domain.

    Default: tree

    Allowed values: tree, host, domain, none

    commit_on_finish - boolean

    Set to true to send a commit request to Solr after the last batch has been fetched, so that the documents are committed to the index.

    Default: true

    crawl_depth - integer

    Number of levels in a directory or site tree to descend for documents.

    >= -1

    exclusiveMinimum: false

    Default: -1

    crawl_item_timeout - integer

    Maximum time in milliseconds allowed to fetch any individual document.

    exclusiveMinimum: true

    Default: 600000

    db - Connector DB

    Type and properties for a ConnectorDB implementation to use with this datasource.

    aliases - boolean

    Keep track of original URIs that resolved to the current URI. This negatively impacts performance and size of DB.

    Default: false

    inlinks - boolean

    Keep track of incoming links. This negatively impacts performance and size of DB.

    Default: false

    inv_aliases - boolean

    Keep track of target URIs that the current URI resolves to. This negatively impacts performance and size of DB.

    Default: false

    type - string

    Fully qualified class name of ConnectorDb implementation.

    >= 1 characters

    Default: com.lucidworks.connectors.db.impl.MapDbConnectorDb

    enable_security_trimming - Enable security trimming

    Set to true to fetch and index access control information from files.

    ad_cache_groups - boolean

    Set to true to enable caching of the Active Directory group hierarchy to speed up security trimming processes.

    Default: false

    ad_connect_timeout - integer

    Time in milliseconds for connecting to Active Directory. Default is 3000 ms.

    exclusiveMinimum: false

    Default: 3000

    ad_context_factory - string

    The classname of the context factory to use to create the initial context.

    Default: com.sun.jndi.ldap.LdapCtxFactory

    ad_credentials - string

    A password for the User Principal.

    ad_group_base_dn - string

    Active Directory node or directory where group objects reside.

    ad_group_filter - string

    Valid LDAP filter to find group objects in Active Directory, such as '(&(objectclass=group))'.

    ad_read_timeout - integer

    Time in milliseconds for reading responses from Active Directory. Default is 5000 ms.

    exclusiveMinimum: false

    Default: 5000

    ad_read_token_groups - boolean

    Set to true to read groups using TOKEN_GROUPS. Only applied if 'Cache AD groups' is disabled.

    Default: true

    ad_referral - string

    The method for processing referrals encountered by the service provider.

    Default: follow

    ad_security_authentication - string

    The type of security authentication to use.

    Default: simple

    ad_url - string

    Fully qualified URL of the LDAP or AD server where user information is stored, in the format: 'ldap://hostname:389' or 'ldaps://hostname:636'.

    Match pattern: ldaps?://.+

    ad_user_base_dn - string

    Active Directory node or directory where user objects reside.

    ad_user_filter - string

    Valid LDAP filter to find user objects in Active Directory, such as '(&(objectclass=user)(sAMAccountName={0}))'.
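
    The `{0}` placeholder in the example filter is presumably replaced with the account name being looked up. The sketch below illustrates that substitution under that assumption, and escapes the characters that are special in LDAP filters (RFC 4515). Both function names are hypothetical and do not reflect the connector's actual implementation:

```python
def escape_ldap_filter_value(value: str) -> str:
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(template: str, account_name: str) -> str:
    """Substitute the escaped account name for the {0} placeholder (assumed behavior)."""
    return template.replace("{0}", escape_ldap_filter_value(account_name))


template = "(&(objectclass=user)(sAMAccountName={0}))"
print(build_user_filter(template, "jdoe"))
```

    Escaping matters because an unescaped `*` or parenthesis in the account name would change the meaning of the filter.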

    ad_user_principal_name - string

    A User Principal with permissions to access the Active Directory server in the format user@domain.

    Match pattern: .+@.+

    cache_element_expiration_time - integer

    Time in seconds to store items in the caches. The default is 7200 seconds (2 hours).

    exclusiveMinimum: false

    Default: 7200

    enable_SIDs_cache - boolean

    Set to true to cache user SIDs to reduce the number of Active Directory requests.

    Default: true

    max_cache_size - integer

    Maximum number of items to store in the SIDs cache before refreshing. The default is 1000 items.

    exclusiveMinimum: false

    Default: 1000

    exclude_paths - array[string]

    Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.

    include_extensions - array[string]

    List the file extensions to be fetched. Note: Files with possible matching MIME types but non-matching file extensions will be skipped. Extensions should be listed without periods, using whitespace to separate items (e.g., 'pdf zip').

    include_paths - array[string]

    Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
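
    The interaction between include_paths and exclude_paths can be sketched as follows. This is an illustration only: it assumes each pattern must match the whole URI, that exclusions take precedence, and that an empty include list admits every URI. None of these details is confirmed by this reference, and the function name `is_uri_crawled` is hypothetical:

```python
import re
from typing import Sequence


def is_uri_crawled(uri: str,
                   include_paths: Sequence[str],
                   exclude_paths: Sequence[str]) -> bool:
    """Decide whether a URI passes the include/exclude filters (assumed semantics)."""
    # Assumption: any matching exclude pattern rejects the URI outright.
    if any(re.fullmatch(pattern, uri) for pattern in exclude_paths):
        return False
    # Assumption: when include patterns exist, at least one must match.
    if include_paths:
        return any(re.fullmatch(pattern, uri) for pattern in include_paths)
    return True


include = [r".*\.pdf"]
exclude = [r".*/archive/.*"]
print(is_uri_crawled("smb://host/share/docs/report.pdf", include, exclude))
print(is_uri_crawled("smb://host/share/archive/old.pdf", include, exclude))
```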

    index_directories - boolean

    Set to true to add directories to the index as documents. If set to false, directories will not be added to the index, but they will still be traversed for documents.

    Default: false

    initial_mapping - Initial field mapping

    Provides mapping of fields before documents are sent to an index pipeline.

    condition - string

    Define a conditional script that must result in true or false. This can be used to determine if the stage should process or not.

    label - string

    A unique label for this stage.

    <= 255 characters

    mappings - array[object]

    List of mapping rules

    Default: {"operation":"move","source":"fetch_time","target":"fetch_time_dt"}, {"operation":"move","source":"ds:description","target":"description"}

    object attributes:
        operation - string (display name: Operation)
        source - string, required (display name: Source Field)
        target - string (display name: Target Field)

    reservedFieldsMappingAllowed - boolean

    Default: false

    skip - boolean

    Set to true to skip this stage.

    Default: false

    unmapped - Unmapped Fields

    If fields do not match any of the field mapping rules, these rules will apply.

    operation - string

    The type of mapping to perform: move, copy, delete, add, set, or keep.

    Default: copy

    Allowed values: copy, move, delete, set, add, keep

    source - string

    The name of the field to be mapped.

    target - string

    The name of the field to be mapped to.
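
    The effect of the default mapping rules on a document can be illustrated with a small sketch. It models only the move, copy, and delete operations, because this reference does not describe the exact semantics of add, set, and keep. The function `apply_mappings` and the sample document are hypothetical and are not Fusion's implementation:

```python
from typing import Any, Dict, List


def apply_mappings(doc: Dict[str, Any],
                   mappings: List[Dict[str, str]]) -> Dict[str, Any]:
    """Apply move/copy/delete rules to a copy of the document (illustrative only)."""
    result = dict(doc)
    for rule in mappings:
        source = rule["source"]
        target = rule.get("target")
        operation = rule["operation"]
        if source not in result:
            continue  # Nothing to map for this rule.
        if operation == "move":
            result[target] = result.pop(source)
        elif operation == "copy":
            result[target] = result[source]
        elif operation == "delete":
            del result[source]
        # add, set, and keep are intentionally not modeled here.
    return result


# The two default rules listed under "mappings" above.
default_mappings = [
    {"operation": "move", "source": "fetch_time", "target": "fetch_time_dt"},
    {"operation": "move", "source": "ds:description", "target": "description"},
]

sample_doc = {
    "fetch_time": "2024-01-01T00:00:00Z",
    "ds:description": "Shared drive",
    "title": "Report",
}
mapped = apply_mappings(sample_doc, default_mappings)
print(mapped)
```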

    kerberos_keytab - string

    Full path to the Kerberos keytab file.

    kerberos_user - string

    Kerberos principal name, e.g., 'username@YOUR-REALM.COM'.

    max_bytes - integer

    Maximum size (in bytes) of documents to fetch or -1 for unlimited file size.

    >= -1

    exclusiveMinimum: false

    Default: 10485760

    max_docs - integer

    Maximum number of documents to fetch. The default (-1) means no limit.

    >= -1

    exclusiveMinimum: false

    Default: -1

    max_threads - integer

    The maximum number of threads to use for fetching data. Note: Each thread will create a new connection to the repository, which may make overall throughput faster, but this also requires more system resources, including CPU and memory.

    Default: 1

    maximum_connections - integer

    Maximum number of concurrent connections to the filesystem. A large number of documents could cause a large number of simultaneous connections to the repository and lead to errors or degraded performance. In some cases, reducing this number may help performance issues.

    Default: 10000

    password - string

    Password for the user. Do not set a password if Kerberos authentication is used.

    url - string

    The Windows share host followed by the Windows share name, e.g., smb://172.17.0.3:32778/share/Base_files-7f79d267-98b2-4175-b3ba-9274b21ce3a3/

    >= 1 characters

    Match pattern: .*:.*

    username - string

    A user in the Windows domain with READ permissions for the Windows Share. Do not set username if Kerberos authentication is used.

    verify_access - boolean

    Set to true to require successful connection to the filesystem before saving this datasource.

    Default: true

    windows_domain - string

    Authentication domain for the user. Not required if using Kerberos authentication.

    with_kerberos - boolean

    Set to true to use Kerberos for authentication instead of a username and password. Kerberos authentication requires values for the advanced properties 'Kerberos principal' and 'Kerberos keytab'. If set to false, the username and password must be entered.

    Default: false
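
    The authentication rules described for username, password, kerberos_user, kerberos_keytab, and with_kerberos can be checked on the client side before a datasource is saved. The following is a minimal sketch under those stated rules. The function `check_auth_properties`, the host name, and the credential values are hypothetical, and the dictionary shows only a subset of the properties object:

```python
from typing import Any, Dict, List


def check_auth_properties(properties: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the authentication-related properties."""
    problems = []
    if properties.get("with_kerberos", False):
        for key in ("kerberos_user", "kerberos_keytab"):
            if not properties.get(key):
                problems.append(f"{key} is required when with_kerberos is true")
        for key in ("username", "password"):
            if properties.get(key):
                problems.append(f"{key} should not be set when with_kerberos is true")
    else:
        for key in ("username", "password"):
            if not properties.get(key):
                problems.append(f"{key} is required when with_kerberos is false")
    return problems


# Hypothetical values for illustration only.
password_auth = {
    "url": "smb://fileserver.example.com/share/",
    "username": "crawler",
    "password": "example-password",
    "windows_domain": "EXAMPLE",
    "with_kerberos": False,
}
print(check_auth_properties(password_auth))
```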