Fusion 5.12

    AEM V2 Connector Configuration Reference

    This connector retrieves data from an Adobe Experience Manager (AEM) repository. The AEM V2 connector is compatible with AEM version 6.5.

    For step-by-step configuration instructions, refer to configure AEM V2 connector.

    The AEM V2 connector supports the following:

    • Full crawling and recrawling of pages and assets in Adobe Experience Manager.

    • Basic authentication.

    • OAuth authentication.

    • Security trimming, to filter results based on user permissions.

    • Filtering of crawled documents by including and excluding paths and by configuring content properties when setting up the connector.

    • Throttling of crawls, if necessary, by specifying a wait time between fetch requests.

    An Apache Sling-based connector for AEM.

    description - string

    Optional description

    <= 125 characters

    pipeline - string, required

    Name of the IndexPipeline used for processing output.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
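
    The match pattern above (also used for parserId and id) can be checked before a configuration is submitted. The following is a minimal sketch, not part of the connector; the helper name is_valid_name is hypothetical.

```python
import re

# Pattern copied from the reference. fullmatch() anchors both ends,
# so the explicit ^ and $ are not needed here.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def is_valid_name(value: str) -> bool:
    """Return True if value is non-empty and uses only letters, digits, '_' or '-'."""
    return NAME_PATTERN.fullmatch(value) is not None
```

    For example, is_valid_name("aem-pages_v2") is accepted, while a name containing a space or an empty string is rejected.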

    diagnosticLogging - boolean

    Enable diagnostic logging; disabled by default

    Default: false

    parserId - string

    The Parser to use in the associated IndexPipeline.

    Match pattern: ^[a-zA-Z0-9_-]+$

    coreProperties - Core Properties

    Common behavior and performance settings.

    fetchSettings - Fetch Settings

    System level settings for controlling fetch behavior and performance.

    numFetchThreads - number

    Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, improve overall fetch performance.

    >= 1

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 20

    Multiple of: 1

    indexingThreads - number

    Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.

    >= 1

    <= 10

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4

    Multiple of: 1

    pluginInstances - number

    Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will perform fetching. This is useful for distributing load between different instances.

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 0

    Multiple of: 1

    fetchResponseScheduledTimeout - number

    The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

    >= 1000

    <= 500000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    indexingInactivityTimeout - number

    The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 86400

    Multiple of: 1

    pluginInactivityTimeout - number

    The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600

    Multiple of: 1

    indexMetadata - boolean

    When enabled, the metadata of skipped items will be indexed to the content collection.

    Default: false

    indexContentFields - boolean

    When enabled, content fields will be indexed to the crawl-db collection.

    Default: false

    asyncParsing - boolean

    When enabled, content will be indexed asynchronously.

    Default: false
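
    The numeric limits listed for the fetch settings above can be checked up front. The following is a minimal sketch that copies the minimum, maximum, and default values from this reference; the helper name check_fetch_settings and the flat dictionary it takes are assumptions for illustration, not the connector's API.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum, default) as listed in the reference; None means not stated.
FETCH_SETTING_BOUNDS: Dict[str, Tuple[Optional[int], Optional[int], int]] = {
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "pluginInstances": (None, 500, 0),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
}


def check_fetch_settings(settings: Dict[str, int]) -> List[str]:
    """Return one message per numeric setting that falls outside its documented range."""
    problems: List[str] = []
    for name, (low, high, _default) in FETCH_SETTING_BOUNDS.items():
        if name not in settings:
            continue  # an omitted setting falls back to its default
        value = settings[name]
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected a whole number")
        elif low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum {low}")
        elif high is not None and value > high:
            problems.append(f"{name}: {value} is above the maximum {high}")
    return problems
```

    Boolean settings such as indexMetadata are ignored by this helper because they have no numeric range.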

    id - string, required

    A unique identifier for this Configuration.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$

    properties - Properties

    Plugin specific properties.

    aemBaseUrl - string

    Base URL to AEM, e.g. http://localhost:4502

    >= 1 characters

    Default: http://localhost:4502

    username - string

    Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.

    password - string

    Password to use for authentication.

    allowAllCertificates - boolean

    If false, security checks will be performed on all SSL/TLS certificate signers and origins. This means self-signed certificates would not be supported.

    Default: false

    pageSize - number

    Number of documents to fetch per page request. A higher value can make crawling faster, but memory usage is also increased.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 100

    Multiple of: 1

    nodeDepth - number

    Number of levels you want the query to return

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10

    Multiple of: 1

    threadWait - number

    Time to wait, in milliseconds, between each page request

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1
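
    The interaction between pageSize and threadWait can be pictured as a paged loop that pauses between requests. The following is a minimal sketch under stated assumptions: fetch_page is a hypothetical callable standing in for a request to AEM, and offset-based paging is assumed for illustration; the connector's actual paging mechanism is not described in this reference.

```python
import time
from typing import Callable, Iterator, List


def crawl_pages(
    fetch_page: Callable[[int, int], List[str]],  # hypothetical: (offset, limit) -> items
    page_size: int = 100,          # mirrors the pageSize default
    thread_wait_ms: int = 1000,    # mirrors the threadWait default
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items page by page, waiting thread_wait_ms between page requests."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short page means there is nothing left to request
        offset += page_size
        sleep(thread_wait_ms / 1000.0)  # throttle before the next request
```

    A larger page_size means fewer requests and fewer pauses, at the cost of holding more items in memory per request.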

    paths - array[string]

    AEM paths to search for content.

    Default: "/"

    excludePathRegexes - array[string]

    Java regular expressions for paths that should not be fetched

    aemTypes - array[string]

    AEM document types (jcr:primaryType) to include in the index, for example cq:Page or dam:Asset.

    Default: "cq:Page"

    attachmentTypes - array[string]

    Attachment extensions to index. By default all attachments are indexed.

    maxSizeBytes - number

    Maximum size, in bytes, of a document to fetch. Content larger than this is trimmed to maxSizeBytes.

    >= -9223372036854776000

    <= 9223372036854776000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4194304

    Multiple of: 1
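
    The trimming behavior described for maxSizeBytes amounts to cutting content at a byte limit. The following is a minimal sketch of that idea, not the connector's implementation; the helper name trim_content is hypothetical. The default of 4194304 bytes equals 4 MiB.

```python
DEFAULT_MAX_SIZE_BYTES = 4 * 1024 * 1024  # 4194304, the documented default


def trim_content(content: bytes, max_size_bytes: int = DEFAULT_MAX_SIZE_BYTES) -> bytes:
    """Return content unchanged if it fits, otherwise only its first max_size_bytes bytes."""
    return content[:max_size_bytes]
```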

    requestProperties - Request Options

    A set of options for configuring requests to the AEM instance.

    connectTimeout - number

    The timeout in milliseconds until a connection is established. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1

    socketTimeout - number

    The socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1

    requestTimeout - number

    The timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1
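
    The three timeouts above share the same convention for zero and negative values. The following is a minimal sketch that spells out that convention; the helper name describe_timeout is hypothetical and only restates the descriptions in this reference.

```python
def describe_timeout(timeout_ms: int) -> str:
    """Explain how a connectTimeout, socketTimeout, or requestTimeout value is read."""
    if timeout_ms == 0:
        return "infinite"   # zero means wait without limit
    if timeout_ms < 0:
        return "undefined"  # negative means no explicit timeout is set
    return f"{timeout_ms} ms"
```

    For the documented default, describe_timeout(10000) returns "10000 ms".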

    retryProperties - Retry Options

    A set of options for configuring request retry behavior.

    maxRetries - number

    If a request to AEM fails, it is retried up to this number of times.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 3

    Multiple of: 1

    retryDelay - number

    Time to wait, in milliseconds, between each retry

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1
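
    The retry options above describe a fixed-delay retry loop. The following is a minimal sketch of that pattern, assuming maxRetries counts attempts made after the first failure and that any exception counts as a failed request; the connector's actual failure criteria are not described here, and call_with_retries is a hypothetical name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,        # mirrors the maxRetries default
    retry_delay_ms: int = 1000,  # mirrors the retryDelay default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying up to max_retries times with a fixed delay."""
    retries_used = 0
    while True:
        try:
            return request()
        except Exception:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay_ms / 1000.0)
```

    With the defaults, a request that keeps failing is attempted four times in total (one initial call plus three retries), with one second between attempts.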

    security - Security filtering configuration

    enabled - boolean

    Enable query-time security-trimming

    Default: true

    collectionId - string

    ID of the collection to be used for storing ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
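
    The naming rule for the ACL collection can be expressed in a few lines. The following is a minimal sketch of the documented pattern; the helper name acl_collection_name is hypothetical.

```python
from typing import Optional


def acl_collection_name(datasource_id: str, collection_id: Optional[str] = None) -> str:
    """Return the configured collectionId, or the generated name if none is set."""
    if collection_id:
        return collection_id
    return f"{datasource_id}_access_control_hierarchy"
```

    For a datasource with the ID aem-pages and no collectionId, the generated name is aem-pages_access_control_hierarchy.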