Fusion 5.9

    AEM V2 Connector Configuration Reference

    This connector retrieves data from an Adobe Experience Manager (AEM) repository. The AEM V2 connector is compatible with AEM version 6.5.

    For step-by-step configuration instructions, refer to configure AEM V2 connector.

    The AEM V2 connector supports the following:

    • Full crawling and recrawling of pages and assets in Adobe Experience Manager.

    • Basic authentication.

      In v1.2.0 and Fusion 5.12.0 and later, the username and password fields have moved under Authentication Settings > Login Settings.

    • OAuth authentication.

    • Security trimming, to filter results based on user permissions.

    • Filtering of crawled documents by including and excluding paths and by configuring content properties when setting up the connector.

    • Throttling of crawls, if necessary, by specifying a wait time between fetch requests.

    • In Fusion 5.12.0 and later: Optional crawling of child paths.

      Check for duplicate data when crawling child paths. For example, if the connector indexes both cq:Page and cq:PageContent, the results could include duplicated data.

    When security trimming is used, v1.3.0 of this connector is compatible only with Fusion 5.9.4 and later. The v1.3.0 connector uses Graph Security Trimming rather than regular security trimming. Treat it as a new connector: configurations from previous versions do not transfer over, and a full crawl is required.

    Remote connectors

    V2 connectors support running remotely in Fusion versions 5.7.1 and later. Refer to Configure Remote V2 Connectors.

    Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:

    additionalVolumes:
    - name: fusion-data1-pvc
      persistentVolumeClaim:
        claimName: fusion-data1-pvc
    - name: fusion-data2-pvc
      persistentVolumeClaim:
        claimName: fusion-data2-pvc
    additionalVolumeMounts:
    - name: fusion-data1-pvc
      mountPath: "/connector/data1"
    - name: fusion-data2-pvc
      mountPath: "/connector/data2"

    You may also need to specify the user that is authorized to access the file system, as in this example:

    securityContext:
        fsGroup: 1002100000
        runAsUser: 1002100000

    An Apache Sling based connector for AEM

    description - string

    Optional description

    <= 125 characters

    pipeline - string, required

    Name of the IndexPipeline used for processing output.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
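
    The match pattern above is also listed for parserId and id in this reference. As a minimal sketch, a name can be checked against it before a configuration is submitted. The helper name is_valid_name is hypothetical and not part of Fusion.

```python
import re

# Pattern copied from the reference above (listed for pipeline, parserId, and id).
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(value: str) -> bool:
    """Return True if value is non-empty and uses only letters, digits, '_' or '-'."""
    # is_valid_name is a hypothetical helper for illustration only.
    return bool(NAME_PATTERN.fullmatch(value))


for candidate in ["aem-pages_v2", "aem pages", "aem.pages", ""]:
    print(repr(candidate), is_valid_name(candidate))
```

    Spaces, dots, and empty strings are rejected, which also covers the ">= 1 characters" constraint.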

    diagnosticLogging - boolean

    Enable diagnostic logging; disabled by default

    Default: false

    parserId - string

    The Parser to use in the associated IndexPipeline.

    Match pattern: ^[a-zA-Z0-9_-]+$

    coreProperties - Core Properties

    Common behavior and performance settings.

    fetchSettings - Fetch Settings

    System level settings for controlling fetch behavior and performance.

    indexingThreads - number

    Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.

    >= 1

    <= 10

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4

    Multiple of: 1

    pluginInstances - number

    Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 0

    Multiple of: 1

    fetchResponseScheduledTimeout - number

    The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

    >= 1000

    <= 500000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    indexingInactivityTimeout - number

    The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 86400

    Multiple of: 1

    numFetchThreads - number

    Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, help with overall fetch performance.

    >= 1

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 20

    Multiple of: 1

    pluginInactivityTimeout - number

    The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600

    Multiple of: 1
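
    The numeric limits listed above can be checked before a configuration is saved. The following is a sketch only: the limits are copied from this reference, while the function name check_fetch_settings and the flat dictionary layout are assumptions made for illustration, not the Fusion API.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) pairs copied from the reference above; None means no bound is listed.
FETCH_SETTING_LIMITS: Dict[str, Tuple[Optional[int], int]] = {
    "indexingThreads": (1, 10),
    "pluginInstances": (None, 500),
    "fetchResponseScheduledTimeout": (1000, 500000),
    "indexingInactivityTimeout": (60, 691200),
    "numFetchThreads": (1, 500),
    "pluginInactivityTimeout": (60, 691200),
}


def check_fetch_settings(settings: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    # Hypothetical helper: the flat {name: value} layout is an assumption.
    problems: List[str] = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_LIMITS:
            problems.append(f"{name}: not a fetch setting listed in this reference")
            continue
        low, high = FETCH_SETTING_LIMITS[name]
        if low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum of {low}")
        elif value > high:
            problems.append(f"{name}: {value} is above the maximum of {high}")
    return problems


print(check_fetch_settings({"indexingThreads": 12, "numFetchThreads": 20}))
```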

    indexMetadata - boolean

    When enabled, the metadata of skipped items will be indexed to the content collection.

    Default: false

    indexContentFields - boolean

    When enabled, content fields will be indexed to the crawl-db collection.

    Default: false

    asyncParsing - boolean

    When enabled, content will be indexed asynchronously.

    Default: false

    id - string, required

    A unique identifier for this Configuration.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$

    properties - Properties

    Plugin specific properties.

    aemBaseUrl - string

    Base URL to AEM, e.g. http://localhost:4502

    >= 1 characters

    Default: http://localhost:4502

    username - string

    Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.

    password - string

    Password to use for authentication.

    authConfig - Authentication Settings

    Select only one option

    loginAuthentication - Login Settings

    username - string

    Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.

    password - string

    Password to use for authentication.

    oAuth - OAuth Settings

    accessToken - string

    Access Token

    oAuthRefreshToken - string

    Refresh token, used to refresh the access token.

    jwtToken - string

    JWT token, used to request a new access token if the refresh token is not set.

    clientId - string

    Client Id

    clientSecret - string

    Client Secret

    redirectUri - string

    Redirect Uri
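
    The "Select only one option" rule and the token descriptions above can be sketched as two small checks. This is an illustration under stated assumptions: the field names come from this reference, but the nested dictionary layout and both function names are hypothetical, and the connector's actual token handling may differ.

```python
from typing import Any, Dict, Optional


def selected_auth_option(auth_config: Dict[str, Any]) -> str:
    """Return the single configured option, or raise if none or both are set."""
    # Hypothetical helper: assumes authConfig is a dict keyed by option name.
    options = [key for key in ("loginAuthentication", "oAuth") if auth_config.get(key)]
    if len(options) != 1:
        raise ValueError(f"expected exactly one authentication option, found {options}")
    return options[0]


def renewal_credential(oauth: Dict[str, str]) -> Optional[str]:
    """Name the field used to obtain a new access token, per the descriptions above."""
    if oauth.get("oAuthRefreshToken"):
        return "oAuthRefreshToken"
    if oauth.get("jwtToken"):
        return "jwtToken"  # used only when no refresh token is set
    return None


# Placeholder values, not real credentials.
config = {"oAuth": {"accessToken": "example-access-token", "jwtToken": "example-jwt"}}
print(selected_auth_option(config), renewal_credential(config["oAuth"]))
```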

    allowAllCertificates - boolean

    If false, security checks are performed on all SSL/TLS certificate signers and origins, which means self-signed certificates are not supported.

    Default: false

    pageSize - number

    Number of documents to fetch per page request. A higher value can make crawling faster, but memory usage is also increased.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 100

    Multiple of: 1

    nodeDepth - number

    Number of levels you want the query to return

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10

    Multiple of: 1

    indexChildPathData - boolean

    The metadata associated with subdirectories of the item's path will be indexed.

    Default: false

    threadWait - number

    Time to wait, in milliseconds, between each page request

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1

    paths - array[string]

    AEM paths to search for content.

    Default: "/"

    excludePathRegexes - array[string]

    Java regular expressions for paths that should not be fetched
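
    As a rough illustration of path exclusion, the sketch below drops any path matched by one of the exclusion patterns. Two assumptions apply: the connector uses Java regular expressions, whereas this sketch uses Python's re module (the simple patterns shown behave the same in both), and it assumes a pattern must match the whole path, which this reference does not specify. The sample paths and patterns are made up.

```python
import re
from typing import Iterable, List


def filter_paths(paths: Iterable[str], exclude_regexes: Iterable[str]) -> List[str]:
    """Keep only the paths that no exclusion pattern matches in full."""
    # Assumption: whole-path matching; the connector's actual semantics may differ.
    compiled = [re.compile(pattern) for pattern in exclude_regexes]
    return [path for path in paths if not any(rx.fullmatch(path) for rx in compiled)]


# Hypothetical sample data for illustration.
candidates = [
    "/content/site/en/home",
    "/content/site/en/archive/2019",
    "/content/dam/tmp/upload.png",
]
excludes = [r"/content/site/.*/archive(/.*)?", r"/content/dam/tmp/.*"]
print(filter_paths(candidates, excludes))
```

    Testing patterns against a handful of known paths like this before a crawl can catch an expression that excludes more than intended.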

    aemTypes - array[string]

    AEM document types (jcr:primaryType) to include in the index, e.g. cq:Page, dam:Asset.

    Default: "cq:Page"

    attachmentTypes - array[string]

    Attachment extensions to index. By default all attachments are indexed.

    maxSizeBytes - number

    Maximum size, in bytes, of a document to fetch. Larger content is trimmed to maxSizeBytes.

    >= -9223372036854776000

    <= 9223372036854776000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4194304

    Multiple of: 1

    requestProperties - Request Options

    A set of options for configuring requests to the AEM instance.

    connectTimeout - number

    The timeout in milliseconds until a connection is established. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1

    socketTimeout - number

    The socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1

    requestTimeout - number

    The timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10000

    Multiple of: 1

    retryProperties - Retry Options

    A set of options for configuring request retry behavior.

    maxRetries - number

    If a request to AEM fails, it is retried up to this many times.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 3

    Multiple of: 1

    retryDelay - number

    Time to wait, in milliseconds, between each retry

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1
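
    The interaction of maxRetries and retryDelay can be pictured with a small retry loop. This is a sketch, not the connector's implementation: it assumes maxRetries counts retries after the first attempt (so up to maxRetries + 1 calls in total) and that the delay is fixed rather than growing. The function name call_with_retries and the injectable sleep parameter are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,         # default taken from maxRetries above
    retry_delay_ms: int = 1000,   # default taken from retryDelay above
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(); on failure, wait retry_delay_ms and retry up to max_retries times."""
    # Assumption: max_retries excludes the initial attempt.
    retries_used = 0
    while True:
        try:
            return request()
        except Exception:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay_ms / 1000.0)  # time.sleep expects seconds
```

    With the defaults, a request that keeps failing is attempted four times with roughly three seconds of total waiting before the error is raised.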

    security - Graph security filtering configuration

    enabled - boolean

    Enable query-time security-trimming

    Default: true