AEM V2 Connector Configuration Reference
This connector retrieves data from an Adobe Experience Manager (AEM) repository. The AEM V2 connector is compatible with AEM version 6.5.
For step-by-step configuration instructions, refer to Configure AEM V2 connector.
The AEM V2 connector supports the following:
- Full crawling and recrawling of pages and assets in Adobe Experience Manager.
- Basic authentication.
- OAuth authentication.
- Security trimming, to filter results based on user permissions.
- Filtering of crawled documents by including and excluding paths and configuring content properties when setting up the connector.
- A configurable wait time between fetch requests to throttle crawls, if necessary.
An Apache Sling-based connector for AEM.
description - string
Optional description
<= 125 characters
pipeline - string (required)
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string
The Parser to use in the associated IndexPipeline.
Match pattern: ^[a-zA-Z0-9_-]+$
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, improve overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 0
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
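The numeric limits listed for the fetch settings above can be checked before a configuration is submitted. The following is a minimal sketch in Python, not part of the connector: the table and the function name check_fetch_settings are hypothetical, and only the bounds and defaults are taken from this reference.

```python
# Bounds and defaults copied from the fetchSettings entries in this reference.
# Each tuple is (minimum, maximum, default); None means no minimum is listed.
FETCH_SETTING_BOUNDS = {
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "pluginInstances": (None, 500, 0),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
}


def check_fetch_settings(settings):
    """Return a list of problems; an empty list means every value is in range.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_BOUNDS:
            continue  # not a numeric setting covered by this sketch
        low, high, _default = FETCH_SETTING_BOUNDS[name]
        # "Multiple of: 1" means the value must be a whole number.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected a whole number, got {value!r}")
            continue
        if low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum {low}")
        if value > high:
            problems.append(f"{name}: {value} is above the maximum {high}")
    return problems
```

Calling check_fetch_settings({"indexingThreads": 11}) returns one problem, because the listed maximum for indexingThreads is 10.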
id - string (required)
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
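The match pattern shared by id, pipeline, and parserId allows only letters, digits, underscores, and hyphens. As a rough illustration, the pattern can be tested locally with Python's re module; the function name is_valid_name is hypothetical and not part of the connector.

```python
import re

# Pattern copied from the "Match pattern" entries for id, pipeline, and parserId.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(value):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(value) is not None
```

For example, "aem-v2_datasource" passes, while "my datasource" fails because of the space, and an empty string fails because at least one character is required.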
properties - Properties
Plugin specific properties.
aemBaseUrl - string
Base URL to AEM, e.g. http://localhost:4502
>= 1 characters
Default: http://localhost:4502
username - string
Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.
password - string
Password to use for authentication.
allowAllCertificates - boolean
If false, security checks are performed on all SSL/TLS certificate signers and origins, which means self-signed certificates are not supported.
Default: false
pageSize - number
Number of documents to fetch per page request. A higher value can make crawling faster, but memory usage is also increased.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 100
Multiple of: 1
nodeDepth - number
Number of node levels the query returns.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10
Multiple of: 1
threadWait - number
Time to wait, in milliseconds, between each page request.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
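To illustrate how pageSize and threadWait interact, here is a minimal sketch of a paged fetch loop that pauses between page requests. It is an assumption about the general pattern, not the connector's actual implementation: fetch_page is a hypothetical callable, and the sleep parameter exists only so the wait can be replaced in tests.

```python
import time


def fetch_all_pages(fetch_page, page_size=100, thread_wait_ms=1000, sleep=time.sleep):
    """Collect items page by page, pausing between page requests.

    fetch_page(offset, limit) is a hypothetical callable returning a list of items.
    """
    if page_size < 1:
        raise ValueError("this sketch assumes a positive page size")
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size
        sleep(thread_wait_ms / 1000.0)  # threadWait is given in milliseconds
    return items
```

With the defaults, each request asks for 100 documents and the loop waits one second before asking for the next page. Larger pages mean fewer requests and fewer waits, at the cost of holding more documents in memory at once.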
paths - array[string]
AEM paths to search for content.
Default: "/"
excludePathRegexes - array[string]
Java regular expressions for paths that should not be fetched.
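The combined effect of paths and excludePathRegexes can be sketched as a simple include-then-exclude check. This sketch rests on assumptions that this reference does not confirm: it uses Python's re module as a stand-in for Java regular expressions (the two dialects differ in places), it assumes an exclusion pattern must match the whole path, and the function name should_fetch is hypothetical.

```python
import re


def should_fetch(path, include_paths=("/",), exclude_path_regexes=()):
    """Return True if path lies under an included root and matches no exclusion.

    Hypothetical helper; whole-path matching is an assumption.
    """
    under_root = any(
        root == "/" or path == root or path.startswith(root.rstrip("/") + "/")
        for root in include_paths
    )
    if not under_root:
        return False
    return not any(re.fullmatch(rx, path) for rx in exclude_path_regexes)
```

For example, with include_paths set to ["/content/site"] and exclude_path_regexes set to ["/content/site/archive/.*"], the path /content/site/en is fetched and /content/site/archive/2019 is skipped.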
aemTypes - array[string]
AEM document types (jcr:primaryType) to include in the index, e.g. cq:Page, dam:Asset.
Default: "cq:Page"
attachmentTypes - array[string]
Attachment extensions to index. By default, all attachments are indexed.
maxSizeBytes - number
Maximum size, in bytes, of a document to fetch. If the content is larger, it is trimmed to 'maxSizeBytes' bytes.
>= -9223372036854776000
<= 9223372036854776000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4194304
Multiple of: 1
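As a small illustration of the trimming described for maxSizeBytes, the sketch below cuts a byte string to the configured limit. The function name trim_content is hypothetical, and how the connector itself trims content (for example, whether it respects character boundaries) is not stated in this reference.

```python
def trim_content(content, max_size_bytes=4194304):
    """Return content cut to at most max_size_bytes bytes (hypothetical helper)."""
    if max_size_bytes < 0:
        raise ValueError("this sketch assumes a non-negative size limit")
    return content[:max_size_bytes]
```

The default of 4194304 bytes is 4 MiB, so a 5,000,000-byte document would be cut to its first 4,194,304 bytes.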
requestProperties - Request Options
A set of options for configuring requests to the AEM instance.
connectTimeout - number
The timeout in milliseconds until a connection is established.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
socketTimeout - number
The socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
requestTimeout - number
The timeout in milliseconds used when requesting a connection from the connection manager.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
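The three timeout settings share the same convention for zero and negative values. The following sketch only restates that convention in code; the function name describe_timeout is hypothetical, and what the connector does when a timeout is "undefined" is not specified in this reference.

```python
def describe_timeout(value_ms):
    """Describe how a timeout value in milliseconds is interpreted (hypothetical helper)."""
    if value_ms == 0:
        return "infinite"
    if value_ms < 0:
        return "undefined"
    return f"{value_ms} ms"
```

With the default of 10000, each of the three timeouts is ten seconds.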
retryProperties - Retry Options
A set of options for configuring request retry behavior.
maxRetries - number
If a request to AEM fails, it is retried up to this number of times.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 3
Multiple of: 1
retryDelay - number
Time to wait, in milliseconds, between each retry.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
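The retry options describe a fixed-delay retry loop. A minimal sketch of that pattern follows. It assumes that maxRetries counts retries after the first attempt and that any raised exception counts as a failure; neither detail is confirmed by this reference. The names call_with_retries and request are hypothetical.

```python
import time


def call_with_retries(request, max_retries=3, retry_delay_ms=1000, sleep=time.sleep):
    """Call request(), retrying failures with a fixed delay between attempts.

    request is a hypothetical zero-argument callable that raises on failure.
    """
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the last failure
            attempt += 1
            sleep(retry_delay_ms / 1000.0)  # retryDelay is given in milliseconds
```

Under these assumptions, the defaults allow up to four attempts in total (one initial request plus three retries), with one second between attempts.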
security - Security filtering configuration
enabled - boolean
Enable query-time security trimming.
Default: true
collectionId - string
ID of the collection used for storing ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
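The naming rule for the ACL collection can be written out as a short sketch. The function name acl_collection_name is hypothetical; only the '<datasource_id>_access_control_hierarchy' pattern comes from this reference.

```python
def acl_collection_name(datasource_id, collection_id=None):
    """Return the ACL collection name, falling back to the documented pattern."""
    if collection_id:
        return collection_id  # an explicitly configured collectionId is used as-is
    return f"{datasource_id}_access_control_hierarchy"
```

For a datasource whose id is aem-docs and no collectionId set, the result is aem-docs_access_control_hierarchy.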