AEM V2 Connector Configuration Reference
This connector retrieves data from an Adobe Experience Manager (AEM) repository. The AEM V2 connector is compatible with AEM version 6.5.
The AEM V2 connector supports the following:
- Full crawling and recrawling of pages and assets in Adobe Experience Manager.
- Basic authentication.
  Note: In v1.2.0 and Fusion 5.12.0 and later, the username and password fields have moved under Authentication Settings > Login Settings.
- OAuth authentication.
- Security trimming, to filter results based on user permissions.
- Filtering of crawled documents by including and excluding paths and by configuring content properties when setting up the connector.
- A configurable wait time between fetch requests, to throttle crawls if necessary.
- In Fusion 5.12.0 and later: optional crawling of child paths.
  Note: Check for duplicate data when crawling child paths. For example, if the connector indexes both cq:Page and cq:PageContent, the results could include duplicate data.
Note: When security trimming is used, version 1.3.0 of this connector is compatible only with Fusion 5.9.4 and later. The v1.3.0 connector uses Graph Security Trimming rather than regular security trimming. Treat it as a new connector: configurations from previous versions do not transfer because of differences between the versions, and a full crawl is required.
The following example shows how to specify the file system to index under the connector-plugins entry in your values.yaml file:
additionalVolumes:
  - name: fusion-data1-pvc
    persistentVolumeClaim:
      claimName: fusion-data1-pvc
  - name: fusion-data2-pvc
    persistentVolumeClaim:
      claimName: fusion-data2-pvc
additionalVolumeMounts:
  - name: fusion-data1-pvc
    mountPath: "/connector/data1"
  - name: fusion-data2-pvc
    mountPath: "/connector/data2"
You may also need to specify the user that is authorized to access the file system, as in this example:
securityContext:
  fsGroup: 1002100000
  runAsUser: 1002100000
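A common mistake in a snippet like the one above is a mount whose name does not match any declared volume, which Kubernetes rejects. The sketch below checks for that, assuming the chart passes additionalVolumes and additionalVolumeMounts through to the pod spec unchanged. It represents the YAML as a plain Python dictionary so no YAML library is needed; the helper name unmatched_mounts is hypothetical.

```python
def unmatched_mounts(values: dict) -> list:
    """Return mount names that do not refer to any declared volume (hypothetical helper)."""
    declared = {volume["name"] for volume in values.get("additionalVolumes", [])}
    return [
        mount["name"]
        for mount in values.get("additionalVolumeMounts", [])
        if mount["name"] not in declared
    ]


# The values.yaml example above, expressed as a Python dictionary.
values = {
    "additionalVolumes": [
        {"name": "fusion-data1-pvc", "persistentVolumeClaim": {"claimName": "fusion-data1-pvc"}},
        {"name": "fusion-data2-pvc", "persistentVolumeClaim": {"claimName": "fusion-data2-pvc"}},
    ],
    "additionalVolumeMounts": [
        {"name": "fusion-data1-pvc", "mountPath": "/connector/data1"},
        {"name": "fusion-data2-pvc", "mountPath": "/connector/data2"},
    ],
}

print(unmatched_mounts(values))  # → []
```

An empty list means every mount refers to a declared volume; any name returned points to a typo or a missing volume entry.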
An Apache Sling-based connector for AEM
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string
The Parser to use in the associated IndexPipeline.
Match pattern: ^[a-zA-Z0-9_-]+$
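The pipeline and parserId fields, like the id field below, must match ^[a-zA-Z0-9_-]+$. A minimal sketch of that check follows, useful for catching bad names before submitting a configuration. Fusion performs its own validation; this only assumes that Python's re module treats this simple character class the same way the schema validator does. The helper name is_valid_name is hypothetical.

```python
import re

# Match pattern documented for pipeline, parserId, and id.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(value: str) -> bool:
    """True if value is non-empty and uses only letters, digits, '_' or '-'."""
    # fullmatch (rather than match) also rejects a trailing newline,
    # which '$' alone would tolerate in Python.
    return NAME_PATTERN.fullmatch(value) is not None


for candidate in ["aem-v2_pipeline", "aem pipeline", ""]:
    print(repr(candidate), is_valid_name(candidate))
```

The empty string fails because the pattern requires at least one character, which matches the documented ">= 1 characters" constraint.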
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will perform fetching, which is useful for distributing load across instances.
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 0
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can sometimes, but not always, improve overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items is indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
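As a sketch of how the numeric Fetch Settings fit together, the function below fills in the documented defaults and rejects values outside the documented inclusive bounds. The bounds and defaults are copied from the entries above. No unit is stated for fetchResponseScheduledTimeout, so it is treated as a bare number. Unknown keys are ignored, and the function name resolve_fetch_settings is hypothetical.

```python
# Documented inclusive bounds (min, max) and defaults for numeric fetchSettings fields.
FETCH_LIMITS = {
    "indexingThreads": (1, 10, 4),
    "pluginInstances": (None, 500, 0),  # no lower bound is documented
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),  # seconds
    "numFetchThreads": (1, 500, 20),
    "pluginInactivityTimeout": (60, 691200, 600),  # seconds
}


def resolve_fetch_settings(overrides: dict) -> dict:
    """Fill in defaults and reject values outside the documented bounds."""
    resolved = {}
    for name, (low, high, default) in FETCH_LIMITS.items():
        value = overrides.get(name, default)
        # Every field is documented as "Multiple of: 1", so require an integer.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the minimum {low}")
        if value > high:
            raise ValueError(f"{name}={value} exceeds the maximum {high}")
        resolved[name] = value
    return resolved


settings = resolve_fetch_settings({"numFetchThreads": 40})
print(settings["numFetchThreads"], settings["indexingThreads"])  # → 40 4
```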
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
properties - Properties
Plugin specific properties.
aemBaseUrl - string
Base URL of the AEM instance, for example http://localhost:4502
>= 1 characters
Default: http://localhost:4502
username - string
Username to use for authentication. The user must have sufficient permissions to read content paths and, when security trimming is used, to access the Users/Group APIs.
password - string
Password to use for authentication.
authConfig - Authentication Settings
Select only one option
loginAuthentication - Login Settings
username - string
Username to use for authentication. The user must have sufficient permissions to read content paths and, when security trimming is used, to access the Users/Group APIs.
password - string
Password to use for authentication.
oAuth - OAuth Settings
accessToken - string
Access Token
oAuthRefreshToken - string
Refresh token used to obtain a new access token.
jwtToken - string
JWT used to request a new access token when no refresh token is set.
clientId - string
Client ID
clientSecret - string
Client Secret
redirectUri - string
Redirect URI
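Because authConfig accepts only one option, a script that generates connector configurations should set either loginAuthentication or oAuth, never both. The sketch below enforces that rule and also encodes the token precedence described above: a refresh token is preferred, and a JWT is used only when no refresh token is set. Both helper names are hypothetical, the token values are placeholders, and the sketch does not perform any OAuth exchange itself.

```python
def build_auth_config(*, username=None, password=None, oauth=None) -> dict:
    """Return an authConfig object holding exactly one option (hypothetical helper)."""
    use_login = username is not None or password is not None
    if use_login == (oauth is not None):
        raise ValueError("set either login credentials or OAuth settings, not both or neither")
    if use_login:
        return {"loginAuthentication": {"username": username, "password": password}}
    return {"oAuth": dict(oauth)}


def token_renewal_source(oauth: dict) -> str:
    """Name the field that would renew the access token, per the descriptions above."""
    if oauth.get("oAuthRefreshToken"):
        return "oAuthRefreshToken"
    if oauth.get("jwtToken"):
        return "jwtToken"  # only consulted when no refresh token is set
    return "none"


cfg = build_auth_config(oauth={"clientId": "my-client", "jwtToken": "placeholder-jwt"})
print(token_renewal_source(cfg["oAuth"]))  # → jwtToken
```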
allowAllCertificates - boolean
If false, security checks are performed on all SSL/TLS certificate signers and origins, so self-signed certificates are not supported.
Default: false
pageSize - number
Number of documents to fetch per page request. A higher value can make crawling faster, but memory usage is also increased.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 100
Multiple of: 1
nodeDepth - number
Number of node levels the query returns.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10
Multiple of: 1
indexChildPathData - boolean
When enabled, the metadata associated with subdirectories of the item's path is indexed.
Default: false
threadWait - number
Time to wait, in milliseconds, between page requests.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
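To get a feel for how pageSize and threadWait interact when throttling, the sketch below estimates the time spent in threadWait pauses alone. It assumes the documents are retrieved as one sequence of page requests of pageSize documents each, with one pause between consecutive requests. Real crawls also depend on paths, nodeDepth, concurrent fetch threads, and response times, so treat the result as a rough estimate of pause overhead, not a crawl-duration prediction. The function name is hypothetical.

```python
import math


def throttle_pause_seconds(num_documents: int, page_size: int = 100,
                           thread_wait_ms: int = 1000) -> float:
    """Estimate total threadWait pause time for one sequence of page requests."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    pages = math.ceil(num_documents / page_size)
    waits = max(pages - 1, 0)  # one pause between each pair of consecutive requests
    return waits * thread_wait_ms / 1000


print(throttle_pause_seconds(50_000))                 # → 499.0
print(throttle_pause_seconds(50_000, page_size=500))  # → 99.0
```

With the defaults, 50,000 documents need 500 page requests and therefore 499 one-second pauses. Raising pageSize to 500 cuts that to 99 pauses, at the cost of the higher memory use noted for pageSize.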
paths - array[string]
AEM paths to search for content.
Default: "/"
excludePathRegexes - array[string]
Java regular expressions matching paths that should not be fetched.
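Before adding patterns to excludePathRegexes, it can help to try them against sample paths. The sketch below does this with Python's re module, which behaves like Java regular expressions for simple patterns such as these but is not identical in general. This reference does not say whether the connector matches a pattern against the whole path or any part of it. The sketch assumes whole-path matching, as with Java's String.matches. The sample paths and the helper name are hypothetical.

```python
import re


def filter_paths(paths, exclude_regexes):
    """Drop paths matched by any exclusion pattern.

    Assumes full-string matching (like Java's String.matches); if the connector
    uses partial matching instead, broad patterns will exclude more paths.
    """
    compiled = [re.compile(pattern) for pattern in exclude_regexes]
    return [path for path in paths if not any(rx.fullmatch(path) for rx in compiled)]


paths = [
    "/content/site/en/home",
    "/content/dam/archive/2019/report.pdf",
    "/content/site/en/drafts/page",
]
excludes = [r".*/archive/.*", r"/content/site/[^/]+/drafts(/.*)?"]
print(filter_paths(paths, excludes))  # → ['/content/site/en/home']
```

Writing patterns that are valid under both interpretations, for example by starting broad patterns with .*, makes their behavior less dependent on which matching mode the connector uses.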
aemTypes - array[string]
AEM document types (jcr:primaryType values) to include in the index, for example cq:Page or dam:Asset.
Default: "cq:Page"
attachmentTypes - array[string]
Attachment extensions to index. By default all attachments are indexed.
maxSizeBytes - number
Maximum size, in bytes, of a document to fetch. If the content is larger, it is trimmed to maxSizeBytes.
>= -9223372036854775808
<= 9223372036854775807
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4194304
Multiple of: 1
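The default maxSizeBytes of 4194304 equals 4 * 1024 * 1024 bytes (4 MiB). The sketch below illustrates the documented trimming on raw bytes. This reference does not say how the connector handles negative values, or whether it avoids cutting a multi-byte UTF-8 character in half. The sketch therefore rejects negative limits and trims at an exact byte offset. The function name is hypothetical.

```python
DEFAULT_MAX_SIZE_BYTES = 4 * 1024 * 1024  # 4194304, the documented default


def trim_content(content: bytes, max_size_bytes: int = DEFAULT_MAX_SIZE_BYTES) -> bytes:
    """Keep at most max_size_bytes bytes of content, mirroring the documented trim."""
    if max_size_bytes < 0:
        raise ValueError("negative limits are not handled in this sketch")
    return content[:max_size_bytes]


oversized = b"x" * (DEFAULT_MAX_SIZE_BYTES + 10)
print(len(trim_content(oversized)))      # → 4194304
print(len(trim_content(b"small page")))  # → 10
```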
requestProperties - Request Options
A set of options for configuring requests to AEM instance.
connectTimeout - number
The timeout in milliseconds until a connection is established.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
socketTimeout - number
The socket timeout (SO_TIMEOUT) in milliseconds: the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
requestTimeout - number
The timeout in milliseconds used when requesting a connection from the connection manager.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
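The zero and negative conventions above are easy to get wrong when translating them into another client, for example in a script that probes the AEM instance with the same timeouts. In Python, an infinite socket timeout is expressed as None, while socket.settimeout(0.0) makes the socket non-blocking. A naive milliseconds-to-seconds conversion would therefore invert the meaning of 0. These descriptions follow the wording of Apache HttpClient 4.x RequestConfig, where a negative value means "system default". The sketch assumes the same and lets the caller supply that default. The helper name is hypothetical.

```python
import socket
from typing import Optional


def to_socket_timeout(timeout_ms: int,
                      undefined_default: Optional[float] = None) -> Optional[float]:
    """Convert a connector timeout in milliseconds to a Python socket timeout in seconds.

    0 means "infinite" for the connector; Python expresses that as None, whereas
    socket.settimeout(0.0) would make the socket non-blocking. Negative values mean
    "undefined"; here they fall back to undefined_default (an assumption).
    """
    if timeout_ms == 0:
        return None
    if timeout_ms < 0:
        return undefined_default
    return timeout_ms / 1000


with socket.socket() as sock:
    sock.settimeout(to_socket_timeout(10000))  # documented default: 10 seconds
    print(sock.gettimeout())  # → 10.0
    sock.settimeout(to_socket_timeout(0))      # infinite wait (blocking mode)
    print(sock.gettimeout())  # → None
```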
retryProperties - Retry Options
A set of options for configuring requests retry behavior.
maxRetries - number
Number of times a failed request to AEM is retried.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 3
Multiple of: 1
retryDelay - number
Time to wait, in milliseconds, between retries.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
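The two Retry Options combine into a simple fixed-delay retry loop. The sketch below shows one way maxRetries and retryDelay could interact. It assumes maxRetries counts retries after the first attempt, so the worst case is 1 + maxRetries calls, and that every exception is retryable. The connector may retry only certain failures. The function and the fetch callable are hypothetical.

```python
import time


def fetch_with_retries(fetch, max_retries: int = 3, retry_delay_ms: int = 1000):
    """Call fetch(); on failure retry up to max_retries times with a fixed pause.

    Assumes max_retries counts retries after the first attempt, so fetch()
    is called at most 1 + max_retries times before the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(retry_delay_ms / 1000)


calls = {"n": 0}


def flaky():
    """Hypothetical request that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(fetch_with_retries(flaky, retry_delay_ms=10), calls["n"])  # → ok 3
```

With the defaults, a request that keeps failing is attempted four times and spends about three seconds in retry pauses before the error is reported.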
security - Graph security filtering configuration
enabled - boolean
Enable query-time security trimming.
Default: true