AWS S3 V1 Connector Configuration Reference
The AWS S3 V1 Connector can access AWS S3 buckets in native format.
Deprecation and removal notice
This connector is deprecated as of Fusion 5.2 and is removed or expected to be removed as of Fusion 5.5. Use the AWS S3 V2 connector instead.
The connector uses the S3 API to request data from S3. It calls the listBucket service, which lists all buckets owned by the user account supplied to the connector.
When creating an S3 data source using the UI, Fusion automatically verifies that the user information supplied has access to the bucket defined in the URL property. If the bucket is not in the list returned by S3, data source creation may fail. At crawl time, if the bucket is not in the list returned by S3, the crawl fails.
Permission errors when creating or crawling the data source may be caused by an incorrect username or password, or by insufficient user account permissions. The user account must have List Bucket permissions for the account that owns the bucket the crawler is trying to access.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
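The escaping rule can be illustrated with a minimal sketch, assuming the API request body is built as JSON. Serializing the two-character value \t (as it would be typed in the UI) with a JSON encoder yields \\t, the form described for the API. The property name `delimiter` is hypothetical and appears only for illustration; it is not a documented property of this connector.

```python
import json

# Value exactly as it would be typed in the UI: a backslash followed by "t".
ui_value = r"\t"

# "delimiter" is a hypothetical property name used only for this illustration.
payload = json.dumps({"delimiter": ui_value})

# The JSON text carries the escaped form: {"delimiter": "\\t"}
print(payload)

# Decoding the JSON text recovers the original two-character value.
decoded = json.loads(payload)["delimiter"]
```

Building the payload with a JSON encoder, rather than by hand, is one way to avoid double-escaping or under-escaping such values.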
Connector to index content in AWS S3 buckets.
description - string
Optional description for this datasource.
id - string, required
Unique name for this datasource.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
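A datasource id could be checked against the documented constraints before a configuration is submitted. The following is a sketch only: the function name `is_valid_datasource_id` is an assumption made for this example and is not part of Fusion.

```python
import re

# Pattern copied from the reference: letters, digits, underscore, and hyphen.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_datasource_id(candidate: str) -> bool:
    """Return True if candidate has at least one character and matches the pattern."""
    # fullmatch is used because "$" alone also matches just before a trailing
    # newline in Python, which would let "name\n" slip through with match().
    return ID_PATTERN.fullmatch(candidate) is not None


print(is_valid_datasource_id("my-s3_datasource1"))
print(is_valid_datasource_id("my datasource"))
```

The `+` quantifier already enforces the ">= 1 characters" constraint, so no separate length check is needed.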
parserId - string
Parser used when parsing raw content. The retry parsing setting is available under crawl performance (advanced setting).
pipeline - string, required
Name of an existing index pipeline for processing documents.
>= 1 characters
properties - Properties
Datasource configuration properties
add_failed_docs - boolean
Set to true to add documents even if they partially fail processing. Failed documents will be added with as much metadata as available, but may not include all expected fields.
Default: false
aws_region - string
AWS Region of the bucket, used on GET requests when AWS Signature Version 4 is enabled.
Default: us-west-2
Allowed values: us-gov-west-1, us-gov-east-1, us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, ap-east-1, ap-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, sa-east-1, cn-north-1, cn-northwest-1, ca-central-1
bounds - string
Limits the crawl to a specific directory sub-tree, hostname or domain.
Default: tree
Allowed values: tree, host, domain, none
commit_on_finish - boolean
Set to true to send a commit request to Solr after the last batch has been fetched, so that the documents are committed to the index.
Default: true
crawl_depth - integer
Number of levels in a directory or site tree to descend for documents.
>= -1
exclusiveMinimum: false
Default: -1
crawl_item_timeout - integer
Maximum time, in milliseconds, allowed to fetch any individual document.
exclusiveMinimum: true
Default: 600000
db - Connector DB
Type and properties for a ConnectorDB implementation to use with this datasource.
aliases - boolean
Keep track of original URIs that resolved to the current URI. This negatively impacts performance and the size of the DB.
Default: false
inlinks - boolean
Keep track of incoming links. This negatively impacts performance and the size of the DB.
Default: false
inv_aliases - boolean
Keep track of target URIs that the current URI resolves to. This negatively impacts performance and the size of the DB.
Default: false
type - string
Fully qualified class name of ConnectorDb implementation.
>= 1 characters
Default: com.lucidworks.connectors.db.impl.MapDbConnectorDb
exclude_paths - array[string]
Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.
include_extensions - array[string]
List the file extensions to be fetched. Note: Files with possible matching MIME types but non-matching file extensions will be skipped. Extensions should be listed without periods, using whitespace to separate items (e.g., 'pdf zip').
include_paths - array[string]
Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
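The combined effect of include_paths and exclude_paths can be sketched as a simple filter. This is an illustration under stated assumptions, not the connector's implementation: it assumes a pattern matches when it is found anywhere in the URI, that exclusions take precedence over inclusions, and that an empty include list admits every URI. The function name `uri_is_crawled` and the sample URIs are hypothetical.

```python
import re
from typing import Sequence


def uri_is_crawled(
    uri: str,
    include_paths: Sequence[str] = (),
    exclude_paths: Sequence[str] = (),
) -> bool:
    """Decide whether a URI passes the include/exclude regular expressions."""
    # Assumption: any matching exclude pattern rejects the URI outright.
    if any(re.search(pattern, uri) for pattern in exclude_paths):
        return False
    # Assumption: with include patterns present, at least one must match.
    if include_paths:
        return any(re.search(pattern, uri) for pattern in include_paths)
    # Assumption: no include patterns means every non-excluded URI is kept.
    return True


include = [r"/docs/"]
exclude = [r"\.tmp$"]
print(uri_is_crawled("s3://example-bucket/docs/report.pdf", include, exclude))
print(uri_is_crawled("s3://example-bucket/docs/scratch.tmp", include, exclude))
```

Whether the connector anchors patterns to the whole URI or searches within it is not stated in this reference, so patterns should be tested against a small crawl before relying on them.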
index_directories - boolean
Set to true to add directories to the index as documents. If set to false, directories will not be added to the index, but they will still be traversed for documents.
Default: false
initial_mapping - Initial field mapping
Provides mapping of fields before documents are sent to an index pipeline.
condition - string
Define a conditional script that must result in true or false. This can be used to determine if the stage should process or not.
label - string
A unique label for this stage.
<= 255 characters
mappings - array[object]
List of mapping rules
Default: {"operation":"move","source":"fetch_time","target":"fetch_time_dt"}, {"operation":"move","source":"ds:description","target":"description"}
object attributes:{operation
: {