Latest version: v1.5.0
Compatible with Fusion version: 5.9.0 and later
The AWS S3 V2 connector enables Fusion to crawl and index content stored in Amazon S3 buckets.

Connector flow

The AWS S3 V2 connector crawls items in a single bucket. You must specify the bucket name and the AWS region in which the bucket is located. You may crawl specific items in a bucket. If no items are specified, the entire bucket is crawled.

This connector includes an option to Enable Stray Content Deletion. When stray content deletion is enabled, content that was removed from the data source is deleted from the index in Fusion. When stray content deletion is disabled, content that was removed from the data source is not deleted from the index in Fusion.

The connector can recursively crawl files and folders to retrieve content and metadata such as object size and last-modified time. You can also filter objects by file extension, by object metadata, or with a regular expression.
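To illustrate the filtering options, the following is a minimal Python sketch of how object keys might be selected by file extension and regular expression. It is not the connector's implementation. The function name select_keys, its parameters, and the choice to skip keys ending in "/" are assumptions made for this example.

```python
import re
from typing import Iterable, List, Optional, Set


def select_keys(
    keys: Iterable[str],
    extensions: Optional[Set[str]] = None,
    pattern: Optional[str] = None,
) -> List[str]:
    """Return the keys that pass both the extension and the regex filter.

    Hypothetical helper: a filter set to None is treated as "not configured".
    """
    regex = re.compile(pattern) if pattern else None
    selected = []
    for key in keys:
        # Assumption: keys ending in "/" are folder placeholders and are skipped.
        if key.endswith("/"):
            continue
        # The extension is whatever follows the last dot in the final path segment.
        name = key.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if extensions is not None and ext not in extensions:
            continue
        if regex is not None and not regex.search(key):
            continue
        selected.append(key)
    return selected
```

For example, select_keys(keys, {"pdf"}, r"^docs/") keeps only PDF objects whose key starts with docs/.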

Prerequisites

Complete these prerequisites so that the connector can reliably access, crawl, and index your data. Proper setup helps avoid configuration and permission errors and keeps your content available for discovery and search in Fusion.

Connector installation

In Fusion, the AWS S3 V2 connector is named ‘S3 (v2)’. The name is not preceded by ‘Amazon’ or ‘AWS’.

AWS permissions

The connector requires access to the following S3 operations:
  • s3:ListBucket lists the objects in the specified bucket, optionally limited to a given prefix.
  • s3:GetObject fetches the content and metadata of each object.
The following is an example IAM policy. When you set permissions, replace BUCKET_NAME with the value used in your implementation.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME"
            ],
            "Effect": "Allow"
        }
    ]
}
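If you generate policies for several buckets, a small script can fill in the bucket name for you. The following is a sketch only: build_read_policy is a hypothetical helper and my-example-bucket is a placeholder bucket name. Note that s3:GetObject applies to object ARNs (ending in /*), while s3:ListBucket applies to the bucket ARN itself.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Build a read-only IAM policy document for one bucket (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permission: the resource is every key in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {
                # Bucket-level permission: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
        ],
    }


# "my-example-bucket" is a placeholder; substitute your own bucket name.
policy_json = json.dumps(build_read_policy("my-example-bucket"), indent=2)
print(policy_json)
```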

Remote mode (optional)

To run the connector remotely, you need the following:
  • A Fusion user with the remote-connectors or admin role for gRPC authentication.
  • The connector-plugin-standalone.jar alongside the plugin ZIP on the remote host.
  • A configured connector backend gRPC endpoint (hostname:port) in your YAML.
  • A truststore file path in your config, if the remote host doesn’t trust Fusion’s TLS certificate.

Authentication

Setting up the correct authentication according to your organization’s data governance policies helps keep sensitive data secure while allowing authorized indexing. The AWS S3 V2 connector supports multiple authentication methods to access your Amazon S3 bucket. Choose one of the following based on your environment and security model:
  • Basic authentication using an access key and secret access key
  • AWS session authentication using temporary credentials provided by AWS Security Token Service (STS)
  • AWS instance credentials for role-based authentication if Fusion is running inside AWS

Basic authentication

Enable AWS Basic Authentication Settings and enter your AWS access key and AWS secret key.

Session authentication

Session authentication uses temporary security credentials obtained from AWS STS. Enable AWS Basic Authentication Settings and enter your AWS access key, AWS secret key, and session token. These credentials must be unexpired at runtime.
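Because temporary credentials expire, it can help to check the remaining lifetime before you start a crawl. AWS STS returns an expiration timestamp together with the temporary credentials. The sketch below assumes you have that timestamp as a timezone-aware datetime. The function credentials_usable and the expected_crawl parameter are hypothetical and are not part of the connector.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def credentials_usable(
    expiration: datetime,
    expected_crawl: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the credentials outlive the expected crawl duration.

    expiration is the timestamp STS issued with the temporary credentials.
    expected_crawl is your own estimate of how long the crawl will run.
    """
    if expiration.tzinfo is None:
        raise ValueError("expiration must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    return now + expected_crawl < expiration
```

If the check returns False, request fresh credentials from STS and update the connector configuration before running the job.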

IAM Role for Fusion running in AWS

If Fusion or the remote connector runs on an EC2 instance or ECS task with an attached IAM role, do not enter credentials in the connector configuration. The connector automatically uses the role assigned to the host. Enable AWS Instance Credentials Authentication Settings and Use Instance Credentials. Make sure the IAM role has permission to read objects from the S3 bucket and to access any required prefixes or object paths.

Retry logic

The retryCount field sets the number of times the S3 client retries when a document fails to index. AWS connectivity issues can prevent the S3 connector from crawling all of the data. The default is three retries. If you have trouble with AWS connectivity, try a higher value, such as 10.
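The general idea behind a retry count can be sketched in Python as follows. This is not the connector's code. The helper with_retries and its parameters are hypothetical, and the sketch assumes that retries are counted in addition to the first attempt, which the connector documentation does not specify.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_count: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call operation, retrying up to retry_count times after the first failure."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            # Retries are exhausted: let the last error reach the caller.
            if attempt >= retry_count:
                raise
            attempt += 1
            time.sleep(delay_seconds)
```

With retry_count=3, an operation that fails twice and then succeeds returns normally on the third call. An operation that never succeeds is called four times in total before the error is raised.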

Remote connectors

V2 connectors support running remotely in Fusion versions 5.7.1 and later.
Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:
additionalVolumes:
- name: fusion-data1-pvc
  persistentVolumeClaim:
    claimName: fusion-data1-pvc
- name: fusion-data2-pvc
  persistentVolumeClaim:
    claimName: fusion-data2-pvc
additionalVolumeMounts:
- name: fusion-data1-pvc
  mountPath: "/connector/data1"
- name: fusion-data2-pvc
  mountPath: "/connector/data2"
You may also need to specify the user that is authorized to access the file system, as in this example:
securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
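The difference comes from JSON string escaping. The short Python sketch below assumes that the API accepts a JSON request body. The field name delimiter is hypothetical and used only for illustration.

```python
import json

# What you would type in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# Serializing to JSON doubles the backslash, producing the escaped form.
# "delimiter" is a hypothetical field name.
body = json.dumps({"delimiter": ui_value})
print(body)  # {"delimiter": "\\t"}

# Parsing the JSON body recovers the two-character value entered in the UI.
assert json.loads(body)["delimiter"] == ui_value
```

If you build request bodies with a JSON library, the escaping is applied for you. You only need to double the backslash yourself when you write the JSON by hand.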