S3H Connector and Datasource Configuration
The S3H connector can access AWS S3 buckets whose data is stored in block format, as it is in HDFS ("Hadoop over Amazon"). More information on the S3 block approach is available from the Amazon S3 Support page of the Apache Hadoop wiki.
The connector uses the S3 API to request data from S3. It calls the listBucket service, which lists all buckets owned by the user account supplied to the connector.
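The bucket check described above can be sketched in a few lines. This is an illustrative sketch only, not Fusion's actual implementation: the function names `bucket_from_url` and `verify_bucket_access` are invented here, and the list of bucket names stands in for the response of the listBucket call.

```python
from urllib.parse import urlparse

def bucket_from_url(url: str) -> str:
    """Extract the bucket name from an s3:// style URL,
    e.g. 's3://my-bucket/path' -> 'my-bucket'.
    (Hypothetical helper for illustration.)"""
    return urlparse(url).netloc

def verify_bucket_access(datasource_url: str, owned_buckets: list[str]) -> bool:
    """Mimic the pre-flight check: the bucket named in the datasource
    URL property must appear in the bucket list returned by listBucket."""
    return bucket_from_url(datasource_url) in owned_buckets

# The bucket in the URL is in the returned list, so the check passes:
print(verify_bucket_access("s3://my-bucket/data", ["my-bucket", "other-bucket"]))
```

If the check returns false for a given URL and account, both datasource creation and the crawl itself would be expected to fail, as described below.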
When you create an S3 datasource through the UI, Fusion automatically verifies that the supplied user credentials have access to the bucket named in the URL property. If the bucket is not in the list returned by S3, datasource creation may fail, and any subsequent crawl will fail.
Permission errors when creating or crawling the datasource may be caused by an incorrect username or password, or by insufficient user account permissions. The user account must have List Bucket permission on the account that owns the bucket the crawler is trying to access.
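As one way to grant the required access, an IAM policy along the following lines could be attached to the crawling user. This is a hedged example, not a policy from the Fusion documentation: `example-bucket` is a placeholder, and the exact set of actions needed may vary (here `s3:ListAllMyBuckets` covers the listBucket call, while `s3:ListBucket` and `s3:GetObject` allow the crawler to enumerate and read objects in the target bucket).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Note that in IAM, `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the objects (`/*`) within it, which is why both resource forms appear above.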