HDFS Connector and Datasource Configuration

The HDFS connector crawls a Hadoop Distributed File System (HDFS). It traverses the Hadoop file system as it would a regular Unix filesystem.

See also the Hadoop connectors, which use MapReduce to distribute the crawl processes:

* Apache Hadoop 2
* Cloudera
* Hortonworks
* MapR

For large volumes of content, these MapReduce-enabled connectors are significantly faster than the HDFS connector.

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
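
The difference between the two forms can be illustrated with a minimal Python sketch. It assumes the API accepts JSON request bodies, which is why the backslash has to be doubled there. The property name `delimiter` is hypothetical and used only for illustration; it is not taken from this connector's configuration.

```python
import json

# Value exactly as it would be typed into the UI field:
# two characters, a backslash followed by "t".
ui_value = r"\t"

# Hypothetical property name, for illustration only.
config = {"delimiter": ui_value}

# Serializing to JSON escapes the backslash, producing the form
# that would be written in an API request body.
payload = json.dumps(config)
print(payload)  # {"delimiter": "\\t"}
```

Letting a JSON library do the serialization avoids escaping values by hand: the same string that is typed into the UI can be passed to the serializer unchanged.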