Looking for the old docs site? You can still view it for a limited time here.

HDFS Connector Configuration Reference

Hadoop Distributed File System (HDFS). It traverses the Hadoop file system as it would a regular Unix filesystem.

See also the Hadoop connectors, which use MapReduce to distribute the crawl processes: * Apache Hadoop 2 * Cloudera * Hortonworks * MapR

When there is a lot of content to process, these MapReduce-enabled connectors will be significantly faster.

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.