Local Filesystem Connector and Datasource Configuration

A filesystem-based data store can be treated as a network of nodes to be traversed. Each node (such as a Unix directory) provides information about its child nodes (such as the files in that directory) or references other nodes (such as links in an HTML document).

The crawler captures information about each node, such as its filename, permissions, creation date, and the times of last modification and last access, as well as the node's contents. The extent of the network is not known in advance; it is discovered during the crawl.
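As a rough illustration of the per-node metadata described above, the following Python sketch reads the same kinds of fields with the standard library. The function name describe_node and the dictionary keys are hypothetical and are not the connector's own field names. Note that a true creation time is only exposed on some platforms (as st_birthtime), so the sketch reports None where it is unavailable.

```python
import os
import stat
from datetime import datetime, timezone


def _to_iso(timestamp):
    """Convert a POSIX timestamp to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def describe_node(path):
    """Collect basic metadata for one filesystem node (file or directory).

    Hypothetical helper for illustration only; the keys below are
    assumptions, not the connector's actual field names.
    """
    info = os.stat(path)
    # st_birthtime exists only on some platforms (e.g. macOS, BSD).
    birth = getattr(info, "st_birthtime", None)
    return {
        "filename": os.path.basename(os.path.normpath(path)),
        "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "size_bytes": info.st_size,
        "created": _to_iso(birth) if birth is not None else None,
        "last_modified": _to_iso(info.st_mtime),
        "last_accessed": _to_iso(info.st_atime),
        "is_directory": stat.S_ISDIR(info.st_mode),
    }
```

Calling describe_node on a directory returns the same fields with is_directory set to True; reading the node's contents would be a separate step.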

The connector provides rules that limit both the initial crawl and subsequent re-crawls. These rules use datasource configuration properties to limit the extent of the network (the depth of nodes to explore) and to restrict processing to a subset of files based on file name and file size. An overall limit can also be set on the number of files retrieved during a crawl.
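To make the interaction of these limits concrete, here is a minimal Python sketch of a crawl bounded by depth, file name pattern, file size, and total file count. It is not the connector's implementation, and the parameter names (max_depth, include, max_bytes, max_files) are hypothetical stand-ins rather than actual datasource property names.

```python
import fnmatch
import os


def limited_crawl(root, max_depth=2, include="*", max_bytes=None, max_files=None):
    """Yield paths of files under root that pass every configured limit.

    Hypothetical parameters (assumptions, not real property names):
      max_depth: how many directory levels below root to explore
                 (0 means only files directly inside root)
      include:   shell-style pattern a file name must match
      max_bytes: skip files larger than this size, if set
      max_files: stop after yielding this many files, if set
    """
    found = 0
    pending = [(root, 0)]  # directories still to visit, with their depth
    while pending:
        directory, depth = pending.pop()
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    # Descend only while the depth limit allows it.
                    if depth < max_depth:
                        pending.append((entry.path, depth + 1))
                elif entry.is_file(follow_symlinks=False):
                    if not fnmatch.fnmatch(entry.name, include):
                        continue
                    if max_bytes is not None and entry.stat().st_size > max_bytes:
                        continue
                    # Enforce the overall cap before yielding another file.
                    if max_files is not None and found >= max_files:
                        return
                    yield entry.path
                    found += 1
```

For example, list(limited_crawl("/data", max_depth=1, include="*.txt", max_bytes=1_000_000, max_files=500)) would collect at most 500 text files of up to 1 MB each from /data and its immediate subdirectories; the path /data is only a placeholder.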