- Database Connectors
- Filesystem Connectors
- Hadoop Cluster Connectors
- Logging Connectors
- Push Content Connectors
- Repository Connectors
- Script Connectors
- Social Media Connectors
- Web Connectors
- WebSphere Connector
Content stored in a database comes in the form of a set of rows returned in response to a query. Content is fetched via a series of database queries, and the rows returned are transformed into documents for the index. The result set returned by a fixed query can change over time as rows are added, updated, or deleted. Database rows lack the metadata available on files, such as a last-modified date, so tracking changes to documents is not always possible.
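A minimal Python sketch of the row-to-document idea and of one workaround for the missing metadata, assuming an in-memory SQLite table and a content-hash comparison between crawls. This is not how Fusion's database connectors are implemented. The functions `rows_to_documents`, `content_hash`, and `detect_changes` are hypothetical names used only for illustration.

```python
import hashlib
import sqlite3


def rows_to_documents(conn, query, id_column):
    """Run a query and turn each returned row into a dict-shaped document."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    docs = []
    for row in cursor:
        doc = dict(zip(columns, row))
        doc["id"] = str(doc[id_column])  # unique key for the index
        docs.append(doc)
    return docs


def content_hash(doc):
    """Fingerprint a document from its field values (rows carry no mtime)."""
    payload = "|".join(f"{k}={doc[k]}" for k in sorted(doc))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, current_docs):
    """Compare id->hash state from the last crawl with this crawl's documents."""
    current = {d["id"]: content_hash(d) for d in current_docs}
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(
        k for k in current.keys() & previous.keys() if current[k] != previous[k]
    )
    return added, modified, removed, current
```

For example, after a first crawl of a table, updating row 1, deleting row 2, and inserting row 3 would make a second call to `detect_changes` report `['3']` as added, `['1']` as modified, and `['2']` as removed. The trade-off is that every row must be re-fetched and re-hashed on each crawl, which is the cost of not having per-row modification metadata.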
Fusion provides the following database connectors:
A filesystem-based data store is a network of nodes to be traversed, where each node (e.g., a Unix file directory) provides information about its child nodes (e.g., the files in that directory) or references other nodes (e.g., links in an HTML document). The extent of the network of nodes is discovered during the crawl. The crawler captures information about each node, such as its file name, permissions, and dates of creation, last modification, and last access, as well as the node's contents.
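The traversal described above can be sketched as a breadth-first crawl over a local directory tree, where each directory node reveals its children only when it is visited. This is a minimal illustration using Python's standard `os` and `stat` modules, not Fusion's crawler; the function name `crawl` and the `max_bytes` limit on captured content are hypothetical choices.

```python
import os
import stat
from collections import deque


def crawl(root, max_bytes=1024):
    """Breadth-first crawl that discovers the node network as it goes."""
    docs = []
    queue = deque([root])
    seen = set()  # resolved paths already visited, to survive symlink loops
    while queue:
        path = queue.popleft()
        real = os.path.realpath(path)
        if real in seen:
            continue
        seen.add(real)
        try:
            info = os.stat(path)  # follows symlinks
        except OSError:
            continue  # e.g. a broken symlink or a file removed mid-crawl
        doc = {
            "path": path,
            "name": os.path.basename(path),
            "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
            # st_ctime is creation time on Windows but inode-change time on Unix.
            "ctime": info.st_ctime,
            "mtime": info.st_mtime,
            "atime": info.st_atime,
        }
        if stat.S_ISDIR(info.st_mode):
            # A directory node describes its children; queue them for visiting.
            try:
                names = os.listdir(path)
            except OSError:
                names = []  # unreadable directory: record it, skip its children
            children = sorted(os.path.join(path, n) for n in names)
            doc["children"] = children
            queue.extend(children)
        elif stat.S_ISREG(info.st_mode):
            with open(path, "rb") as f:
                doc["content"] = f.read(max_bytes)  # capture a prefix of the contents
        docs.append(doc)
    return docs
```

Tracking resolved paths in `seen` matters because, unlike a database result set, a filesystem (or a web of HTML links) can contain cycles, and the crawler cannot know the full extent of the network in advance.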
Fusion provides the following filesystem connectors:
- HDFS - This crawler doesn’t use MapReduce for document processing; instead, it treats HDFS as a simple filesystem.
- S3 Hadoop FS (Hadoop over Amazon S3)
- SolrXML - Specific to files in SolrXML format; it cannot be used with any other kind of XML.
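To show what distinguishes SolrXML files from arbitrary XML, here is a small sketch that serializes documents into Solr's XML update message layout (`<add>`, `<doc>`, `<field name="...">`). It assumes that "SolrXML format" here means that update message format; the helper `to_solr_xml` is hypothetical and uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def to_solr_xml(docs):
    """Serialize dicts as a Solr XML update message: <add><doc><field .../></doc></add>."""
    add = ET.Element("add")
    for d in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in d.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # a multivalued field repeats the <field> element
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > in text
    return ET.tostring(add, encoding="unicode")


print(to_solr_xml([{"id": "doc1", "title": "Hello", "tags": ["a", "b"]}]))
# <add><doc><field name="id">doc1</field><field name="title">Hello</field><field name="tags">a</field><field name="tags">b</field></doc></add>
```

An XML file that does not follow this element structure would not be meaningful to a connector that expects it, which is why such a connector cannot be used with other kinds of XML.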
Hadoop Cluster Connectors
Fusion provides connectors for the following Hadoop distributions:
See the section Hadoop Connector and Datasource Configuration for connector and datasource details.
Push Content Connectors
- Push connector - Pushes content to Fusion for indexing into Solr via a Fusion index pipeline.
Social Media Connectors
WebSphere Connector
- WebSphere - The Fusion WebSphere connector requires Solr. Configuration and integration are more complicated than for the other Fusion connectors.