The Fusion platform consists of a set of Java programs, each of which runs in its own JVM. Apache ZooKeeper provides the shared, synchronized data store for all user and application configuration information.
You can adjust the set of Fusion components running on each node to meet processing requirements.
Every Fusion node in a deployment runs the Fusion API Services process. Beyond that, the set of processes running on a particular Fusion node depends on the processing and throughput needs of the search application.
Running Solr on all Fusion nodes scales out document storage and provides data replication. (Alternatively, you can use an external SolrCloud cluster to store Fusion collections; see Integrating Fusion with an Existing Solr Deployment.)
Running Fusion Connectors on multiple nodes provides high throughput for indexing and updates, for example in applications that run analytics over live data streams such as log files or data from mobile tracking devices.
Running the Fusion UI on two or more nodes provides failover for Fusion’s authentication proxy.
Running Apache Spark on multiple nodes provides processing power for applications that aggregate clicks and other signals or use Fusion machine learning components.
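The placement rules above can be checked mechanically when planning a deployment. The following is a minimal sketch in Python, not part of Fusion: the component labels, node names, and the `check_layout` helper are all hypothetical and exist only to illustrate the two rules stated in this section (API Services on every node, and the Fusion UI on at least two nodes for failover).

```python
from typing import Dict, List, Set

# Hypothetical labels for illustration; these are not Fusion configuration keys.
API, UI, SOLR, CONNECTORS, SPARK = "api", "ui", "solr", "connectors", "spark"


def check_layout(layout: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems found in a node -> components mapping."""
    problems = []
    # Rule 1: every Fusion node runs the API Services process.
    for node in sorted(layout):
        if API not in layout[node]:
            problems.append(f"{node}: missing API Services")
    # Rule 2: the UI on two or more nodes gives the authentication proxy failover.
    ui_nodes = [node for node, components in layout.items() if UI in components]
    if len(ui_nodes) < 2:
        problems.append(
            "Fusion UI runs on fewer than two nodes; "
            "the authentication proxy has no failover"
        )
    return problems


# Example plan with hypothetical node names; node3 deliberately breaks rule 1.
layout = {
    "node1": {API, UI, SOLR},
    "node2": {API, SOLR, CONNECTORS},
    "node3": {UI, SPARK},
}
for problem in check_layout(layout):
    print(problem)
```

A check like this only encodes the guidance in this section; actual sizing still depends on the processing and throughput needs of the search application.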
This diagram shows the full set of Fusion processes that run on a single node and the default ports used by each, with arrows representing the flow of HTTP requests between components for document search and indexing:
The inputs to this diagram represent:
Users working directly in the Fusion UI, whether for developing and refining search applications, viewing analytics dashboards, or performing system administration tasks. Fusion’s UI component relays all requests to the API Services component.
Search queries, which originate from the search application, are sent to the Fusion UI for authentication. The Fusion UI sends the requests to the Fusion API Services component, which invokes a query pipeline to build out the raw query and send the resulting query to Solr.
Fusion datasources ingest data that will be indexed into a Solr collection. A datasource sends this raw data to Fusion’s connector services. A connector invokes an index pipeline to extract, transform, and otherwise enrich the raw data, and then sends the resulting document to Solr for indexing.
Apache Spark carries out signal processing and aggregations. The Apache Spark master distributes tasks across one or more worker processes.
Apache ZooKeeper is included in this diagram because all Fusion processes across all nodes in a Fusion deployment communicate with the ZooKeeper cluster (also called an ensemble) at the socket layer via ZooKeeper’s Java API.
The Fusion Agent process is the server process that starts, stops, and monitors all Fusion components running on the node.
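As a reading aid, the request paths described above can be written down as ordered lists of hops. This is only an illustrative sketch in Python: the `FLOWS` table and the helper functions are hypothetical, and the hop names are taken from the descriptions in this section rather than from any Fusion API.

```python
from typing import List, Tuple

# Each flow lists the components a request passes through, in order,
# as described in the text. The flow names are hypothetical labels.
FLOWS = {
    "ui": ["user", "Fusion UI", "API Services"],
    "query": ["search application", "Fusion UI", "API Services",
              "query pipeline", "Solr"],
    "index": ["datasource", "connector services", "index pipeline", "Solr"],
}


def hops(flow: str) -> List[Tuple[str, str]]:
    """Return the (sender, receiver) pairs for one flow."""
    path = FLOWS[flow]
    return list(zip(path, path[1:]))


def flows_through(component: str) -> List[str]:
    """Return the names of all flows that pass through a component, sorted."""
    return sorted(name for name, path in FLOWS.items() if component in path)


for name, path in FLOWS.items():
    print(f"{name}: " + " -> ".join(path))
```

For instance, `flows_through("Fusion UI")` returns the `query` and `ui` flows, which reflects why the section recommends running the UI on more than one node.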
See these topics for details about each component: