Fusion Components
Basic Fusion concepts are explained below. Since the core of Fusion Server is Solr, you may also find it useful to familiarize yourself with Solr terminology.
Fusion apps provide tailored search functionality to specific groups of users.
An app is a named set of linked objects, including collections, datasources, index and query pipelines, index and query profiles, parsers, and more. Using roles and security realms, you can define security on a per-app basis.
See App Management.
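The following hypothetical Python model makes the idea of an app as a named set of linked objects more concrete. It is not Fusion's app definition format. Object references are plain string IDs, and per-app security is reduced to a set of allowed role names. The real mechanism uses roles and security realms.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical model of a Fusion app as a named set of linked objects.

    Linked objects are referenced by plain string IDs; this is not
    Fusion's actual app schema.
    """

    name: str
    collections: list[str] = field(default_factory=list)
    datasources: list[str] = field(default_factory=list)
    index_pipelines: list[str] = field(default_factory=list)
    query_pipelines: list[str] = field(default_factory=list)
    index_profiles: list[str] = field(default_factory=list)
    query_profiles: list[str] = field(default_factory=list)
    parsers: list[str] = field(default_factory=list)
    # Roles allowed to use this app (a simplification of per-app security).
    allowed_roles: set[str] = field(default_factory=set)

    def linked_objects(self) -> dict[str, list[str]]:
        """Return every linked object ID, grouped by object type."""
        return {
            "collections": self.collections,
            "datasources": self.datasources,
            "index_pipelines": self.index_pipelines,
            "query_pipelines": self.query_pipelines,
            "index_profiles": self.index_profiles,
            "query_profiles": self.query_profiles,
            "parsers": self.parsers,
        }

    def is_accessible_by(self, user_roles: set[str]) -> bool:
        """True if the user holds at least one role granted on this app."""
        return not self.allowed_roles.isdisjoint(user_roles)


# Illustrative app; all names are made up.
app = App(
    name="docs",
    collections=["docs"],
    datasources=["docs-web"],
    index_pipelines=["docs"],
    query_pipelines=["docs"],
    parsers=["docs"],
    allowed_roles={"docs-users"},
)
print(app.is_accessible_by({"docs-users"}))  # True
```

Grouping everything under one named object is what lets security be applied per app rather than per individual object.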
Collections consist of stored data and the datasources that determine how the data is ingested and indexed. Collections are a way to logically group your data sets. A Fusion collection is the same as a Solr collection. See Collection Management.
Datasources are the configurations that determine how data is ingested and indexed. Each datasource includes a connector configuration, a parser configuration, and an index pipeline configuration. See Datasource Configuration.
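The three-part makeup of a datasource can be sketched as a simple record. The class and field names are hypothetical and do not reflect Fusion's actual datasource schema. The connector settings dictionary is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasourceConfig:
    """Hypothetical sketch: a datasource bundles three configurations.

    Field names are illustrative only, not Fusion's datasource schema.
    """

    datasource_id: str
    # Connector configuration: which connector to run and its settings.
    connector_type: str
    connector_settings: dict[str, Any] = field(default_factory=dict)
    # Parser configuration: which parser interprets the raw input.
    parser_id: str = ""
    # Index pipeline configuration: which pipeline produces indexable documents.
    index_pipeline_id: str = ""

    def missing_parts(self) -> list[str]:
        """List any of the three required configurations that are unset."""
        missing = []
        if not self.connector_type:
            missing.append("connector")
        if not self.parser_id:
            missing.append("parser")
        if not self.index_pipeline_id:
            missing.append("index pipeline")
        return missing


ds = DatasourceConfig(
    datasource_id="docs-web",
    connector_type="web",
    connector_settings={"start_url": "https://example.com/"},  # placeholder setting
    parser_id="docs",
)
print(ds.missing_parts())  # ['index pipeline']
```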
Connectors are the conduit between Fusion and your external data sources. Connectors retrieve your data and import it into Fusion Server. See the Connectors Reference Guide for a complete list of available connectors.
Parsers interpret incoming data in order to determine its format and fields. A parser consists of a sequence of parsing stages, each designed to parse a different data format, sometimes recursively. See the Parser Stages Reference Guide for complete details about all available parsing stages.
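A parser's stage sequence can be sketched in plain Python. This is a hypothetical model, not Fusion's parser API. It assumes that the first stage whose format matches the input handles it. It also treats an archive stage as recursive: each entry is fed back through the same stage list. The formats, stage names, and helper functions are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical raw input: a (format, payload) pair. For "zip", the payload is a
# list of nested (format, payload) pairs; otherwise it is a string.
RawItem = tuple[str, object]


@dataclass
class ParserStage:
    name: str
    fmt: str                          # the format this stage handles
    parse: Callable[[object], list]   # returns docs, or nested RawItems
    nested: bool = False              # True if parse() output must be re-parsed


def parse_csv(payload):
    header, *rows = payload.strip().splitlines()
    cols = header.split(",")
    return [dict(zip(cols, row.split(","))) for row in rows]


def parse_text(payload):
    return [{"body": payload}]


def unpack_zip(payload):
    return list(payload)  # nested (format, payload) pairs


STAGES = [
    ParserStage("archive", "zip", unpack_zip, nested=True),
    ParserStage("csv", "csv", parse_csv),
    ParserStage("text", "text", parse_text),
]


def run_parser(item: RawItem, stages=STAGES) -> list[dict]:
    fmt, payload = item
    for stage in stages:
        if stage.fmt != fmt:
            continue
        out = stage.parse(payload)
        if stage.nested:
            # Recurse: each archive entry goes through the stage list again.
            return [doc for child in out for doc in run_parser(child, stages)]
        return out
    raise ValueError(f"no parser stage handles format {fmt!r}")


docs = run_parser(("zip", [("csv", "id,title\n1,Intro\n2,Setup"), ("text", "release notes")]))
print(docs)
# [{'id': '1', 'title': 'Intro'}, {'id': '2', 'title': 'Setup'}, {'body': 'release notes'}]
```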
Index pipelines format the incoming raw data into fielded documents so that they can be indexed and searched by the Solr core. A pipeline consists of a sequence of stages, and each stage performs a different kind of processing based on user-configured logic. See the Index Pipeline Stages Reference Guide for a complete list of available index pipeline stages.
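The stage-by-stage processing of an index pipeline can be sketched as a list of functions applied in order to each document. This is a minimal illustration under stated assumptions, not Fusion's stage API. Each hypothetical stage takes a document (a dict of fields) and returns a transformed document, or None to drop it. The field names use Solr-style dynamic field suffixes (_t, _s) purely as an example.

```python
from typing import Callable, Optional

# A stage maps a document to a new document, or to None to drop it.
Stage = Callable[[dict], Optional[dict]]


def map_fields(mapping: dict[str, str]) -> Stage:
    """Rename raw field names to the target schema's field names."""
    def stage(doc):
        return {mapping.get(k, k): v for k, v in doc.items()}
    return stage


def require_field(name: str) -> Stage:
    """Drop documents that lack a required field."""
    def stage(doc):
        return doc if doc.get(name) else None
    return stage


def add_field(name: str, value) -> Stage:
    """Add a constant field to every document."""
    def stage(doc):
        return {**doc, name: value}
    return stage


def run_pipeline(docs, stages):
    out = []
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
            if doc is None:
                break  # a stage dropped this document
        else:
            out.append(doc)  # survived every stage
    return out


pipeline = [
    map_fields({"title": "title_t", "body": "body_t"}),
    require_field("id"),
    add_field("source_s", "docs-web"),
]
raw = [{"id": "1", "title": "Intro", "body": "Hello"}, {"title": "No id"}]
result = run_pipeline(raw, pipeline)
print(result)
# [{'id': '1', 'title_t': 'Intro', 'body_t': 'Hello', 'source_s': 'docs-web'}]
```

Because each stage has the same input and output shape, stages can be reordered, added, or removed without changing the others.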
Query pipelines manipulate incoming queries and return an ordered list of matching results from Solr. Individual search results are called documents. See Query Pipeline Configuration.
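A query pipeline can be modeled in the same way: stages rewrite the query, and a final search step returns ranked document IDs. In this hypothetical sketch, a tiny in-memory term-count scorer stands in for Solr. It is not how Solr scores documents, and the stage functions are illustrative only.

```python
def lowercase_query(query):
    """Stage: normalize the query text to lowercase."""
    return {**query, "q": query["q"].lower()}


def default_rows(n):
    """Stage: set a default page size, keeping any explicit rows value."""
    def stage(query):
        return {"rows": n, **query}
    return stage


def search(index, query):
    """Stand-in for Solr: score each document by matching query-term count."""
    terms = query["q"].split()
    scored = []
    for doc in index:
        words = doc["text"].lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first; ties broken by id for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[: query.get("rows", 10)]]


def run_query_pipeline(query, stages, index):
    for stage in stages:
        query = stage(query)
    return search(index, query)


index = [
    {"id": "a", "text": "Solr powers Fusion search"},
    {"id": "b", "text": "Fusion search and Fusion analytics"},
    {"id": "c", "text": "ZooKeeper configuration"},
]
ranked = run_query_pipeline({"q": "Fusion Search"}, [lowercase_query, default_rows(10)], index)
print(ranked)  # ['b', 'a']
```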
Solr is the search platform that powers Fusion. There are multiple aspects to Fusion’s use of Solr:
Fusion components manage Solr search and indexing and provide analytics over Solr collections. Fusion’s analytics components depend on aggregations over information that is stored in a Solr collection.
Fusion collections are all Solr collections.
Application data is stored as one or more Solr collections.
Fusion’s own logs are stored as Solr collections.
A few Fusion service APIs use Solr as a backing store, notably Parameter Sets.
Fusion requires that Solr run with SolrCloud enabled.
Solr runs in its own Docker container. Changing Solr’s Jetty configuration requires the Docker container to be rebuilt.
Solr log files are available from kubectl logs or in Solr’s Docker container in
Accessing the Solr UI
To access the Solr UI, run the following command to forward local port 8983 to the Solr pod:
kubectl port-forward pods/default-solr-0 8983:8983
Then, go to
Solr documentation and additional resources are available at http://lucene.apache.org/solr/resources.html.
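The port-forward also exposes Solr's HTTP API locally. The following minimal sketch assumes that the 8983:8983 mapping makes Solr reachable at http://localhost:8983/solr, which is Solr's default base path. It builds URLs for two standard Solr endpoints: the Collections API LIST action and a collection's /select query handler. The SOLR_BASE constant and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: kubectl port-forward maps local port 8983 to the Solr pod.
SOLR_BASE = "http://localhost:8983/solr"


def collections_list_url(base: str = SOLR_BASE) -> str:
    """URL for the Solr Collections API LIST action."""
    return f"{base}/admin/collections?{urlencode({'action': 'LIST'})}"


def select_url(collection: str, q: str = "*:*", rows: int = 10, base: str = SOLR_BASE) -> str:
    """URL for a basic query against a collection's /select handler."""
    params = urlencode({"q": q, "rows": rows, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


print(collections_list_url())
print(select_url("system_logs"))
```

While the port-forward is active, passing either URL to urllib.request.urlopen should return a JSON response. The second URL queries the system_logs collection, where Fusion's own logs are stored.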
Apache Spark is a fast and general execution engine for large-scale data processing jobs that can be decomposed into stepwise tasks, which are distributed across a cluster of networked computers. Spark provides faster processing and better fault tolerance than earlier MapReduce implementations.
See Spark Administration for more information.
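A plain-Python word count can illustrate how a job is split into tasks that run in parallel and are then combined. This is not Spark's API. Threads from concurrent.futures stand in for executors on cluster nodes, and the partition, map_task, and word_count names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, n):
    """Split records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_task(part):
    """Per-partition task: count words locally."""
    counts = Counter()
    for line in part:
        counts.update(line.split())
    return counts


def word_count(records, n_partitions=4):
    parts = partition(records, n_partitions)
    # Threads stand in for cluster executors; one task per partition.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(map_task, parts))
    # Reduce step: merge the per-partition results.
    return reduce(lambda a, b: a + b, partials, Counter())


print(word_count(["fusion uses solr", "solr uses zookeeper"], n_partitions=2))
```

Each task depends only on its own partition, so a failed task can be rerun on its partition alone. That independence is the property a cluster engine relies on for fault tolerance.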
Apache ZooKeeper is a distributed configuration service, synchronization service, and naming registry.
Fusion uses ZooKeeper to configure and manage all Fusion components in a single Fusion deployment; therefore, a ZooKeeper service must always be running as part of the Fusion deployment. For high availability, this should be an external 3-node ZooKeeper cluster. All Fusion Java components communicate with ZooKeeper using the ZooKeeper API.
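Majority arithmetic explains why 3 nodes is the usual high-availability size. ZooKeeper keeps serving requests only while a majority (a quorum) of the ensemble's servers is running and can communicate. The helper functions below use hypothetical names. They compute this majority and the number of server failures it can tolerate.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

A 3-server ensemble survives one failure. A 4-server ensemble still survives only one, which is why odd ensemble sizes are preferred.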
For ZooKeeper installation instructions, see the ZooKeeper documentation.
znode: ZooKeeper data is organized into a hierarchical namespace of data nodes called znodes. A znode can have data associated with it as well as child znodes. The data in a znode is stored in a binary format, but it is possible to import, export, and view this information as JSON data. Paths to znodes are always expressed as canonical, absolute, slash-separated paths; relative references are not supported.
ephemeral nodes: An ephemeral node is a znode that exists only for the duration of an active session. When the session ends, the znode is deleted. An ephemeral znode cannot have children.
server: A ZooKeeper service consists of one or more machines; each machine is a server which runs in its own JVM and listens on its own set of ports. For testing, you can run several ZooKeeper servers at once on a single workstation by configuring the ports for each server.
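quorum: A quorum is a majority of the servers in a ZooKeeper ensemble, and the service stays available only while a quorum is running. Because an ensemble of 2n+1 servers can lose n servers and still keep a quorum, ensembles should have an odd number of servers. For most deployments, 3 servers are sufficient.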
client: A client is any host or process which uses a ZooKeeper service.
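A small in-memory tree can model the znode and ephemeral-node rules described in these definitions. This is an illustrative sketch, not the ZooKeeper client API, because ZooKeeper enforces these rules on the server. The ZNodeTree class, its methods, and the session strings are hypothetical. Znode data is stored as raw bytes and decoded as UTF-8 only for the JSON view.

```python
import json


class ZNodeTree:
    """In-memory sketch of ZooKeeper's znode namespace (not the ZooKeeper API).

    Each znode holds bytes data; ephemeral znodes record the session that
    created them and are removed when that session ends.
    """

    def __init__(self):
        self.nodes = {"/": {"data": b"", "owner": None}}

    @staticmethod
    def _check_path(path):
        # Paths must be canonical, absolute, and slash-separated.
        if not path.startswith("/") or (path != "/" and path.endswith("/")):
            raise ValueError(f"not an absolute canonical path: {path!r}")
        if "//" in path or any(p in (".", "..") for p in path.split("/")):
            raise ValueError(f"relative or empty segments not allowed: {path!r}")

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", ephemeral_session=None):
        self._check_path(path)
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"no parent node: {parent}")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        self.nodes[path] = {"data": data, "owner": ephemeral_session}

    def close_session(self, session):
        # Ephemeral znodes exist only for the duration of their session.
        if session is None:
            return
        for path in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[path]

    def children(self, path):
        return sorted(p for p in self.nodes if p != "/" and self._parent(p) == path)

    def export_json(self, path="/"):
        """View a subtree as JSON (data decoded as UTF-8 for display)."""
        def build(p):
            return {
                "data": self.nodes[p]["data"].decode("utf-8"),
                "children": {c.rsplit("/", 1)[1]: build(c) for c in self.children(p)},
            }
        return json.dumps(build(path), indent=2)


tree = ZNodeTree()
tree.create("/configs", b"{}")
tree.create("/live_nodes")
tree.create("/live_nodes/node-1", ephemeral_session="session-1")
tree.close_session("session-1")
print(tree.children("/live_nodes"))  # []
```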
See the official ZooKeeper documentation for details about using and managing a ZooKeeper service.
Jetty provides web services for Fusion’s UI, APIs, and connectors, as well as for Solr. Each of these components runs in its own Jetty instance with a separate configuration. Configurations for each component are located in
Securing Fusion using SSL requires configuring Jetty to use SSL. For example, to secure the UI you need to modify the configuration in
See SSL Security (Unix) or SSL Security (Windows).
Log messages are written to stdout and sent to Logstash, which is deployed in our Helm chart. Logstash is configured to index log messages into the system_logs collection in Fusion.