Apache ZooKeeper is a distributed configuration service, synchronization service, and naming registry.
Fusion uses ZooKeeper to configure and manage all Fusion components in a single Fusion deployment, so a ZooKeeper service must always be running as part of that deployment. For high availability, this should be an external 3-node ZooKeeper cluster. All Fusion Java components communicate with ZooKeeper using the ZooKeeper API.
For ZooKeeper installation instructions, see the ZooKeeper documentation.
You can find ZooKeeper’s logs at
znode: ZooKeeper data is organized into a hierarchical namespace of data nodes called znodes. A znode can have data associated with it as well as child znodes. The data in a znode is stored in a binary format, but it is possible to import, export, and view this information as JSON data. Paths to znodes are always expressed as canonical, absolute, slash-separated paths; there are no relative references.
ephemeral nodes: An ephemeral node is a znode which exists only for the duration of an active session. When the session ends the znode is deleted. An ephemeral znode cannot have children.
server: A ZooKeeper service consists of one or more machines; each machine is a server which runs in its own JVM and listens on its own set of ports. For testing, you can run several ZooKeeper servers at once on a single workstation by configuring the ports for each server.
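quorum: A quorum is a set of ZooKeeper servers. The set should contain an odd number of servers, because the service stays available only while a majority of its servers can communicate with each other. For most deployments, only 3 servers are required.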
client: A client is any host or process which uses a ZooKeeper service.
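The preference for an odd number of servers follows from majority arithmetic. The sketch below is illustrative only; the function names `majority` and `tolerated_failures` are hypothetical helpers, not part of ZooKeeper or Fusion.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Print size, majority, and tolerated failures for small ensembles.
    for size in range(1, 7):
        print(size, majority(size), tolerated_failures(size))
```

Under this arithmetic, 3 servers and 4 servers both tolerate a single failure, so the fourth server adds cost without adding fault tolerance.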
See the official ZooKeeper documentation for details about using and managing a ZooKeeper service.
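As a rough illustration of running several servers on one workstation for testing, the sketch below renders one `zoo.cfg` per server, giving each its own client, peer, and election ports. The function name `render_zoo_cfg`, the port numbers, the timing values, and the `/tmp/zookeeper` data directory are assumptions chosen for the example, not values taken from Fusion.

```python
def render_zoo_cfg(server_id: int, server_count: int,
                   base_client_port: int = 2181,
                   base_peer_port: int = 2888,
                   base_election_port: int = 3888,
                   data_root: str = "/tmp/zookeeper") -> str:
    """Render zoo.cfg text for one server of a local test ensemble.

    All defaults are illustrative assumptions. Every server on the same
    host needs distinct ports and its own data directory.
    """
    if not 1 <= server_id <= server_count:
        raise ValueError("server_id must be between 1 and server_count")
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_root}/server{server_id}",
        # Client port is unique per server: 2181, 2182, 2183, ...
        f"clientPort={base_client_port + server_id - 1}",
    ]
    # Every server lists all members: server.N=host:peerPort:electionPort
    for n in range(1, server_count + 1):
        lines.append(
            f"server.{n}=localhost:"
            f"{base_peer_port + n - 1}:{base_election_port + n - 1}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_zoo_cfg(server_id=1, server_count=3))
```

Each server also needs a `myid` file in its data directory containing its server number; see the ZooKeeper documentation for the authoritative list of settings.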
Fusion ZooKeeper Nodes
Fusion configuration data is stored in ZooKeeper under two znodes:
"lucid" stores all application-specific configurations, including collection, datasource, pipeline, signals, aggregations, and associated scheduling, jobs, and metrics.
"lucid-apollo-admin" stores all access control information, including all users, groups, roles, and realms.
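The two znodes above can also be explored from a script. The following is a minimal sketch that walks a znode tree given any function that returns the child names of a path. The sample tree is a stand-in for a live service: it assumes both znodes sit directly under the ZooKeeper root, and its contents are illustrative rather than a real Fusion listing.

```python
from typing import Callable, Iterator, List


def walk_znodes(get_children: Callable[[str], List[str]],
                root: str) -> Iterator[str]:
    """Yield root and every descendant path, depth first."""
    yield root
    for child in sorted(get_children(root)):
        # Build an absolute, slash-separated child path.
        child_path = root.rstrip("/") + "/" + child
        yield from walk_znodes(get_children, child_path)


# Hypothetical in-memory tree used in place of a running ZooKeeper service.
SAMPLE_TREE = {
    "/": ["lucid", "lucid-apollo-admin"],
    "/lucid": ["connectors"],
    "/lucid/connectors": ["datasources"],
    "/lucid/connectors/datasources": ["ds1"],
    "/lucid/connectors/datasources/ds1": [],
    "/lucid-apollo-admin": ["users"],
    "/lucid-apollo-admin/users": [],
}


def sample_get_children(path: str) -> List[str]:
    """Return the child names of a path in the sample tree."""
    return SAMPLE_TREE.get(path, [])


if __name__ == "__main__":
    for path in walk_znodes(sample_get_children, "/"):
        print(path)
```

Against a live service, a client library call with the same shape could be passed instead of `sample_get_children`; for example, the `get_children` method of a connected `KazooClient` from the kazoo Python library takes a path and returns the child names.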
The Solr Admin tool provides a ZooKeeper node browser. In the default Fusion developer deployment, the Fusion run scripts are configured to start the instances of both Solr and ZooKeeper that are included with the Fusion distribution. The walkthrough below therefore takes a fresh install of a Fusion developer instance and uses the embedded Solr’s Admin tool to explore how Fusion’s configurations are managed in ZooKeeper.
On initial install, the "lucid" znode contains the set of default configurations used by Fusion’s services:
The "lucid-apollo-admin" znode contains the set of nodes used by Fusion’s access control services:
In the above screenshot, the ZooKeeper node browser is browsing the contents of znode "lucid-apollo-admin/users", which is empty. The Fusion distribution ships without any user accounts. The initial user added to Fusion is the Fusion native realm "admin" user. This entry is created only on initial startup, via the Fusion UI "set admin password" panel. Once you submit the admin password, the admin user account is created. Until Fusion contains at least the admin user account, you cannot use the system, because all Fusion requests require proper authorization.
Once the admin password is set, and you have created one or more Fusion collections and have populated them by running one or more datasources, these collections, datasources, pipelines, and other application configuration settings are stored under the "lucid" znode:
In the above screenshot, the ZooKeeper node browser is browsing the contents of znode "lucid/connectors/datasources/ds1". This datasource was used to populate a Fusion collection with documents retrieved via a webcrawl. Note that in the initial screenshot for znode "lucid", there is no "connectors" node at all.
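The node browser shows paths such as "lucid/connectors/datasources/ds1" without a leading slash, whereas ZooKeeper itself expects canonical, absolute paths. The sketch below converts a displayed path into that form and rejects relative references; the function name `to_absolute_znode_path` is a hypothetical helper, and it assumes the displayed path starts at the ZooKeeper root.

```python
def to_absolute_znode_path(path: str) -> str:
    """Return a canonical absolute znode path, or raise ValueError."""
    if not path:
        raise ValueError("path must not be empty")
    trimmed = path.strip("/")
    if not trimmed:
        return "/"  # the input named the root znode
    segments = trimmed.split("/")
    for segment in segments:
        if segment == "":
            raise ValueError(f"empty path segment in {path!r}")
        if segment in (".", ".."):
            # znode paths have no relative references
            raise ValueError(f"relative reference in {path!r}")
    return "/" + "/".join(segments)


if __name__ == "__main__":
    print(to_absolute_znode_path("lucid/connectors/datasources/ds1"))
```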
The "lucid-apollo-admin" znode now contains one user account, for user "admin":