Troubleshoot When Installing Fusion 4.x
This topic explains how to troubleshoot difficulties that occur when installing or upgrading Fusion.
Fusion run script failures
Common problems that cause Fusion run scripts to fail:
- Wrong Java version.
- Spaces in Windows install path.
- Users have insufficient privileges for the installation directory.
- Java bin directory not in the PATH environment variable.
- Some Fusion services may already be running, or registered as running.
- Roaming IP address; try uncommenting this line in $FUSION_HOME/conf/fusion.cors (fusion.properties in Fusion 4.x): default.address = 127.0.0.1
Check the Java version
Fusion runs on JDK 1.8. See System Requirements.
Fusion scripts use the environment variable JAVA_HOME.
To check the setting of this variable, log in to the account used to run Fusion, and check that this variable is set to the proper value.
On Linux, Mac, or other Unix systems, use the following command:
echo $JAVA_HOME
On Windows, the command is:
echo %JAVA_HOME%
Fusion scripts execute both the java and javac commands.
To check the Java version invoked by these commands, run the following commands from a shell or terminal window:
java -version
javac -version
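The checks above can be combined into one small script. This is a minimal sketch, not something shipped with Fusion: the function name is_java8 is hypothetical, and it assumes the version banner contains a string of the form 1.8.x (quoted for java, unquoted for javac), which is the usual JDK 8 format.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fusion): succeeds when a version banner
# line from java or javac reports a 1.8.x release.
is_java8() {
  case "$1" in
    *'"1.8.'* | *' 1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

echo "JAVA_HOME=${JAVA_HOME:-<not set>}"

# The version banner may be written to stderr, so capture both streams.
for cmd in java javac; do
  if command -v "$cmd" >/dev/null 2>&1; then
    banner=$("$cmd" -version 2>&1 | head -n 1)
    if is_java8 "$banner"; then
      echo "$cmd: OK ($banner)"
    else
      echo "$cmd: unexpected version ($banner)"
    fi
  else
    echo "$cmd: not found on PATH"
  fi
done
```

Run it from the account that starts Fusion, so that the PATH and JAVA_HOME it reports are the ones the Fusion scripts will see.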
Clear browser cache
If a previous version of Fusion was accessed in the browser with the same URL as that of the newly installed version of Fusion, then there may be old pages and/or cookies in the browser cache. A hard page refresh will clear old pages from the browser cache. If clearing the page cache does not solve this problem, clear session cookies as well.
Stop/Clean up/Start
If the script $FUSION_HOME/bin/fusion start completes without reporting an error, but the Fusion UI displays a message that it cannot find Collections or Datasources, the Fusion services may be unable to communicate with each other through ZooKeeper.
This can happen with developer deployments running on a laptop if the network connection changes or is interrupted, especially when using the embedded ZooKeeper instance that is bundled with Fusion.
In this situation, stop Fusion, inspect the system processes, and if necessary manually terminate running processes and clean up .pid files to bring the system back to a clean state. Then start Fusion again.
Although the Fusion run script bin/fusion provides a restart option, that option assumes a correctly functioning system and cannot always recover from system failure.
To stop Fusion:
Run the script $FUSION_HOME/bin/fusion with the argument stop:
$ cd $FUSION_HOME
$ ./bin/fusion stop
Successfully stopped ui (process ID 41524)
Successfully stopped connectors (process ID 41328)
Successfully stopped api (process ID 41159)
Successfully stopped solr (process ID 41153)
Successfully stopped zookeeper (process ID 41151)
After stopping Fusion, you should make sure that no Fusion services are running.
When the Fusion scripts start a Fusion service, they record the process ID in a .pid file in the directory $FUSION_HOME/var.
For a Fusion instance that is up and running, the following set of .pid files is present:
> find $FUSION_HOME/var -name "*.pid" -print
fusion/var/api/api.pid
fusion/var/connectors/connectors.pid
fusion/var/solr/solr.pid
fusion/var/spark-master/spark-master.pid
fusion/var/spark-worker/spark-worker.pid
fusion/var/ui/ui.pid
fusion/var/zookeeper/zookeeper.pid
The above output shows the set of .pid files created by a single Fusion instance running with embedded ZooKeeper and Solr.
If no Fusion services are running, there should not be any .pid files.
If all services have been stopped but some .pid files remain, delete these files before starting Fusion.
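To decide which .pid files are left over, it can help to check whether the process recorded in each file still exists. The following is a sketch under stated assumptions, not a Fusion tool: the function name list_stale_pid_files is hypothetical, FUSION_HOME is assumed to point at the installation directory, and each .pid file is assumed to contain just a process ID.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fusion): print every .pid file under a
# directory whose recorded process is no longer running.
list_stale_pid_files() {
  find "$1" -type f -name '*.pid' | while IFS= read -r pidfile; do
    pid=$(tr -d '[:space:]' < "$pidfile")
    # kill -0 sends no signal; it only tests whether the process can be
    # signalled. It also fails for a live process owned by another user,
    # so run this as the account that starts Fusion.
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
      echo "$pidfile"
    fi
  done
}

# Assumption: FUSION_HOME is set to the Fusion installation directory.
if [ -n "${FUSION_HOME:-}" ] && [ -d "$FUSION_HOME/var" ]; then
  list_stale_pid_files "$FUSION_HOME/var"
fi
```

The script only lists candidates. Review the list after running bin/fusion stop, and delete the files by hand once you are sure the corresponding services are not running.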
Inspect the log files
If none of the above helps, inspect the Fusion log files in the directory $FUSION_HOME/var/log.
If you experience unexpected termination when running Fusion, first look in the log files for clues.
One setting to review is default.supervision.pollingFailureCountThreshold in $FUSION_HOME/conf/fusion.cors.
By default, pollingFailureCountThreshold is set to 1, so the Agent restarts all services the second time it fails to reach a service. Try a modest increase; for example, set pollingFailureCountThreshold to 3.
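Before changing the threshold, you may want to confirm whether it is set explicitly in the configuration file. This is a minimal sketch, assuming the file uses one "key = value" setting per line as in the examples in this topic; the function name show_property is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fusion): print the active lines that set
# a property. Lines where anything other than whitespace precedes the key,
# such as commented-out settings, are not matched.
show_property() {
  # Escape dots so they match literally in the regular expression.
  key_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
  grep -E "^[[:space:]]*${key_re}[[:space:]]*=" "$1"
}

# Assumption: FUSION_HOME is set; use fusion.properties instead on Fusion 4.x.
conf="${FUSION_HOME:-}/conf/fusion.cors"
if [ -f "$conf" ]; then
  show_property "$conf" default.supervision.pollingFailureCountThreshold \
    || echo "not set explicitly; the default applies"
fi
```

The same helper can be used to check whether default.address is still commented out.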
Log file names that start with "oom" indicate out-of-memory problems.
You might need to increase the amount of memory allotted to that service.
The amount of memory allotted to each kind of Fusion service is controlled by environment variables that are set in the fusion.cors (fusion.properties in Fusion 4.x) file.
Troubleshoot a Windows Install
Check common Windows service install script mistakes:
- Is the account trying to run the install script a power user/administrator of the server?
- Is the DOMAIN\USERNAME correctly specified? Is the domain correct?
- Is java installed on the %PATH%? To use a different Java, specify it in bin/windows-service-wrapper.xml.
- Are there any obvious issues in var/log/windows-service-wrapper.log?
- Does Fusion start from the normal bin\fusion start?
- Are there any other errors in the normal logs?
Increase memory
If you have not changed any of the default settings, the services can run out of memory under heavy load, causing the program to crash.
To find out if this happened, check for files matching the pattern oom_killer-.log* in the log directory of the service that is being restarted, for example, $FUSION_HOME/var/log/connectors-classic if that is the one being restarted.
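One way to look across all services at once is to search the whole log directory for files whose names start with "oom". This is a sketch, not a Fusion utility: the function name list_oom_logs is hypothetical, and FUSION_HOME is assumed to point at the installation directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fusion): list log files whose names start
# with "oom" anywhere under the given log directory.
list_oom_logs() {
  find "$1" -type f -name 'oom*'
}

# Assumption: FUSION_HOME is set to the Fusion installation directory.
if [ -n "${FUSION_HOME:-}" ] && [ -d "$FUSION_HOME/var/log" ]; then
  list_oom_logs "$FUSION_HOME/var/log"
fi
```

The directory that contains a match identifies the service that ran out of memory.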
If this is your issue, the first step is to increase the memory of the affected component by modifying conf/fusion.cors (fusion.properties in Fusion 4.x). Go to the jvmOptions for the service in question and change the value of the -Xmx flag. By default you will see something like:
connectors-classic.jvmOptions = -Xmx1g -Xss256k -Dcom.lucidworks.connectors.pipelines.embedded=false
-Xmx1g means this service will fail if it needs more than 1 gigabyte of memory to operate. Increase this value; for example, set the flag to -Xmx4g for an additional 3 gigabytes. 1g can be very low for some workloads.
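If you prefer to preview the change before editing the file, the edit can be sketched with sed. This is an illustration only: the function name preview_xmx is hypothetical, it assumes the jvmOptions line has the "service.jvmOptions = ..." form shown above, and it writes the modified configuration to standard output instead of changing the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fusion): print the configuration with the
# -Xmx value of one service's jvmOptions line replaced. The file itself is
# not modified. Arguments: service name, new heap size, configuration file.
preview_xmx() {
  sed -E "s/^($1\.jvmOptions[[:space:]]*=.*)-Xmx[0-9]+[kKmMgG]/\1-Xmx$2/" "$3"
}

# Assumption: FUSION_HOME is set; use fusion.properties instead on Fusion 4.x.
conf="${FUSION_HOME:-}/conf/fusion.cors"
if [ -f "$conf" ]; then
  preview_xmx connectors-classic 4g "$conf" \
    | grep -F 'connectors-classic.jvmOptions' || true
fi
```

Once the previewed line looks right, make the same change in the file with an editor and restart Fusion so the new setting takes effect.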