High CPU load
Heavy CPU load across all nodes in the cluster might indicate the need for more nodes. More typically, however, a single node or service is being overloaded.
Use top or another monitoring utility to find out which processes are using the most CPU.
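As a rough, scriptable alternative to watching top interactively, the sketch below ranks processes by the CPU percentage that ps reports. The function names parse_ps and busiest_processes are illustrative and are not part of Fusion. It assumes a ps that accepts `-eo pid,pcpu,args`. On Linux, ps reports CPU usage averaged over the lifetime of the process, so the figures will not match the instantaneous values shown by top.

```python
import subprocess


def parse_ps(output, limit=5):
    """Return the `limit` busiest processes as (cpu_percent, pid, command) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((float(cpu), int(pid), command))
        except ValueError:
            continue  # not a data row
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def busiest_processes(limit=5):
    """Run ps and return the busiest processes (hypothetical helper)."""
    # pid, pcpu, and args select the PID, CPU percentage, and full command line.
    output = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(output, limit)
```

Calling busiest_processes() on a node returns the top five entries. The full command line is kept so that the JVM arguments of each Java process remain visible.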
All Fusion services, including Solr and Zookeeper, run as Java processes. Most services indicate the name of the service via the
Common reasons for busy services:
Connectors-classic: document parsing and ingestion
Connectors-classic: document-heavy index and/or query pipeline use
API: document-heavy index and/or query pipeline use
Solr: use across multiple nodes holding the same collection
Solr: use across multiple nodes holding shard leaders
Heavy query load involving facets, stats, and/or sorting
Heavy document writing to a given collection
What to do:
Find out which services or nodes are being overloaded.
Find out if there is known activity to explain the excessive CPU load.
If no explanation is found, contact Lucidworks Support.
Low memory problems are typically the result of two issues: low system memory and a lack of heap space for an individual service.
Lack of free memory
You can detect a lack of free memory on a server using the free -h command. On servers running the Solr service, a large amount of free memory is ideal because this memory is used for filesystem caching. Take note of the amount of free memory and the disk cache in comparison to the overall index size. Ideally, the disk cache should be able to hold all of the index, although this is not as important when using newer SSD technologies.
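The comparison between free memory, disk cache, and index size can be sketched in code. This example assumes a Linux host where /proc/meminfo is available, which is where free gets its numbers. The function names and the index_size_kb parameter are illustrative; the index size has to be measured separately, for example with du on the Solr data directory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def cache_coverage(meminfo, index_size_kb):
    """Fraction of the index that free memory plus page cache could hold."""
    # MemFree and Cached correspond roughly to the "free" and cache
    # columns printed by the free command.
    usable_kb = meminfo.get("MemFree", 0) + meminfo.get("Cached", 0)
    return usable_kb / index_size_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory counters (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

A result of 1.0 or more suggests the whole index could sit in the filesystem cache. A much lower value means Solr reads are more likely to go to disk.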
There are several ways to increase the available memory on a server:
Change the command-line options configured in conf/fusion.cors to reduce the heap sizes of individual services. However, this might result in a lack of heap space, as described below.
Run fewer services on a node and/or reallocate services that need more memory to nodes with extra capacity.
Add nodes to the cluster.
Add memory to the nodes.
Lack of heap space
Finding processes that have exited with OutOfMemoryError in the logs, or finding a dump file called java_pidXXX.hprof in the log directory, indicates that a service failed due to a lack of heap space.
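Both checks can be automated with a short script. The sketch below searches a directory tree for heap dump files and for log files that mention OutOfMemoryError. The function name find_heap_failures and the assumption that log files end in .log are illustrative; pass whichever log directory your installation uses.

```python
from pathlib import Path


def find_heap_failures(log_dir):
    """Return (heap_dumps, logs_with_oom) found anywhere under log_dir."""
    root = Path(log_dir)
    # Heap dumps written by the JVM follow the java_pid<PID>.hprof pattern.
    heap_dumps = sorted(root.rglob("java_pid*.hprof"))
    logs_with_oom = []
    for path in sorted(root.rglob("*.log")):
        try:
            with path.open(errors="replace") as handle:
                if any("OutOfMemoryError" in line for line in handle):
                    logs_with_oom.append(path)
        except OSError:
            continue  # unreadable entry or a directory; skip it
    return heap_dumps, logs_with_oom
```

A non-empty result in either list points to a service that should be checked for insufficient heap space.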
Heap space is configured on a per-service basis in the conf/fusion.cors file via the -Xmx and -Xms command-line parameters. Avoid allocating heap sizes that are known to be larger than needed for a service, because these can lead to long GC pauses.
Two common ways to detect long GC pauses include:
Check the gc_*.log files in the log directories. For deep analysis, upload the files to http://gceasy.io/.
In top, look for periods when all cores are busy followed by a spike in one core (with most or all other cores dropping to near zero). This pattern is typically found when one service is busy and is encountering long GC pauses.
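For a quick first pass over a GC log before a deeper analysis, a script can flag entries with a long wall-clock time. This is a minimal sketch that assumes JDK 8 style logs containing `[Times: user=... sys=..., real=... secs]` entries. Other JVM versions and logging flags produce different formats that this pattern will not match. The one-second threshold and the name long_pauses are arbitrary, illustrative choices. Every `real=` figure is treated as a pause, so concurrent GC phases may be overcounted.

```python
import re

# Wall-clock figure in lines such as
# "[Times: user=0.05 sys=0.00, real=0.02 secs]".
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=1.0):
    """Yield (line_number, seconds) for entries at or above the threshold."""
    for number, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                yield number, seconds
```

Passing an open file object as lines streams the log line by line, so large GC logs do not need to fit in memory.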
The need for GC analysis varies from application to application.