Configure Grafana, Prometheus, Promtail, and Loki in Fusion
Before you perform these installation instructions, you must delete any existing persistent volume claims (PVCs) related to Prometheus, Grafana, Promtail, and Loki in your namespace.
Clone the fusion-cloud-native repository
Open a terminal window and run the following command:
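A minimal sketch of the clone command, assuming the public lucidworks/fusion-cloud-native repository on GitHub:

```bash
# Clone the fusion-cloud-native repository and change into it
git clone https://github.com/lucidworks/fusion-cloud-native.git
cd fusion-cloud-native
```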
Install Grafana
- In your local fusion-cloud-native repository, run the following command for your <cluster> and <namespace> (a sketch is provided after these steps). The following is a sample output. The errors are related to resource limits on the sample cluster and can be ignored. Similar errors may display for your cluster and do not impact Grafana logging.
- Using the Grafana service endpoint in the newly installed Grafana Helm release, run the following command (see the sketches after these steps). The following is a sample output. If the output does not display, run the following command to expose Grafana, including an EXTERNAL_IP for your Grafana LoadBalancer service.
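For step 1, a minimal sketch of a Grafana installation, assuming the chart from the official Grafana Helm repository; your environment may instead use a wrapper script from the fusion-cloud-native repository, and the release name my-grafana is a placeholder:

```bash
# Hypothetical Grafana install; release name and namespace are placeholders
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install my-grafana grafana/grafana --namespace <namespace>
```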
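For step 2, a hedged sketch of the endpoint lookup and the LoadBalancer exposure, assuming the release name above and standard kubectl usage:

```bash
# Find the Grafana service and check for an EXTERNAL-IP
kubectl get services --namespace <namespace> | grep grafana

# If no EXTERNAL-IP displays, expose Grafana through a LoadBalancer service
# (assumes the chart created a deployment named my-grafana)
kubectl expose deployment my-grafana --namespace <namespace> \
  --type=LoadBalancer --name=grafana-loadbalancer --port=3000 --target-port=3000
```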
Install Loki
To obtain Loki from the Helm chart repository, run the following command with the unique <loki-release-name> for your cluster (a sketch is shown below).
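A minimal sketch of such a command, assuming the grafana/loki-stack chart, which deploys Loki together with the Promtail DaemonSet referenced later on this page; the chart name and options are assumptions, so adjust them for your environment:

```bash
# Hypothetical Loki installation; <loki-release-name> and <namespace> are placeholders
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install <loki-release-name> grafana/loki-stack \
  --namespace <namespace> --set promtail.enabled=true
```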
If you do not enter the <loki-release-name> correctly, an error similar to the following displays:
Obtain Admin credentials for Grafana
- After you validate Grafana is running by accessing <EXTERNAL-IP>:3000, run the following command to obtain an <admin_password> for your Grafana instance (a sketch is provided after these steps).
- Sign in to Grafana and change the password for security purposes.
- Run the following command to display the Promtail pods that are running (a sketch is provided after these steps). The number of Promtail pods must match the number of Kubernetes nodes, because an instance of Promtail runs on each node.
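For the password step, a sketch that assumes the secret created by the standard Grafana Helm chart and a release named my-grafana:

```bash
# Read the generated admin password from the Grafana secret
kubectl get secret --namespace <namespace> my-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```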
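For the Promtail check, a simple listing such as the following works; the name filter may differ in your deployment:

```bash
# List the Promtail pods; the count should equal the number of Kubernetes nodes
kubectl get pods --namespace <namespace> | grep promtail
```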
Add the Loki data source
- Sign in to Grafana. In the toolbar, click the arrow below Home to display all of the options.
- In the Configuration section, click Data sources.
- Click Add new data source.
- In the search bar for the data source, enter Loki.
- In the URL field on the Settings screen, enter your unique <loki-release-name:port>. The default port for Loki is 3100. If you encounter issues with the <loki-release-name:port> information, open a terminal and run kubectl get services | grep loki to display a list of every service with a name that contains loki, along with its associated IP address and port. An example URL is shown after these steps.
- Complete the other fields and click Save & test.
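For example, if kubectl get services | grep loki shows a service named loki on port 3100, the URL would be similar to the following; the service name is an assumption based on your release name:

```
http://loki:3100
```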
Dashboards
Pipeline and Stage Metrics
Both the Indexing and Query services provide per-pipeline, per-stage execution times for the 99th, 95th, 75th, and 50th percentiles. These are commonly used to determine where there may be a performance bottleneck when pipelines are executing slowly.
Indexing Service Dashboard
The Indexing Service has its own Grafana dashboard, which highlights key performance metrics for understanding the behavior of the Fusion 5 Indexing pipelines. It provides per-pipeline and per-stage execution time metrics, along with metrics for document indexing.
Query Service Dashboard
The Query Service also has its own Grafana dashboard that shows the performance of the pipelines and query stages that are being run. It shows the global and per-pod request metrics, in addition to per-pipeline query stage execution times.
Gateway Metrics Dashboard
The Gateway Metrics dashboard provides visibility into the behavior of the API Gateway, which stands in front of Fusion 5 services. It includes a global request rate gauge, in addition to per-handler and per-route request metrics. This dashboard can be used to locate service slowdowns, because it can highlight particular routes or handlers that are responding slowly.
JVM Metrics Dashboard
The JVM Metrics Dashboard provides information about the CPU and memory utilization of Fusion 5 services, along with garbage collection (GC) metrics. These can be used to assess the JVM performance of Fusion 5 microservices and can help locate bottlenecks, such as a pegged CPU or frequent garbage collection events that pause the application. Issues relating to garbage collection can often be resolved by modifying the JVM options of the process and can then be monitored with this dashboard.
Solr Dashboards
Solr has its metrics split into three distinct dashboards, each covering a separate area of concern for monitoring Solr performance.
Core Dashboard
The Solr Core dashboard contains metrics grouped by the associated Solr core, including per-core metrics for requests, file system utilization, document processing and access, and handler usage.
Node Dashboard
The Solr Node dashboard contains per-node metrics, including information about requests, errors, cores, thread pool utilization, and connections. Per-node metrics are useful for tracking down issues with a particular Solr node, especially when these issues do not affect all nodes.
System Dashboard
The Solr System Dashboard shows high-level system statistics, including request/response statistics, JVM metrics, and common operating system metrics, such as memory usage, file descriptor usage, and CPU utilization.
Kubernetes Dashboards
The set of Kubernetes performance dashboards requires the installation of additional DaemonSets to collect performance metrics from Kubernetes. Ensure that at least one namespace in your cluster uses the following configuration in the Prometheus values.yaml file for the stable/prometheus Helm chart.
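A minimal sketch of that configuration, assuming the standard values of the stable/prometheus chart; verify the keys against your chart version:

```yaml
# Assumed stable/prometheus values: the node-exporter DaemonSet and
# kube-state-metrics supply the Kubernetes-level metrics these dashboards expect.
nodeExporter:
  enabled: true      # runs a node-exporter pod on every Kubernetes node
kubeStateMetrics:
  enabled: true      # exposes object-level metrics (pods, deployments, nodes)
```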