Lucidworks provides several Grafana dashboards designed for Fusion. For more information, see Configure Grafana, Prometheus, Promtail, and Loki in Fusion.
Before you perform these installation instructions, you must delete any existing persistent volume claims (PVCs) related to Prometheus, Grafana, Promtail, and Loki on your namespace.
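For example, a minimal sketch for finding and removing those PVCs (the grep pattern is only an assumption about how the PVCs are named; verify each PVC before deleting it):
kubectl get pvc --namespace <namespace> | grep -Ei 'prometheus|grafana|promtail|loki'
kubectl delete pvc <pvc-name> --namespace <namespace>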

Clone the fusion-cloud-native repository

Open a terminal window and run the following command:
git clone https://github.com/lucidworks/fusion-cloud-native.git
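Then change to the cloned repository directory:
cd fusion-cloud-native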

Install Grafana

  1. In your local fusion-cloud-native repository, run the following command for your <cluster> and <namespace>:
    ./install-prom.sh -c <cluster> -n <namespace>
    
    The following is a sample output. The errors shown are related to resource limits on the sample cluster and can be ignored. Similar errors may display for your cluster and do not impact Grafana logging.
    Adding the stable chart repo to helm repo list
    "prometheus-community" already exists with the same configuration, skipping
    "grafana" already exists with the same configuration, skipping
    
    Installing Prometheus and Grafana for monitoring Fusion metrics ... this can take a few minutes.
    
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "ckotzbauer" chart repository
    ...Successfully got an update from the "lucidworks" chart repository
    ...Successfully got an update from the "grafana" chart repository
    ...Successfully got an update from the "prometheus-community" chart repository
    Update Complete. ⎈Happy Helming!⎈
    Saving 2 charts
    Downloading prometheus from repo https://prometheus-community.github.io/helm-charts
    Downloading grafana from repo https://grafana.github.io/helm-charts
    Deleting outdated charts
    Release "fe-foundry-monitoring" does not exist. Installing it now.
    Error: context deadline exceeded
    
    
    Successfully installed Prometheus and Grafana into the fe-foundry namespace.
    
    NAME                 	NAMESPACE 	REVISION	UPDATED                             	STATUS  	CHART                  	APP VERSION
    fe-foundry           	fe-foundry	11      	2023-08-07 15:10:55.373825 -0700 PDT	deployed	fusion-5.8.0           	5.8.0
    fe-foundry-jupyter   	fe-foundry	2       	2023-07-20 11:29:38.481329 -0700 PDT	deployed	fusion-jupyter-0.2.5   	1.0
    fe-foundry-monitoring	fe-foundry	1       	2023-08-10 11:41:06.113257 -0700 PDT	failed  	fusion-monitoring-1.0.1	1.0.1
    
  2. To find the Grafana service endpoint in the newly installed Grafana Helm release, run the following command:
    kubectl get services
    
    The following is a sample output.
    NAME      TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
    grafana   LoadBalancer   <IP Address>   <IP Address>   3000:32589/TCP   87m
    
    If the Grafana service does not display, run the following command to expose Grafana and obtain an EXTERNAL-IP for the LoadBalancer service:
    kubectl expose deployment <grafana-deployment-name> --type=LoadBalancer --name=grafana --port=3000 --target-port=3000
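    After exposing the service, you can watch for the assigned EXTERNAL-IP, which may take a minute or two depending on your cloud provider:
    kubectl get service grafana --watch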
    

Install Loki

To install Loki from the Helm chart repository, run the following command, using a unique <loki-release-name> for your cluster:
helm upgrade --install <loki-release-name> --namespace=<namespace> grafana/loki-stack
If you do not enter the <loki-release-name> correctly, an error similar to the following displays:
Error: rendered manifests contain a resource that already exists. Unable to continue with install: PodSecurityPolicy "loki" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "fe-foundry": current value is "ps-intl".
If the helm upgrade is successful, the following is a sample output.
Release "fe-foundry-loki" does not exist. Installing it now.
W0810 11:47:07.890370   39624 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0810 11:47:09.396246   39624 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: fe-foundry-loki
LAST DEPLOYED: Thu Aug 10 11:47:07 2023
NAMESPACE: fe-foundry
STATUS: deployed
REVISION: 1
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.

See http://docs.grafana.org/features/datasources/loki/ for more detail.
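To verify that the Loki stack is running, you can list its pods. This is a sketch; the exact pod names depend on your release name:
kubectl get pods --namespace <namespace> | grep -i loki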

Obtain Admin credentials for Grafana

  1. After you validate that Grafana is running by accessing <EXTERNAL-IP>:3000, run the following command to obtain the <admin_password> for your Grafana instance:
    kubectl get secret --namespace <namespace> <release_name>-monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
    
  2. Sign in to Grafana and change the password for security purposes.
  3. Run the following command to display the Promtail pods that are running:
    kubectl get pods | grep -i promtail | nl
    
    The number of Promtail pods must match the number of Kubernetes nodes, because an instance of Promtail runs on each node. To compare the counts, see the sketch below.
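    For example, the admin username is typically stored in the same secret as the password, and you can count the cluster nodes to compare against the Promtail pod count. The admin-user key is an assumption based on the Grafana Helm chart defaults:
    kubectl get secret --namespace <namespace> <release_name>-monitoring-grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
    kubectl get nodes --no-headers | wc -l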

Add the Loki data source

  1. Sign in to Grafana and in the toolbar, click the arrow below Home to display all of the options.
  2. In the Configuration section, click Data sources.
  3. Click Add new data source.
  4. In the search bar for the data source, enter Loki.
  5. In the URL field on the Settings screen, enter your unique <loki-release-name:port>. The default port for Loki is 3100.
    If you encounter issues with the <loki-release-name:port> information, open a terminal and run kubectl get services | grep loki to display every service with a name that contains loki, along with its associated IP address and port. An example URL appears after this list.
  6. Complete the other fields and click Save & test.
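For example, if the Loki service in your namespace is named fe-foundry-loki, which is an assumed name based on the sample release above, the data source URL would be:
http://fe-foundry-loki:3100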

Dashboards

Pipeline and Stage Metrics

Both the Indexing and Query services provide per-pipeline, per-stage execution times at the 99th, 95th, 75th, and 50th percentiles. These metrics are commonly used to determine where a performance bottleneck exists when pipelines are executing slowly.

Indexing Service Dashboard

The Indexing Service has its own Grafana dashboard, which highlights key performance metrics for understanding the behavior of the Fusion 5 Indexing pipelines. It provides per-pipeline and per-stage execution time metrics, along with metrics for document indexing.

Query Service Dashboard

The Query Service also has its own Grafana dashboard that shows the performance of the pipelines and query stages that are being run. It shows the global and per-pod request metrics, in addition to per-pipeline query stage execution times.

Gateway Metrics Dashboard

The Gateway Metrics dashboard provides visibility into the behavior of the API Gateway, which stands in front of Fusion 5 services. It includes a global request rate gauge, in addition to per-handler and per-route request metrics. This dashboard can be used for locating service slowdowns, as it can highlight particular routes or handlers that are responding slowly.

JVM Metrics Dashboard

The JVM Metrics Dashboard provides information about the CPU and memory utilization of Fusion 5 services, along with some garbage collection (GC) metrics. These can be used to assess the JVM performance of Fusion 5 microservices and may help locate bottlenecks, such as a pegged CPU or frequent garbage collection events that pause the application. Issues relating to garbage collection can often be resolved by modifying the JVM options of the process and can then be monitored with this dashboard.

Solr Dashboards

Solr has its metrics split into three distinct dashboards, each covering a separate area of concern for monitoring Solr performance.
Core Dashboard
The Solr Core dashboard contains metrics grouped by Solr core, including per-core metrics for requests, file system utilization, document processing and access, and handler usage.
Node Dashboard
The Solr Node dashboard contains per-node metrics including information about requests, errors, cores, thread pool utilization, and connections. Per-node metrics are useful for tracking down issues with a particular Solr node, especially when these issues do not affect all nodes.
System Dashboard
The Solr System dashboard shows high-level system stats, including request and response statistics, JVM metrics, and common operating system metrics, such as memory usage, file descriptor usage, and CPU utilization.

Kubernetes Dashboards

The set of Kubernetes performance dashboards requires the installation of additional daemonsets to collect performance metrics from Kubernetes. Ensure that at least one namespace in your cluster uses the following configuration in the Prometheus values.yaml file for the stable/prometheus Helm chart.
kubeStateMetrics:
    enabled: true
nodeExporter:
    enabled: true
Additionally, install cAdvisor (typically into an infrastructure or system namespace):
helm repo add code-chris https://code-chris.github.io/helm-charts
helm repo update
helm install cadvisor code-chris/cadvisor
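The following is a minimal sketch of applying the values.yaml settings above, assuming Prometheus is managed with Helm; the chart and release names are assumptions, so adjust them to match your installation:
helm upgrade --install <prometheus-release-name> prometheus-community/prometheus --namespace <namespace> -f values.yaml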

Kube Metrics Dashboard (replaced by Kube Node Dashboard)

This dashboard queries Kubernetes metrics through Grafana. It requires installing a daemonset when setting up Prometheus, and it displays information about running pods, memory allocation, and CRON jobs.

Kube Node Dashboard

This dashboard queries Kubernetes metrics through Grafana and requires the same daemonset installed when setting up Prometheus. It displays information about running pods, load, network traffic, memory allocation, and more. Some panels may not render until you drill down into one or more specific nodes or instances at the top of the dashboard.

Kubernetes Persistence Volumes Dashboard

This dashboard shows all of your Kubernetes PVCs, filterable by namespace and node. It provides information about disk utilization so you can monitor PVCs that are nearing capacity.

Docker Host and Container Overview Dashboard

This dashboard provides an overview of your running containers, filterable by namespace, node, and container name. It provides information about the number of running containers, network traffic, memory utilization, disk I/O, and load metrics.

Downloads
