# Fusion 5 Survival Guide

export const InlineImage = ({src, alt = '', height = '2em'}) => {
  return <img src={src} alt={alt} style={{
    display: 'inline',
    verticalAlign: 'start',
    height: height,
    margin: '0'
  }} />;
};

export const LwTemplate = ({title = "Key questions to get you started", icon = "sparkles", cta = "Powered by Agent Studio", linkHref = "https://lucidworks.com/demo/?utm_source=docs&utm_medium=referral&utm_campaign=docs_cta_ai"}) => {
  const [isLoaded, setIsLoaded] = useState(false);
  useEffect(() => {
    const timer = setTimeout(() => {
      setIsLoaded(true);
    }, 500);
    return () => clearTimeout(timer);
  }, []);
  return <div className="lw-template-container">
      <Card title={title} icon={icon}>
        {isLoaded && <span dangerouslySetInnerHTML={{
    __html: `<lw-template id="a029c1a9-28be-427e-b0e1-5d918920246a"></lw-template
            >`
  }} />}
        <Link href={linkHref} className="agent-studio-link text-left text-gray-600 gap-2 dark:text-gray-400 text-sm font-medium flex flex-row items-center hover:text-primary dark:hover:text-primary-light group-hover:text-primary group-hover:dark:text-primary-light">Powered by Lucidworks Agent Studio</Link>
      </Card>
    </div>;
};


The purpose of this guide is to help you install, configure, and run Fusion 5 in production with high availability on Kubernetes. You can also find this guide in the [fusion-cloud-native repo](https://github.com/lucidworks/fusion-cloud-native/).

<Note>
  The Fusion 5 Survival Guide prefers Google Kubernetes Engine (GKE) terminology. Although your Kubernetes provider’s terminology may differ, the general concepts are consistent between providers.
</Note>

<LwTemplate />

## Foundational concepts

* [Which Kubernetes?](/docs/5/fusion/operations/survival-guide/platforms)
* [Multiple Zones for High Availability](/docs/5/fusion/operations/survival-guide/high-availability)
* [Fusion Microservices](/docs/5/fusion/reference/microservices)
* [Ingress and Security](/docs/5/fusion/operations/survival-guide/ingress)
* [Stateless Sessions with JWT](/docs/5/fusion/operations/survival-guide/jwt)
* [Workload Isolation with Multiple Node Pools](/docs/5/fusion/operations/survival-guide/workload-isolation)
* [High-Performance Query Processing with Auto-Scaling](/docs/5/fusion/operations/survival-guide/auto-scaling)
* [Fusion 5 Frequently Asked Questions](/docs/5/fusion/operations/faq)

## Planning your deployment

* **Deploy Fusion at Scale**

<Accordion title="Deploy Fusion at Scale">
  Before you begin, see [Fusion Server Deployment](/docs/5/fusion/operations/deployment) to understand the architecture and requirements.

  This article explains how to plan and execute a Fusion deployment at the scale required for staging or production.

  While the `setup_f5_*.sh` scripts are handy for getting started and proof-of-concept purposes, this article covers the planning process for building a production-ready environment.

  <Card title="Preparing for Fusion Implementation" class="note-image" href="https://academy.lucidworks.com/preparing-for-fusion-implementation" cta="Take this course on the LucidAcademy." icon="graduation-cap" iconType="duotone">
    The course for **Preparing for Fusion Implementation** focuses on the key elements for a successful implementation, defining your business requirements, preparing clean data, and involving the right personnel.
  </Card>

  ## Prerequisites

  You must meet the following prerequisites before you can customize your Fusion cluster:

  * A local copy of the [fusion-cloud-native repository](https://github.com/lucidworks/fusion-cloud-native). This must be up-to-date with the latest master branch.
  * Any cloud provider-specific **command line tools**, such as `gcloud` or `aws`, and `kubectl`.\
    See the platform-specific instructions linked above, or check with your cloud provider.
  * Helm v3
    * To install on a Mac:
    ```bash theme={"dark"}
    brew install helm
    ```
    * For other operating systems, download from [Helm Releases](https://github.com/helm/helm/releases).
    * Verify your installation:
    ```bash theme={"dark"}
    helm version --short
    v3.0.0+ge29ce2a
    ```
  * Kubernetes namespace
    * Collect the following information about your Kubernetes environment:
      * **CLUSTER**: Cluster name (passed to our setup scripts using the `-c` arg)
      * **NAMESPACE**: Kubernetes namespace in which to install Fusion. A namespace name may contain only lowercase letters (a-z), digits (0-9), and dashes. Periods and underscores are not allowed.
  * *(optional)* Clarify your organization’s DockerHub policy. The Fusion Helm chart points to public Docker images on DockerHub. Your organization may not allow Kubernetes to pull images directly from DockerHub or may require extra security scanning before loading images into production clusters.\
    Consult your Kubernetes and Docker admin team to find out how to get the Fusion images loaded into a registry that’s accessible to your cluster. You can update the image for each service using the [custom values YAML file](#custom-values-yaml-file).
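
  The namespace naming rule above can be checked before you run any setup script. The following is a minimal sketch, not part of `fusion-cloud-native`: `is_valid_namespace` is a hypothetical helper name, and the 63-character limit and the requirement to start and end with a letter or digit come from the Kubernetes DNS label rules rather than from this guide.

  ```bash
  # Hypothetical helper (not part of fusion-cloud-native): returns 0 when the
  # name is usable as a Kubernetes namespace, 1 otherwise.
  is_valid_namespace() {
    local name="$1"
    local LC_ALL=C   # make [a-z] match ASCII lowercase only
    # Kubernetes limits namespace names to 63 characters.
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 63 ] || return 1
    # Lowercase letters, digits, and dashes only; the name must start and
    # end with a letter or digit.
    [[ "$name" =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]]
  }
  ```

  For example, `is_valid_namespace "fusion-prod" && echo ok` prints `ok`, while `fusion_prod` and `fusion.prod` are rejected.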

  <Tip>
    **Kubernetes namespace tips**

    * Fusion 5 service discovery requires all services for the same release be deployed in the same namespace. Moreover, you should only run one instance of Fusion in a namespace. If you need multiple instances of Fusion running in the same Kubernetes cluster, then you need to deploy them in separate namespaces.
    * If your organization requires CPU / Memory quotas for namespaces, you can start with a minimum of 12 CPU and 45GB of RAM (such as 3 x n1-standard-4 on GKE), but you will need to increase the quotas once you start load testing Fusion with production workloads and real datasets.
    * Fusion requires at least 3 ZooKeeper nodes and 2 Solr nodes to achieve high availability.
  </Tip>
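
  The starting quota in the tip above is simple multiplication over the node pool. As a rough sketch, assuming identical nodes (`namespace_quota` is a hypothetical helper, and the per-node figures of 4 vCPUs and 15GB are those of an `n1-standard-4` machine):

  ```bash
  # Hypothetical helper: prints the total CPU and memory for a pool of
  # identical nodes. Arguments: node count, vCPUs per node, GB of RAM per node.
  namespace_quota() {
    local nodes="$1" cpu_per_node="$2" gb_per_node="$3"
    echo "$(( nodes * cpu_per_node )) CPU, $(( nodes * gb_per_node ))GB RAM"
  }

  # Three n1-standard-4 nodes (4 vCPUs and 15GB of RAM each):
  namespace_quota 3 4 15   # prints "12 CPU, 45GB RAM"
  ```

  Use the same arithmetic with your own machine type to estimate a first quota, then revise it after load testing.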

  ## Custom values YAML file

  1. Clone the `fusion-cloud-native` repository: `git clone https://github.com/lucidworks/fusion-cloud-native`

  2. Run the `customize_fusion_values.sh` script.

     ```bash theme={"dark"}
     ./customize_fusion_values.sh  --provider <provider> -c <cluster> -n <namespace> \
      --num-solr 3 \
      --solr-disk-gb 100 \
      --node-pool <node_selector> \
      --prometheus true \
      --with-resource-limits \
      --with-affinity-rules
     ```

     <Tip>   Pass the `--help` parameter to see script usage details.</Tip>
     The script creates the following files:

     | File                                                      | Description                                                                                                        |
     | --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
     | `<provider>_<cluster>_<namespace>_fusion_values.yaml`     | Main custom values YAML used to override Helm chart defaults for Fusion microservices.                             |
     | `<provider>_<cluster>_<namespace>_monitoring_values.yaml` | Custom values YAML used to configure Prometheus and Grafana.                                                       |
     | `<provider>_<cluster>_<namespace>_fusion_resources.yaml`  | Resource requests and limits for all microservices.                                                                |
     | `<provider>_<cluster>_<namespace>_fusion_affinity.yaml`   | Pod affinity rules to ensure multiple replicas for a single service are evenly distributed across zones and nodes. |
     | `<provider>_<cluster>_<namespace>_upgrade_fusion.sh`      | Script used to install and/or upgrade Fusion using the aforementioned custom values YAML files.                    |

     For an explanation of these placeholder values, see [Deployment-specific values](#deployment-specific-values) below.

  3. Add the new files to version control. You will make changes to them over time as you fine-tune your Fusion installation, and you will need them to perform upgrades. If you try to upgrade your Fusion installation without providing the custom values YAML, your deployment reverts to chart defaults.\
     Review the `<provider>_<cluster>_<namespace>_fusion_values.yaml` file to familiarize yourself with its structure and contents. Notice it contains a separate section for each of the Fusion microservices. The example configuration of the `query-pipeline` service below illustrates some important concepts about the custom values YAML file.

     ```yaml highlight={1,2,3,5,6} theme={"dark"}
     query-pipeline:
       enabled: true
       nodeSelector:
         cloud.google.com/gke-nodepool: default-pool
       javaToolOptions: "..."
       pod:
         annotations:
           prometheus.io/port: "8787"
           prometheus.io/scrape: "true"
           prometheus.io/path: "/actuator/prometheus"
     ```

  The highlighted lines in the example illustrate the following:

  1. Service-specific setting overrides go under the top-level heading for the service, in this case `query-pipeline`.
  2. Every Fusion service has an implicit `enabled` flag that defaults to `true`. Set it to `false` to remove the service from your cluster.
  3. The node selector identifies the label Kubernetes uses to find nodes to schedule pods on.
  4. `javaToolOptions` passes JVM options to the service.
  5. Pod annotations allow Prometheus to scrape metrics from the service.

  Once you work through all of the configuration sections in this topic, you'll have a well-configured custom values YAML file for your Fusion 5 installation. You'll then use this file during the Helm v3 installation at the end of this topic.

  #### Custom labels and annotations

  Starting with Fusion 5.9.16, you can configure custom labels and annotations for all Kubernetes resources through values files without modifying chart templates. This eliminates upgrade complexity and maintenance overhead.

  Configure global values that apply to all Fusion services:

  ```yaml theme={"dark"}
  global:
    labels:
      owner: search-platform
    annotations:
      cost-center: fin-1234
  ```

  Configure service-specific values that apply only to individual charts:

  ```yaml theme={"dark"}
  fusion-indexing:
    labels:
      team: platform-search
    annotations:
      monitoring: enabled
  ```

  The chart automatically merges and renders these values into Kubernetes resource metadata. Common use cases include cost allocation labels, ownership labels, monitoring annotations, and service mesh annotations.

  ### Deployment-specific values

  The script creates a custom values YAML file using the naming convention: `<provider>_<cluster>_<namespace>_fusion_values.yaml`. For example, `gke_search_f5_fusion_values.yaml`.

  | Parameter         | Description                                                                |
  | ----------------- | -------------------------------------------------------------------------- |
  | `<provider>`      | The K8s platform you’re running on, such as `gke`.                         |
  | `<cluster>`       | The name of your cluster.                                                  |
  | `<namespace>`     | The K8s namespace where you want to install Fusion.                        |
  | `<node_selector>` | Specifies a `nodeSelector` label to find nodes to schedule Fusion pods on. |
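
  The naming convention above can be sketched in a few lines of shell, which is handy when you script around the generated files. `fusion_file_names` is a hypothetical helper, not a script from the repository, and which files actually appear depends on the flags you pass to `customize_fusion_values.sh`.

  ```bash
  # Hypothetical helper: lists the file names that customize_fusion_values.sh
  # is described as generating for a provider, cluster, and namespace.
  fusion_file_names() {
    local prefix="${1}_${2}_${3}"
    printf '%s\n' \
      "${prefix}_fusion_values.yaml" \
      "${prefix}_monitoring_values.yaml" \
      "${prefix}_fusion_resources.yaml" \
      "${prefix}_fusion_affinity.yaml" \
      "${prefix}_upgrade_fusion.sh"
  }

  # Provider "gke", cluster "search", namespace "f5":
  fusion_file_names gke search f5
  ```

  The first name printed is `gke_search_f5_fusion_values.yaml`, matching the example above.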

  <Warning>
    Providing the correct `--node-pool <node_selector>` label is very important. Using the wrong value will cause your pods to be stuck in the `Pending` state. If you’re not sure about the correct value for your cluster, pass `'{}'` to let Kubernetes decide which nodes to schedule Fusion pods on.
  </Warning>

  Default `nodeSelector` labels are provider-specific. The `fusion-cloud-native` scripts use the following defaults for GKE and EKS:

  | Provider | Default node selector                            |
  | -------- | ------------------------------------------------ |
  | GKE      | cloud.google.com/gke-nodepool: default-pool      |
  | EKS      | alpha.eksctl.io/nodegroup-name: standard-workers |

  <Warning>
    If you are deploying Fusion 5.9.12, add the following to your `values.yaml` file to avoid a known issue that prevents the `kuberay-operator` pod from launching successfully (shown in YAML flow style): `kuberay-operator: { crd: { create: true } }`
  </Warning>

  ### Flags

  The script provides flags for additional configuration:

  | Flag                     | Description                                       |
  | ------------------------ | ------------------------------------------------- |
  | `--node-pool`            | Add a Fusion specific label to your nodes.        |
  | `--with-resource-limits` | Configure resource requests/limits.               |
  | `--with-replicas`        | Configure replica counts.                         |
  | `--with-affinity-rules`  | Configure pod affinity rules for Fusion services. |

  To use `--node-pool` with a Fusion-specific label, first label your nodes:

  ```bash theme={"dark"}
  kubectl label nodes <NODE_ID> fusion_node_type=<NODE_LABEL>
  ```

  Then, pass `--node-pool 'fusion_node_type: <NODE_LABEL>'`.

  ## Configure Solr sizing

  When you’re ready to build a production-ready setup for Fusion 5, you need to customize the Fusion Helm chart to ensure Fusion is well-configured for production workloads.

  You’ll be able to scale the number of nodes for Solr up and down after building the cluster, but you need to establish the initial size of the nodes (memory and CPU) and the size and type of disks you need.

  See the example config below to learn which parameters to change in the custom values YAML file.

  ```yaml expandable theme={"dark"}
  solr:
    resources:                    # Set resource limits for Solr to help K8s pod scheduling;
      limits:                     # these limits are not just for the Solr process in the pod,
        cpu: "7700m"              # so allow ample memory for loading index files into the OS cache (mmap)
        memory: "26Gi"
      requests:
        cpu: "7000m"
        memory: "25Gi"
    logLevel: WARN
    nodeSelector:
      fusion_node_type: search    # Run this Solr StatefulSet in the "search" node pool
    exporter:
      enabled: true               # Enable the Solr metrics exporter (for Prometheus) and
                                  # schedule on the default node pool (system partition)
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9983"
        prometheus.io/path: "/metrics"
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
    image:
      tag: 8.4.1
    updateStrategy:
      type: "RollingUpdate"
    javaMem: "-Xmx3g -Dfusion_node_type=system" # Configure memory settings for Solr
    solrGcTune: "-XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+UseStringDeduplication -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=150 -XX:+UseLargePages -XX:+AlwaysPreTouch"
    volumeClaimTemplates:
      storageSize: "100Gi"        # Size of the Solr disk
    replicaCount: 6               # Number of Solr pods to run in this StatefulSet

  zookeeper:
    nodeSelector:
      cloud.google.com/gke-nodepool: default-pool
    replicaCount: 3               # Number of Zookeepers
    persistence:
      size: 20Gi
    resources: {}
    env:
      ZK_HEAP_SIZE: 1G
      ZOO_AUTOPURGE_PURGEINTERVAL: 1
  ```

  You can tune GC settings and the number of replicas after the cluster is built. Changing the size of the persistent volumes is more complicated, so try to pick a good size initially.
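
  The comment in the example config about leaving memory for the OS cache can be checked with quick arithmetic. This is a rough sketch only: `os_cache_headroom_gib` is a hypothetical helper, and it ignores JVM off-heap overhead, so treat the result as an upper bound rather than a sizing rule.

  ```bash
  # Hypothetical helper: GiB of pod memory left for the OS file cache after
  # the JVM heap. Arguments: pod memory limit in GiB, max heap (-Xmx) in GiB.
  os_cache_headroom_gib() {
    local limit_gib="$1" heap_gib="$2"
    echo $(( limit_gib - heap_gib ))
  }

  # Values from the example config: a 26Gi memory limit and -Xmx3g.
  os_cache_headroom_gib 26 3   # prints 23
  ```

  With the example values, roughly 23GiB of the pod's memory remains available for memory-mapped index files.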

  ### Configure storage class for Solr pods (optional)

  If you wish to run with a storage class other than the default you can create a storage class for your Solr pods before you install. For example, to create regional disks in GCP you can create a file called `storageClass.yaml` with the following contents:

  ```yaml theme={"dark"}
  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
   name: solr-gke-storage-regional
  provisioner: kubernetes.io/gce-pd
  parameters:
   type: pd-standard
   replication-type: regional-pd
   zones: us-west1-b, us-west1-c
  ```

  Then provision it into your cluster:

  ```bash theme={"dark"}
  kubectl apply -f storageClass.yaml
  ```

  To have Solr use the storage class, add the following to the custom values YAML:

  ```yaml theme={"dark"}
  solr:
    volumeClaimTemplates:
      storageClassName: solr-gke-storage-regional
      storageSize: 250Gi
  ```

  <Note>
    We’re not advocating that you must use regional disks for Solr storage, as that would be redundant with Solr replication. We’re just using this as an example of how to configure a custom storage class for Solr disks if you see the need. For instance, you could use regional disks without Solr replication for write-heavy type collections.
  </Note>

  ## Configure multiple node pools

  Lucidworks recommends isolating search workloads from analytics workloads using multiple node pools. The included scripts do not do this for you; this is a manual process.

  For an example script for GKE, see [create\_gke\_cluster\_node\_pools.sh](https://github.com/lucidworks/fusion-cloud-native/blob/master/additional_environments/create_gke_cluster_node_pools.sh).

  In the custom values YAML file, you can add additional Solr StatefulSets by adding their names to the list under the `nodePools` property. To change a property for a StatefulSet from its default, set it directly on the object representing the node pool. Any properties you omit default to the base value. See the following example (additional whitespace added for display purposes only):

  ```yaml expandable highlight={3,4,9,17,22,31} theme={"dark"}
  solr:
    nodePools:
      - name: ""
      - name: "analytics"
        javaMem: "-Xmx6g"
        replicaCount: 6
        storageSize: "100Gi"
        nodeSelector:
          fusion_node_type: analytics
        resources:
          requests:
            cpu: 2
            memory: 12Gi
          limits:
            cpu: 3
            memory: 12Gi
      - name: "search"
        javaMem: "-Xms11g -Xmx11g"
        replicaCount: 12
        storageSize: "50Gi"
        nodeSelector:
          fusion_node_type: search
        resources:
          limits:
            cpu: "7700m"
            memory: "26Gi"
          requests:
            cpu: "7000m"
            memory: "25Gi"
    nodeSelector:
      cloud.google.com/gke-nodepool: default-pool
  ...
  ```

  1. The empty string `""` is the suffix for the default partition.
  2. Overrides the settings for the **analytics** Solr pods.
  3. Assigns the **analytics** Solr pods to the node pool and attaches the label `fusion_node_type=analytics`. You can use the `fusion_node_type` property in Solr auto-scaling policies to govern replica placement during collection creation.
  4. Overrides the settings for the **search** Solr pods.
  5. Assigns the **search** Solr pods to the node pool and attaches the label `fusion_node_type=search`.
  6. Sets the default settings for all Solr pods, if not specifically overridden in the `nodePools` section above.

  <Warning>
    Do not edit the `nodePools` value `""`.
  </Warning>

  In the example above, the **analytics** partition `replicaCount`, or number of Solr pods, is six. The **search** partition `replicaCount` is twelve.

  Each node pool is automatically assigned a `-Dfusion_node_type` property of `search`, `system`, or `analytics`. This value matches the name of the node pool. For example, `-Dfusion_node_type=search`.

  The Solr pods have a `fusion_node_type` system property, as shown below:

  <img src="https://mintcdn.com/lucidworks/L5PMnIeZ03zhv8Ti/assets/images/5.4/survival-guide/fusion_node_type.png?fit=max&auto=format&n=L5PMnIeZ03zhv8Ti&q=85&s=62bb1c560c49c3f6ac1d29b46dc796f3" alt="fusion_node_type system property" width="2483" height="936" data-path="assets/images/5.4/survival-guide/fusion_node_type.png" />

  ## Solr auto-scaling policy

  Use [replica placement plugins](https://solr.apache.org/guide/solr/latest/configuration-guide/replica-placement-plugins.html) to control how replicas are placed in Solr.

  ## Pod network policy

  A Kubernetes network policy governs how groups of pods communicate with each other and other network endpoints. With Fusion, all incoming traffic flows through the API Gateway service. All Fusion services in the same namespace expect an internal JWT, which is supplied by the Gateway, as part of the request. As a result, Fusion services enforce a basic level of API security and don’t need an additional network policy to protect them from other pods in the cluster.

  To install the network policy for Fusion services, pass `--set global.networkPolicyEnabled=true` when installing the Fusion Helm chart.

  ## On-premises private Docker registries

  For on-premises Kubernetes deployments, your organization may not allow Kubernetes to pull [Fusion’s Docker images from DockerHub](https://hub.docker.com/u/lucidworks/). See the instructions below for details on using a private Docker registry with Fusion. These are general instructions that may need to be adapted to work within your organization’s security policies:

  1. Transfer the public images from DockerHub to your private Docker registry.
  2. Establish a workstation that has access to [DockerHub](https://hub.docker.com). This workstation must connect to your internal Docker registry, most likely via VPN connection. In this example, the workstation is referred to as `envoy`.
  3. Install Docker on `envoy`. You need at least 100GB of free disk for Docker.
  4. Pull all of the images from DockerHub to `envoy`’s local registry. For example, to pull the query pipeline image, run `docker pull lucidworks/query-pipeline:5.9.0`. See `docker pull --help` for more information about pulling Docker images.
  5. Establish a connection from `envoy` to the private Docker registry, most likely via a VPN connection. In this example, the private Docker registry is referred to as `<internal-private-registry>`.
  6. Push the images from `envoy`’s Docker registry to the private registry. This will take a long time.
     1. You’ll need to re-tag all images for the internal registry. For example, to tag the query-pipeline image, run:
     ```bash theme={"dark"}
     docker tag lucidworks/query-pipeline:5.9.0 <internal-private-registry>/query-pipeline:5.9.0
     ```
     2. Push each image to the internal repo:
     ```bash theme={"dark"}
     docker push <internal-private-registry>/query-pipeline:5.9.0
     ```
  7. Install the Docker registry secret in Kubernetes. Create the Docker registry secret in the Kubernetes namespace where you want to install Fusion:
     ```bash theme={"dark"}
     SECRET_NAME=<internal-private-secret>
     REPO=<internal-private-registry>

     kubectl create secret docker-registry "${SECRET_NAME}" \
      --namespace "${NAMESPACE}" \
      --docker-server="${REPO}" \
      --docker-username="${REPO_USER}" \
      --docker-password="${REPO_PASS}" \
      --docker-email="${REPO_USER}"
     ```
     For details, see the Kubernetes article [Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/).
  8. Update the custom values YAML for your cluster to point to your private registry and secret to allow Kubernetes to pull images. For example:
     ```yaml theme={"dark"}
     query-pipeline:
      image:
        imagePullSecrets:
          - name: <internal-private-secret>
        repository: <internal-private-registry>
     ```
     Repeat the process for all Fusion services.
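
  The pull, re-tag, and push steps above repeat for every image, so they lend themselves to a loop. The following is a minimal sketch under stated assumptions: `private_image_ref` and `mirror_images` are hypothetical helper names, and the sketch assumes every source image lives under the `lucidworks/` organization on DockerHub.

  ```bash
  # Hypothetical helper: maps a DockerHub image reference such as
  # lucidworks/query-pipeline:5.9.0 to its name in the private registry.
  private_image_ref() {
    local image="$1" registry="$2"
    # Strip the "lucidworks/" organization prefix and prepend the registry.
    echo "${registry}/${image#lucidworks/}"
  }

  # Pulls, re-tags, and pushes each image listed after the registry argument.
  # Stops at the first failure.
  mirror_images() {
    local registry="$1" image target
    shift
    for image in "$@"; do
      target="$(private_image_ref "$image" "$registry")"
      docker pull "$image" &&
        docker tag "$image" "$target" &&
        docker push "$target" || return 1
    done
  }
  ```

  For example, `mirror_images <internal-private-registry> lucidworks/query-pipeline:5.9.0` pushes the image as `<internal-private-registry>/query-pipeline:5.9.0`. Pass additional image references as extra arguments.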

  ### Customize Helm Chart

  Every Fusion service allows you to override the `imagePullSecrets` setting using custom values YAML. However, other third-party services, including ZooKeeper, Pulsar, Prometheus, and Grafana, don’t allow you to supply the pull secret using the custom values YAML.

  To patch the default service account for your namespace and add the pull secret, run the following:

  ```bash theme={"dark"}
  kubectl patch sa default -n "$NAMESPACE" \
    -p '{"imagePullSecrets": [{"name": "<internal-private-secret>"}]}'
  ```

  In Windows using PowerShell or another CLI, you might have to escape the double quotes with a backslash (`\`) or reverse the order of single and double quotes:

  ```bash theme={"dark"}
  kubectl patch sa default -n $NAMESPACE -p "{'imagePullSecrets': [{'name': '<internal-private-secret>'}]}"
  ```

  <Check>
    Replace `<internal-private-secret>` with the name of the secret you created in the steps above.
  </Check>

  This allows the default service account to pull images from the private registry without specifying the pull secret on the resources directly.

  ## Add additional trusted certificate(s) to Fusion’s indexing and querying services *(optional)*

  You can add custom trusted certificates to support Fusion’s indexing and querying services. You may want to use custom trusted certificates if, for example, you have specific security requirements for data handling or need to support an existing infrastructure and its security needs. This method involves updating your Helm chart.

  If you want to add custom trusted certificates for both the indexing and querying services, follow these instructions twice: once for the indexing service, and once for the querying service. To add different certificates to the indexing and querying services, create one YAML file with the indexing service certificates and one YAML file for the querying service certificates before following these instructions.

  <Tip>
    You may use the same YAML file if you want to use the same certificates for both services.
  </Tip>

  To add custom trusted certificates:

  1. Create a new YAML file for your custom trusted certificates. The `customcerts.yaml` file is the example file in these instructions.

  2. Add the custom certificate(s) in the YAML file created in the previous step. For example:
     ```yaml theme={"dark"}
     trustedCertificates:
      enabled: true
      files:
        some.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
        other.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
     ```

  3. Update the indexing or querying service by running the following Helm command. Set `NAMESPACE`, `HELM_RELEASE`, and `HELM_CHART_PATH` for your deployment, and replace `EXAMPLE-VALUES-FILE.yaml` with your existing values file.
     ```bash theme={"dark"}
     helm upgrade --install --namespace "${NAMESPACE}" "${HELM_RELEASE}" "${HELM_CHART_PATH}" --values EXAMPLE-VALUES-FILE.yaml --values customcerts.yaml
     ```

  4. Verify the indexing or querying pod has a new `init-container` with the name `import-certs`.
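
  Step 2 above can be scripted when you have several certificate files on disk. This is a sketch, not a Lucidworks tool: `certs_to_values_yaml` is a hypothetical helper that prints a `trustedCertificates` fragment in the same shape as the example, using each file's base name as the key.

  ```bash
  # Hypothetical helper: prints a trustedCertificates values fragment for
  # every PEM file passed as an argument, keyed by file name.
  certs_to_values_yaml() {
    local cert line
    echo "trustedCertificates:"
    echo "  enabled: true"
    echo "  files:"
    for cert in "$@"; do
      echo "    $(basename "$cert"): |-"
      # Indent each PEM line so it sits inside the YAML block scalar.
      while IFS= read -r line || [ -n "$line" ]; do
        printf '      %s\n' "$line"
      done < "$cert"
    done
  }
  ```

  For example, `certs_to_values_yaml some.cert other.cert > customcerts.yaml` writes a file you can pass to Helm with `--values customcerts.yaml`. Review the generated file before you use it.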

  ## Add additional trusted certificate(s) for connectors to allow crawling of web resources with SSL/TLS enabled *(optional)*

  To crawl a datasource that uses a self-signed certificate, add the certificate to the connector services as a trusted certificate. For example:

  ```yaml wrap expandable theme={"dark"}
  classic-rest-service:
    trustedCertificates:
      enabled: true
      files:
        some.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
        other.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
  connector-plugin:
    trustedCertificates:
      enabled: true
      files:
        some.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
        other.cert: |-
          -----BEGIN CERTIFICATE-----
          MIIDeTCCAmGgAwIBAgIJAPziuikCTox4MA0GCSqGSIb3DQEBCwUAMGIxCzAJBgNV
          (...)
          EVA0pmzIzgBg+JIe3PdRy27T0asgQW/F4TY61Yk=
          -----END CERTIFICATE-----
  ```

  ### Generating the certificate on the Linux command line

  Use the following command to generate a `.crt` file in `$fusion_home/apps/jetty/connectors/etc/yourcertname.crt`:

  ```bash theme={"dark"}
  openssl s_client -servername remote.server.net -connect remote.server.net:443 </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' >$fusion_home/apps/jetty/connectors/etc/yourcertname.crt
  ```
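
  The `sed` filter in the command above is what separates the certificate from the rest of the `openssl s_client` output. Isolated as a function, it can be tried on sample input without a network connection. `extract_pem` is a hypothetical name used for illustration only.

  ```bash
  # Hypothetical helper: reads openssl s_client output on stdin and prints
  # only the lines from each BEGIN CERTIFICATE marker through the matching
  # END CERTIFICATE marker.
  extract_pem() {
    sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
  }
  ```

  In use, it replaces the inline `sed` call: `openssl s_client -servername remote.server.net -connect remote.server.net:443 </dev/null | extract_pem > yourcertname.crt`.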

  ### Generating the certificate using Firefox web browser

  1. Navigate to the SharePoint host.
  2. Click the <InlineImage src="/assets/images/5.4/survival-guide/icon-firefox-lock.png" /> in the address bar, then click the <InlineImage src="/assets/images/5.4/survival-guide/icon-firefox-right-arrow.png" /> icon.
  3. Next, navigate to **More Information** > **View Certificate** > **Export**.\
     Save the file to the following folder:
     `$fusion_home/apps/jetty/connectors/etc/yourcertname.crt`

  ### Generating the certificate using Chrome web browser

  1. Navigate to **Chrome menu** > **More Tools** > **Developer Tools** > **Security Tab**.
     This will display the **Security overview**.
  2. Click the **View certificate** button.
  3. Save the file to the following folder:\
     `$fusion_home/apps/jetty/connectors/etc/yourcertname.crt`

  ### Generating the certificate using PowerShell

  Use the following script to generate a `.crt` file in `$fusion_home\apps\jetty\connectors\etc\yourcertname.crt`:

  ```powershell theme={"dark"}
  # Set this to your Fusion installation directory.
  $fusion_home = "c:\your\fusion\install\directory"
  # Connect to the host so its certificate becomes available on the service point.
  $webRequest = [Net.WebRequest]::Create("https://your-hostname")
  try { $webRequest.GetResponse() } catch {}
  $cert = $webRequest.ServicePoint.Certificate
  # Export the certificate in binary form, then Base64-encode it with certutil.
  $bytes = $cert.Export([Security.Cryptography.X509Certificates.X509ContentType]::Cert)
  set-content -value $bytes -encoding byte -path "$fusion_home\apps\jetty\connectors\etc\yourcertname.binary.crt"
  certutil -encode "$fusion_home\apps\jetty\connectors\etc\yourcertname.binary.crt" "$fusion_home\apps\jetty\connectors\etc\yourcertname.crt"
  # Remove the intermediate binary file.
  Remove-Item "$fusion_home\apps\jetty\connectors\etc\yourcertname.binary.crt" -Force
  ```

  ## Install Fusion 5 on Kubernetes

  At this point, you’re ready to install Fusion 5 using the custom values YAML files and upgrade script. If you used the `customize_fusion_values.sh` script, run the upgrade script it generated using Bash. For example:

  ```bash theme={"dark"}
  ./gke_search_f5_upgrade_fusion.sh
  ```

  Once the installation is complete, verify your Fusion installation is running correctly.

  ## Monitoring Fusion with Prometheus and Grafana

  Lucidworks recommends using Prometheus and Grafana for monitoring the performance and health of your Fusion cluster. Your operations team may already have these services installed. If not, install them into the Fusion namespace.

  <Note>
    The [Custom values YAML file shown above](#custom-values-yaml-file) activates the Solr metrics exporter service and adds pod annotations so Prometheus can scrape metrics from Fusion services.
  </Note>

  1. Run the `customize_fusion_values.sh` script with the `--prometheus true` option. This creates an extra custom values YAML file for installing Prometheus and Grafana, `<provider>_<cluster>_<namespace>_monitoring_values.yaml`. For example: `gke_search_f5_monitoring_values.yaml`.
  2. Commit the YAML file to version control.
  3. Review its contents to ensure that the settings suit your needs. For example, decide how long you want to keep metrics. The default is 36 hours.\
     See the [Prometheus documentation](https://github.com/helm/charts/tree/master/stable/prometheus) and [Grafana documentation](https://github.com/helm/charts/tree/master/stable/grafana) for details.
  4. Run the `install_prom.sh` script to install Prometheus and Grafana in your namespace. Include the provider, cluster name, namespace, and Helm release as in the example below:
     ```bash theme={"dark"}
     ./install_prom.sh --provider gke -c search -n f5 -r 5-5-1
     ```
     <Tip>   Pass the `--help` parameter to see script usage details.</Tip>
     The Grafana dashboards from [monitoring/grafana](https://github.com/lucidworks/fusion-cloud-native/tree/master/monitoring/grafana) are installed automatically by the `install_prom.sh` script.
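
  The file naming convention from step 1 can be sketched as a small shell helper. The function name `monitoring_values_file` is hypothetical; only the naming pattern and the `gke`, `search`, and `f5` values come from this guide.

  ```bash
  # Build the monitoring values file name from provider, cluster, and namespace,
  # following the <provider>_<cluster>_<namespace>_monitoring_values.yaml pattern.
  monitoring_values_file() {
    printf '%s_%s_%s_monitoring_values.yaml\n' "$1" "$2" "$3"
  }

  # Prints gke_search_f5_monitoring_values.yaml
  monitoring_values_file gke search f5

  # Example usage when committing the file (step 2):
  #   git add "$(monitoring_values_file gke search f5)"
  ```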
</Accordion>

## "Day two" operations

* **Configure Grafana, Prometheus, Promtail, and Loki in Fusion**
* **Configure Pod Affinity**
* **Configure Resource Limits**
* **Spark Operations**
* **Fusion 5 Upgrades**

<AccordionGroup>
  <Accordion title="Configure Grafana, Prometheus, Promtail, and Loki in Fusion">
    <Warning>
      Before you perform these installation instructions, you must delete any existing persistent volume claims (PVCs) related to Prometheus, Grafana, Promtail, and Loki on your namespace.
    </Warning>

    ## Clone the `fusion-cloud-native` repository

    Open a terminal window and run the following command:

    ```
    git clone https://github.com/lucidworks/fusion-cloud-native.git
    ```

    ## Install Grafana

    1. In your local `fusion-cloud-native` repository, run the following command for your `<cluster>` and `<namespace>`:

       ```
       ./install_prom.sh -c <cluster> -n <namespace>
       ```

       The following is a sample output. The errors are related to resource limits on the sample cluster, and can be ignored. Similar errors may display for your cluster, and do not impact Grafana logging.

       ```
       Adding the stable chart repo to helm repo list
       "prometheus-community" already exists with the same configuration, skipping
       "grafana" already exists with the same configuration, skipping

       Installing Prometheus and Grafana for monitoring Fusion metrics ... this can take a few minutes.

       Hang tight while we grab the latest from your chart repositories...
       ...Successfully got an update from the "ckotzbauer" chart repository
       ...Successfully got an update from the "lucidworks" chart repository
       ...Successfully got an update from the "grafana" chart repository
       ...Successfully got an update from the "prometheus-community" chart repository
       Update Complete. ⎈Happy Helming!⎈
       Saving 2 charts
       Downloading prometheus from repo https://prometheus-community.github.io/helm-charts
       Downloading grafana from repo https://grafana.github.io/helm-charts
       Deleting outdated charts
       Release "fe-foundry-monitoring" does not exist. Installing it now.
       Error: context deadline exceeded


       Successfully installed Prometheus and Grafana into the fe-foundry namespace.

       NAME                 	NAMESPACE 	REVISION	UPDATED                             	STATUS  	CHART                  	APP VERSION
       fe-foundry           	fe-foundry	11      	2023-08-07 15:10:55.373825 -0700 PDT	deployed	fusion-5.8.0           	5.8.0
       fe-foundry-jupyter   	fe-foundry	2       	2023-07-20 11:29:38.481329 -0700 PDT	deployed	fusion-jupyter-0.2.5   	1.0
       fe-foundry-monitoring	fe-foundry	1       	2023-08-10 11:41:06.113257 -0700 PDT	failed  	fusion-monitoring-1.0.1	1.0.1
       ```
    2. To find the Grafana service endpoint in the newly installed Grafana Helm release, run the following command:

       ```
       kubectl get services
       ```

       The following is a sample output.

       ```
       NAME          TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
       grafana       LoadBalancer   <IP Address>   <IP Address>  3000:32589/TCP    87m
       ```

       If the Grafana service is not listed, run the following command to expose Grafana through a LoadBalancer service that has an `EXTERNAL-IP`:

       ```
       kubectl expose deployment <grafana-deployment-name> --type=LoadBalancer --name=grafana --port=3000 --target-port=3000
       ```

    ## Install Loki

    To obtain Loki from the helm chart repository, run the following command for the unique `<loki-release-name>` for your cluster:

    ```
    helm upgrade --install <loki-release-name> --namespace=<namespace> grafana/loki-stack
    ```

    If you do not enter the `<loki-release-name>` correctly, an error similar to the following displays:

    ```
    Error: rendered manifests contain a resource that already exists. Unable to continue with install: PodSecurityPolicy "loki" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "fe-foundry": current value is "ps-intl".
    ```

    If the helm upgrade is successful, the following is a sample output.

    ```
    Release "fe-foundry-loki" does not exist. Installing it now.
    W0810 11:47:07.890370   39624 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    W0810 11:47:09.396246   39624 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    NAME: fe-foundry-loki
    LAST DEPLOYED: Thu Aug 10 11:47:07 2023
    NAMESPACE: fe-foundry
    STATUS: deployed
    REVISION: 1
    NOTES:
    The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.

    See http://docs.grafana.org/features/datasources/loki/ for more detail.
    ```

    ## Obtain Admin credentials for Grafana

    1. After you validate Grafana is running by accessing `<EXTERNAL-IP>:3000`, run the following command to obtain an `<admin_password>` for your Grafana instance:

       ```
       kubectl get secret --namespace <namespace> <release_name>-monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
       ```
    2. Sign in to Grafana and change the password for security purposes.
    3. Run the following command to display the promtail pods that are running:

       ```
       kubectl get pods | grep -i promtail | nl
       ```

       The number of Promtail pods must match the number of Kubernetes nodes, because one instance of Promtail runs on each node.
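
    The comparison between Promtail pods and nodes can be sketched as follows. The helper name `promtail_covers_all_nodes` is hypothetical. It assumes both listings are in `kubectl ... --no-headers` format and that Promtail pod names contain the string `promtail`, as in the `grep` command above.

    ```bash
    # Succeeds when the number of Promtail pods equals the number of nodes.
    # $1 = output of `kubectl get nodes --no-headers`
    # $2 = output of `kubectl get pods --no-headers`
    promtail_covers_all_nodes() {
      node_count=$(printf '%s\n' "$1" | awk 'NF { n++ } END { print n + 0 }')
      promtail_count=$(printf '%s\n' "$2" | awk 'tolower($1) ~ /promtail/ { n++ } END { print n + 0 }')
      echo "nodes=${node_count} promtail_pods=${promtail_count}"
      [ "$node_count" -eq "$promtail_count" ]
    }

    # Example usage (assumes kubectl points at the namespace where Loki is installed):
    #   promtail_covers_all_nodes "$(kubectl get nodes --no-headers)" "$(kubectl get pods --no-headers)"
    ```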

    ## Add the Loki data source

    1. Sign in to Grafana and in the toolbar, click the arrow below Home to display all of the options.
    2. In the Configuration section, click **Data sources**.
    3. Click **Add new data source**.
    4. In the search bar for the data source, enter **Loki**.
    5. In the URL field on the Settings screen, enter your unique `<loki-release-name:port>`. The default port for Loki is `3100`.

       <Note>   If you encounter issues with the `<loki-release-name:port>` information, open a terminal and run `kubectl get services | grep loki` to display a list of every service with a name that contains `loki` along with its associated IP address and port.</Note>
    6. Complete the other fields and click **Save & test**.
  </Accordion>

  <Accordion title="Configure Pod Affinity">
    Affinity rules govern how Kubernetes schedules pods for Fusion components across the cluster. All components have the same affinity setup, which follows this logic:

    * When scheduling, prefer to put a pod on a node in an availability zone that doesn’t have a running instance of this component.
    * Require that pods are deployed on a host that doesn’t have a running instance of the component that is being scheduled.

    With this logic, the loss of a host can bring down at most one replica of any given component. However, the cluster must have at least as many nodes as the number of replicas in the largest deployment.

    To run a large number of replicas of a component, consider relaxing the "required" policy to a "preferred" policy for the `kubernetes.io/hostname` rules:

    |        |                                                  |
    | ------ | ------------------------------------------------ |
    | Before | requiredDuringSchedulingIgnoredDuringExecution:  |
    | After  | preferredDuringSchedulingIgnoredDuringExecution: |

    If you used the `--with-affinity-rules` option when running the `./customize_fusion_values.sh` script, the pod affinity rules are configured for your cluster. Alternatively, copy [`affinity.yaml`](https://github.com/lucidworks/fusion-cloud-native/blob/master/example-values/affinity.yaml), and rename it using the following naming convention: `<provider>_<cluster>_<release>_fusion_affinity.yaml`.

    To implement the file, append the following to your upgrade script:

    ```
    MY_VALUES="${MY_VALUES} --values gke_search_f5_fusion_affinity.yaml"
    ```
  </Accordion>

  <Accordion title="Configure Resource Limits">
    Determining resource limits is a process that should take place after the initial setup of your cluster. This is especially true for proof-of-concept clusters. Once you have configured your cluster to accommodate a critical mass of data, tune the resource limits for your Fusion services.

    In the case of production or production-like environments, define resource limits to help Kubernetes schedule pods correctly across the nodes in your cluster. This is important for Kubernetes clusters that host namespaces other than Fusion.

    If you used the `--with-resource-limits` option when running the `./customize_fusion_values.sh` script, resource limits are already configured for your cluster. The script creates a YAML file for this purpose named `<provider>_<cluster>_<namespace>_fusion_resources.yaml`. Alternatively, you can copy [`resources.yaml`](https://github.com/lucidworks/fusion-cloud-native/blob/master/example-values/resources.yaml), and rename it using the following naming convention: `<provider>_<cluster>_<release>_fusion_resources.yaml`.

    You can refine the resource requests and limits as you test your cluster’s behavior, while preparing for a production environment with Fusion.
  </Accordion>

  <Accordion title="Spark Operations">
    ### Node Selectors

    You can control which nodes Spark executors are scheduled on using a Spark configuration property for a job:

    ```
    spark.kubernetes.node.selector.<LABEL>=<LABEL_VALUE>
    ```

    Use the node's label key as `LABEL` and the label's value as `LABEL_VALUE`. For example, if a node is labeled with `fusion_node_type=spark_only`, schedule Spark executor pods to run on that node using:

    ```
    spark.kubernetes.node.selector.fusion_node_type=spark_only
    ```
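
    As a minimal sketch, the property string can be generated from a label key and value. The helper name `spark_node_selector` is hypothetical, and the `kubectl label` line is shown as a comment because the node name depends on your cluster.

    ```bash
    # Print the Spark property that restricts executors to nodes labeled LABEL=LABEL_VALUE.
    # $1 = label key, $2 = label value.
    spark_node_selector() {
      printf 'spark.kubernetes.node.selector.%s=%s\n' "$1" "$2"
    }

    # Label a node first (replace <node-name> with a name from `kubectl get nodes`):
    #   kubectl label nodes <node-name> fusion_node_type=spark_only

    # Prints spark.kubernetes.node.selector.fusion_node_type=spark_only
    spark_node_selector fusion_node_type spark_only
    ```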

    <Tip>In Fusion 5.5, Spark version 2.4.x does not support tolerations for Spark pods. As a result, Spark pods can’t be scheduled on any nodes with taints in Fusion 5.5.</Tip>

    ### Cluster mode

    Fusion 5 ships with Spark and operates in "cluster mode" on top of Kubernetes. In cluster mode, each Spark driver runs in a separate pod, and resources can be managed per job. Each executor also runs in its own pod.

    ### Spark config defaults

    The table below shows the default configurations for Spark. These settings are configured in the job-launcher config map, accessible using `kubectl get configmaps <release-name>-job-launcher`. Some of these settings are also configurable via Helm.

    **Spark Resource Configurations**

    | Spark Configuration                     | Default value | Helm Variable     |
    | --------------------------------------- | ------------- | ----------------- |
    | spark.driver.memory                     | 3g            |                   |
    | spark.executor.instances                | 2             | executorInstances |
    | spark.executor.memory                   | 3g            |                   |
    | spark.executor.cores                    | 6             |                   |
    | spark.kubernetes.executor.request.cores | 3             |                   |
    | spark.sql.caseSensitive                 | true          |                   |

    **Spark Kubernetes Configurations**

    | Spark Configuration                                     | Default value                                   | Helm Variable          |
    | ------------------------------------------------------- | ----------------------------------------------- | ---------------------- |
    | spark.kubernetes.container.image.pullPolicy             | Always                                          | image.imagePullPolicy  |
    | spark.kubernetes.container.image.pullSecrets            | \[artifactory]                                  | image.imagePullSecrets |
    | spark.kubernetes.authenticate.driver.serviceAccountName | \<name>-job-launcher-spark                      |                        |
    | spark.kubernetes.driver.container.image                 | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository       |
    | spark.kubernetes.executor.container.image               | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository       |
  </Accordion>

  <Accordion title="Fusion 5 Upgrades">
    This guide describes how to perform Fusion 5 upgrades.

    <Note>Before upgrading, be aware of changes by checking for [Deprecations and Removals](/docs/5/fusion/deprecations-and-removals) between versions.</Note>

    Lucidworks recommends upgrading to the next minor version only. For example, you should upgrade from Fusion 5.6.1 to Fusion 5.7.1 before upgrading to Fusion 5.8.0.
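
    The recommendation above can be expressed as a small pre-upgrade check. This is a sketch only: the function name `minor_step_ok` is hypothetical, it assumes versions of the form `major.minor.patch`, and it treats an upgrade within the same minor version as acceptable, which is an assumption rather than a documented rule.

    ```bash
    # Succeeds when the target version is in the same major version and at most
    # one minor version ahead of the current version.
    # $1 = current version (for example 5.6.1), $2 = target version (for example 5.7.1).
    minor_step_ok() {
      cur_major=${1%%.*}; rest=${1#*.}; cur_minor=${rest%%.*}
      tgt_major=${2%%.*}; rest=${2#*.}; tgt_minor=${rest%%.*}
      [ "$cur_major" -eq "$tgt_major" ] || return 1
      step=$((tgt_minor - cur_minor))
      [ "$step" -ge 0 ] && [ "$step" -le 1 ]
    }

    # Example usage:
    #   minor_step_ok 5.6.1 5.7.1 && echo "upgrade path follows the recommendation"
    #   minor_step_ok 5.6.1 5.8.0 || echo "upgrade to 5.7.x first"
    ```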

    The [general upgrade process](#general-upgrade-process) is described in this article. However, the specific upgrade procedures may vary depending on your upgrade path. For the most accurate instructions, please refer to the upgrade article specific to your upgrade.

    ## General upgrade process

    Fusion natively supports deployments on several Kubernetes platforms, including AKS, EKS, and GKE.

    Fusion includes an upgrade script for AKS, EKS, and GKE. This script is not generated for other Kubernetes deployments.

    Upgrades differ from platform to platform. See below for more information about upgrading on your platform of choice.

    Whenever you upgrade Fusion, you must also update your [remote connectors](/docs/fusion-connectors/developers/remote-v2-connectors), if you are running any.
    You can download the latest files at [V2 Connectors Downloads](/docs/fusion-connectors/downloads/v2-connectors-downloads).

    ### Natively supported deployment upgrades

    | Deployment type                             | Platform |
    | ------------------------------------------- | -------- |
    | **Azure Kubernetes Service (AKS)**          | `aks`    |
    | **Amazon Elastic Kubernetes Service (EKS)** | `eks`    |
    | **Google Kubernetes Engine (GKE)**          | `gke`    |

    Fusion includes upgrade scripts for natively supported deployment types. To upgrade:

    1. Open the `<platform>_<cluster>_<release>_upgrade_fusion.sh` upgrade script file for editing.
    2. Update the `CHART_VERSION` to your target Fusion version, and save your changes.
    3. Run the `<platform>_<cluster>_<release>_upgrade_fusion.sh` script. The `<release>` value is the same as your namespace, unless you overrode the default value using the `-r` option.
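
    Steps 1 and 2 can also be scripted. The sketch below assumes the upgrade script contains a line that starts with `CHART_VERSION=`; check your generated script before relying on it. The function name `set_chart_version` is hypothetical.

    ```bash
    # Rewrite the CHART_VERSION line of an upgrade script.
    # $1 = path to the upgrade script, $2 = target Fusion chart version.
    set_chart_version() {
      script="$1"
      version="$2"
      tmp=$(mktemp) || return 1
      sed "s/^CHART_VERSION=.*/CHART_VERSION=\"${version}\"/" "$script" > "$tmp" || return 1
      # Write back through the original file so its permissions are preserved.
      cat "$tmp" > "$script"
      rm -f "$tmp"
    }

    # Example usage, followed by step 3:
    #   set_chart_version gke_search_f5_upgrade_fusion.sh 5.8.0
    #   ./gke_search_f5_upgrade_fusion.sh
    ```

    Review the change with `git diff` before running the upgrade, since the script is kept under version control.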

    {/* // During installation, Fusion creates a YAML file that is used to customize Fusion settings in future upgrades: `<platform>_<cluster>_<release>_fusion_values.yaml`. */}

    After running the upgrade, use `kubectl get pods` to see the changes applied to your cluster. It may take several minutes to perform the upgrade, as new Docker images are pulled from Docker Hub. To see the versions of running pods, run:

    ```
    kubectl get po -o jsonpath='{..image}'  | tr -s '[[:space:]]' '\n' | sort | uniq
    ```

    ### Other Kubernetes deployment upgrades

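    To update an existing installation, run:

    ```
    RELEASE=f5
    NAMESPACE=default
    # Set MY_VALUES to the path of your custom values YAML file.
    MY_VALUES="<path-to-custom-values-yaml>"
    helm repo update
    helm upgrade ${RELEASE} "lucidworks/fusion" --namespace "${NAMESPACE}" --values "${MY_VALUES}"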
    ```

    Except for ZooKeeper, all K8s deployments and statefulsets use a `RollingUpdate` update policy:

    ```yaml theme={"dark"}
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
    ```

    ZooKeeper instances use `OnDelete` to avoid changing critical stateful pods in the Fusion deployment. To apply changes to ZooKeeper after performing the upgrade, which is uncommon, you must manually delete the pods. For example:

    ```
    kubectl delete pod f5-zookeeper-0
    ```

    <Check>Delete one pod at a time. Verify that the new pod is healthy and serving traffic before deleting the next pod.</Check>
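
    The one-at-a-time procedure can be sketched as a loop. The function name `restart_zookeeper_pods` and the `KUBECTL` override are hypothetical, and the replica count is something you supply. The sketch assumes pods are named like `f5-zookeeper-0` as in the example above. The Kubernetes `Ready` condition is only an approximation of "healthy and serving traffic", so confirm the ZooKeeper ensemble state yourself before continuing in production.

    ```bash
    # KUBECTL can be overridden (for example with "echo") to preview the commands.
    KUBECTL="${KUBECTL:-kubectl}"

    # Delete ZooKeeper pods one at a time, waiting for each replacement to become Ready.
    # $1 = Helm release name (for example f5), $2 = number of ZooKeeper replicas.
    restart_zookeeper_pods() {
      release="$1"
      replicas="$2"
      i=0
      while [ "$i" -lt "$replicas" ]; do
        pod="${release}-zookeeper-${i}"
        $KUBECTL delete pod "$pod" || return 1
        # The StatefulSet controller recreates the pod; poll until it exists again.
        until $KUBECTL get pod "$pod" >/dev/null 2>&1; do
          sleep 2
        done
        $KUBECTL wait --for=condition=Ready "pod/${pod}" --timeout=300s || return 1
        i=$((i + 1))
      done
    }

    # Example usage for a release named f5 with three ZooKeeper replicas:
    #   restart_zookeeper_pods f5 3
    ```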

    You can also set the `updateStrategy` under the `zookeeper` section in your `"${MY_VALUES}"` file:

    ```yaml theme={"dark"}
    solr:
      ...
      zookeeper:
        updateStrategy:
          type: "RollingUpdate"
    ```

    #### Upgrades with Helm v3

    One of the most powerful features provided by Kubernetes and a cloud-native microservices architecture is the ability to do a rolling update on a live cluster. For example, Fusion 5 allows customers to upgrade from Fusion 5.1.0 to a later 5.x.y version on a live cluster with zero downtime or disruption of service.

    When Kubernetes performs a rolling update to an individual microservice, there is a mix of old and new services in the cluster. Requests from other services route to both versions.

    <Note>Lucidworks ensures that changes to a service do not break the API it exposes to other services in the same minor release version (5.x). Lucidworks also ensures that the stored configuration remains compatible in the same minor release version.</Note>

    Lucidworks releases minor updates to individual services frequently. Pull in those upgrades using Helm at your discretion.

    **How to upgrade Fusion**

    1. Clone the [**fusion-cloud-native** repo](https://github.com/lucidworks/fusion-cloud-native), if you haven’t already.
    2. Locate the `setup_f5_<platform>.sh` script that matches your Kubernetes platform.
    3. Run the script with the `--upgrade` option.

       <Tip>   To see what would be upgraded, pass the `--dry-run` option to the script.</Tip>

    The scripts in the **fusion-cloud-native** repo automatically pull in the latest chart updates from our Helm repository and deploy any updates needed by doing a diff of your current installation and the latest release from Lucidworks.

    #### Helm upgrade script

    Once you deploy a working cluster, use the upgrade script created by the `customize_fusion_values.sh` script. The upgrade script hard-codes the parameters, so you do not need to remember which parameters to pass. This is helpful when working with multiple K8s clusters. Make sure you check the script into version control alongside your custom values YAML files.

    Whenever you change the custom values YAML files for your cluster, you need to run the upgrade script to apply the changes. The script calls `helm upgrade` with the correct parameters and `--values` options.

    <Warning>If you run `helm upgrade` without passing the custom values YAML files, the deployment will revert to using chart defaults, which you never want to do.</Warning>

    <Tip>The script assumes your `kubeconfig` is pointing to the correct cluster and you’re using Helm v3. If not, the upgrade fails. Select the correct `kubeconfig` before running the script.</Tip>
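
    The way the upgrade script accumulates `--values` options can be sketched as follows. The helper name `values_args` is hypothetical, and the file names follow the naming conventions used in this guide. The sketch assumes file names contain no spaces.

    ```bash
    # Build a string of repeated --values options, one per custom values YAML file.
    values_args() {
      args=""
      for f in "$@"; do
        args="${args} --values ${f}"
      done
      # Strip the leading space before printing.
      printf '%s\n' "${args# }"
    }

    MY_VALUES=$(values_args gke_search_f5_fusion_values.yaml gke_search_f5_fusion_affinity.yaml)
    # Prints: --values gke_search_f5_fusion_values.yaml --values gke_search_f5_fusion_affinity.yaml
    echo "$MY_VALUES"

    # Example usage (MY_VALUES is left unquoted so that it splits into separate options):
    #   helm upgrade f5 lucidworks/fusion --namespace f5 ${MY_VALUES}
    ```

    Passing every custom values file on each upgrade is what keeps the deployment from reverting to chart defaults.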

  </Accordion>
</AccordionGroup>
