Deploy Fusion at Scale
While the `setup_f5_*.sh` scripts are handy for getting started and proof-of-concept purposes, this article covers the planning process for building a production-ready environment. You need the command-line tools for your platform, such as `gcloud` or `aws`, and `kubectl`. See the platform-specific instructions linked above, or check with your cloud provider. You also need the name of your Kubernetes cluster, which the scripts take via the `-c` arg.

Clone the `fusion-cloud-native` repository: `git clone https://github.com/lucidworks/fusion-cloud-native`
Run the `customize_fusion_values.sh` script to create the custom values YAML files for your cluster. Pass the `--help` parameter to see script usage details. The script produces the following files:

| File | Description |
|---|---|
| `<provider>_<cluster>_<namespace>_fusion_values.yaml` | Main custom values YAML used to override Helm chart defaults for Fusion microservices. |
| `<provider>_<cluster>_<namespace>_monitoring_values.yaml` | Custom values YAML used to configure Prometheus and Grafana. |
| `<provider>_<cluster>_<namespace>_fusion_resources.yaml` | Resource requests and limits for all microservices. |
| `<provider>_<cluster>_<namespace>_fusion_affinity.yaml` | Pod affinity rules to ensure multiple replicas for a single service are evenly distributed across zones and nodes. |
| `<provider>_<cluster>_<namespace>_upgrade_fusion.sh` | Script used to install and/or upgrade Fusion using the aforementioned custom values YAML files. |
Open the `<provider>_<cluster>_<release>_fusion_values.yaml` file to familiarize yourself with its structure and contents. Notice it contains a separate section for each of the Fusion microservices. The example configuration of the `query-pipeline` service below illustrates some important concepts about the custom values YAML file.
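The generated example itself is not reproduced here. As a minimal sketch of the kind of per-service section you will see, the following uses illustrative keys (`replicaCount` is an assumption; the `nodeSelector` value is the GKE default shown later in this article):

```yaml
query-pipeline:
  nodeSelector:
    cloud.google.com/gke-nodepool: default-pool  # schedule query-pipeline pods on this node pool
  replicaCount: 2                                # illustrative replica count
```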
The main custom values file is named `<provider>_<cluster>_<namespace>_fusion_values.yaml`. For example, `gke_search_f5_fusion_values.yaml`. The naming convention and script arguments use the following parameters:

| Parameter | Description |
|---|---|
| `<provider>` | The K8s platform you're running on, such as `gke`. |
| `<cluster>` | The name of your cluster. |
| `<namespace>` | The K8s namespace where you want to install Fusion. |
| `<node_selector>` | Specifies a nodeSelector label to find nodes to schedule Fusion pods on. |
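For illustration, an invocation that would produce the `gke_search_f5_*` files mentioned above might look like the sketch below. The `-c`, `--node-pool`, `--with-resource-limits`, and `--with-affinity-rules` flags appear elsewhere in this article; `--provider` and `-n` are assumptions, so confirm the exact flag names with `--help`:

```bash
./customize_fusion_values.sh -c search -n f5 --provider gke \
  --node-pool 'cloud.google.com/gke-nodepool: default-pool' \
  --with-resource-limits --with-affinity-rules
```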
The `--node-pool <node_selector>` label is very important. Using the wrong value will cause your pods to be stuck in the `pending` state. If you're not sure about the correct value for your cluster, pass `''` to let Kubernetes decide which nodes to schedule Fusion pods on. `nodeSelector` labels are provider-specific. The `fusion-cloud-native` scripts use the following defaults for GKE and EKS:

| Provider | Default node selector |
|---|---|
| GKE | `cloud.google.com/gke-nodepool: default-pool` |
| EKS | `alpha.eksctl.io/nodegroup-name: standard-workers` |
Add the following to your `values.yaml` file to avoid a known issue that prevents the `kuberay-operator` pod from launching successfully:

```yaml
kuberay-operator:
  crd:
    create: true
```
The `customize_fusion_values.sh` script supports the following flags:

| Flag | Description |
|---|---|
| `--node-pool` | Add a Fusion-specific label to your nodes. |
| `--with-resource-limits` | Configure resource requests/limits. |
| `--with-replicas` | Configure replica counts. |
| `--with-affinity-rules` | Configure pod affinity rules for Fusion services. |
Use `--node-pool` to add a Fusion-specific label to your nodes by passing `--node-pool 'fusion_node_type: <NODE_LABEL>'`.

To use a storage class other than your cluster's default, create a `storageClass.yaml` file with the following contents:
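The original contents are not reproduced here. The sketch below shows a minimal StorageClass for SSD-backed disks on GKE; the name, provisioner, and parameters are illustrative and provider-specific:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fusion-ssd          # illustrative name; reference it from your Solr storage settings
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd              # SSD persistent disks on GKE
```

Apply it with `kubectl apply -f storageClass.yaml` before installing Fusion.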
Additional Solr partitions are defined with the `nodePools` property. If any property for that StatefulSet needs to be changed from the default set of values, it can be set directly on the object representing the node pool; any properties that are omitted fall back to the base values. See the following example (additional whitespace added for display purposes only):
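The original example is not reproduced here. The sketch below shows the general shape, assuming the `nodePools` list sits under the `solr` section of the custom values file; property names other than `nodePools`, `name`, and `replicaCount` should be checked against your chart version:

```yaml
solr:
  nodePools:
    - name: ""          # default partition; "" is the suffix for the default StatefulSet
      replicaCount: 6
    - name: "analytics" # pods get the -Dfusion_node_type=analytics system property
      replicaCount: 6   # illustrative; omitted properties fall back to the base values
    - name: "search"    # pods get the -Dfusion_node_type=search system property
      replicaCount: 12
```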
In this example, `""` is the suffix for the default partition. The analytics partition pods are labeled with `fusion_node_type=analytics`; you can use the `fusion_node_type` property in Solr auto-scaling policies to govern replica placement during collection creation. The search partition pods are labeled with `fusion_node_type=search`, as defined in the `nodePools` section above. The default partition keeps the `nodePools` value `""`. Its `replicaCount`, or number of Solr pods, is six. The search partition `replicaCount` is twelve.

Each nodePool is automatically assigned a `-Dfusion_node_type` property of `<search>`, `<system>`, or `<analytics>`. This value matches the name of the nodePool. For example, `-Dfusion_node_type=<search>`. The Solr pods have a `fusion_node_type` system property, as shown below:
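The original snippet is not reproduced here. As a generic check, you can grep a Solr pod's spec for the property (the pod name below is a placeholder):

```bash
kubectl get pod f5-solr-0 -o yaml | grep fusion_node_type
# expect something like: -Dfusion_node_type=search
```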
To enable network policies for Fusion services, pass `--set global.networkPolicyEnabled=true` when installing the Fusion Helm chart.
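A sketch of the flag in context; the chart reference, release name, namespace, and values file below are placeholders rather than values prescribed by this article:

```bash
helm upgrade --install f5 lucidworks/fusion --namespace f5 \
  --values gke_search_f5_fusion_values.yaml \
  --set global.networkPolicyEnabled=true
```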
To stage Fusion's Docker images for a private Docker registry, use an intermediary machine with access to DockerHub; this example refers to it as `envoy`.

1. Install Docker on `envoy`. You need at least 100GB of free disk for Docker.
2. Pull the Fusion images into `envoy`'s local registry. For example, to pull the query pipeline image, run `docker pull lucidworks/query-pipeline:5.9.0`. See `docker pull --help` for more information about pulling Docker images.
3. Connect `envoy` to the private Docker registry, most likely via a VPN connection. In this example, the private Docker registry is referred to as `<internal-private-registry>`.
4. Push the images from `envoy`'s Docker registry to the private registry. This will take a long time.
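For a single image, the re-tag and push might look like this, using the placeholder registry name from above:

```bash
docker tag lucidworks/query-pipeline:5.9.0 <internal-private-registry>/lucidworks/query-pipeline:5.9.0
docker push <internal-private-registry>/lucidworks/query-pipeline:5.9.0
```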
Fusion services let you specify the `imagePullSecrets` setting using custom values YAML. However, other 3rd party services, including Zookeeper, Pulsar, Prometheus, and Grafana, don't allow you to supply the pull secret using the custom values YAML. To patch the default service account for your namespace and add the pull secret, run the following command. Depending on your shell, you may need to escape the double quotes (`\`) or reverse the order of single and double quotes.
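A minimal sketch of that patch, using the placeholder names from this article:

```bash
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "<internal-private-secret>"}]}'
```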
Replace `<internal-private-secret>` with the name of the secret you created in the steps above.

The `customcerts.yaml` file is the example file in these instructions. Replace `EXAMPLE-VALUES-FILE.yaml` with your previous values file. This creates an init-container with the name `import-certs`.

On Linux, place the `.crt` file in `$fusion_home/apps/jetty/connectors/etc/yourcertname.crt`. On Windows, place the `.crt` file in `$fusion_home\apps\jetty\connectors\etc\yourcertname.crt`.
After customizing the `customize_fusion_values.sh` script, run it using bash.

To enable monitoring, run the `customize_fusion_values.sh` script with the `--prometheus true` option. This creates an extra custom values YAML file for installing Prometheus and Grafana, `<provider>_<cluster>_<namespace>_monitoring_values.yaml`. For example: `gke_search_f5_monitoring_values.yaml`.

Use the `install_prom.sh` script to install Prometheus and Grafana in your namespace. Include the provider, cluster name, namespace, and Helm release as in the example below:
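A hypothetical invocation, reusing the gke/search/f5 example names; the flag names here are assumptions modeled on the other scripts in the repository, so verify them with `--help`:

```bash
./install_prom.sh --provider gke -c search -n f5 -r f5-monitoring
```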
Pass the `--help` parameter to see usage details for the `install_prom.sh` script.

Configure Grafana, Prometheus, Promtail, and Loki in Fusion
From the `fusion-cloud-native` repository, run the following command for your `<cluster>` and `<namespace>`:
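The repository provides the exact command. As a generic point of reference only (not necessarily the command this article refers to), Loki and Promtail can be installed with the community Helm chart:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install <loki-release-name> grafana/loki-stack \
  --namespace <namespace> --set promtail.enabled=true
```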
Obtain the `EXTERNAL_IP` for your Grafana LoadBalancer service:
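One generic way to find it; the Grafana service name depends on your monitoring release:

```bash
kubectl get services --namespace <namespace> | grep grafana
```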
Set the `<loki-release-name>` for your cluster. If you don't set `<loki-release-name>` correctly, an error similar to the following displays:

To sign in to Grafana at `<EXTERNAL-IP>:3000`, run the following command to obtain an `<admin_password>` for your Grafana instance:
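A sketch that assumes Grafana was installed by a Helm chart that stores the admin password in a `<release>-grafana` secret under the `admin-password` key; verify the secret name in your cluster:

```bash
kubectl get secret --namespace <namespace> <release>-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode; echo
```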
When you add Loki as a data source in Grafana, set the URL to `<loki-release-name:port>`. The default port for Loki is `3100`.

If you don't know your `<loki-release-name:port>` information, open a terminal and run `kubectl get services | grep loki` to display a list of every service with a name that contains `loki`, along with its associated IP address and port.

Configure Pod Affinity
To relax the pod affinity rules, change the `kubernetes.io/hostname` policies from required to preferred:

| Before | After |
|---|---|
| `requiredDuringSchedulingIgnoredDuringExecution:` | `preferredDuringSchedulingIgnoredDuringExecution:` |
If you used the `--with-affinity-rules` option when running the `./customize_fusion_values.sh` script, the pod affinity rules are already configured for your cluster. Alternatively, copy `affinity.yaml`, and rename it using the following naming convention: `<provider>_<cluster>_<release>_fusion_affinity.yaml`.
To implement the file, append the following to your upgrade script:
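A sketch only, assuming the generated upgrade script passes values files to `helm upgrade` via repeated `--values` flags; the file names and variables are placeholders:

```bash
helm upgrade "${RELEASE}" lucidworks/fusion --namespace "${NAMESPACE}" \
  --values "gke_search_f5_fusion_values.yaml" \
  --values "gke_search_f5_fusion_affinity.yaml"   # appended affinity rules file
```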
Configure Resource Limits

If you used the `--with-resource-limits` option when running the `./customize_fusion_values.sh` script, resource limits are already configured for your cluster. The script creates a YAML file for this purpose named `<provider>_<cluster>_<namespace>_fusion_resources.yaml`. Alternatively, you can copy `resources.yaml`, and rename it using the following naming convention: `<provider>_<cluster>_<release>_fusion_resources.yaml`.
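The entries in the resources file are standard Kubernetes requests and limits set per service. A hedged sketch, with an assumed service section and placeholder numbers:

```yaml
query-pipeline:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 4Gi
```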
You can refine the resource requests and limits as you test your cluster's behavior, while preparing for a production environment with Fusion.

Spark Operations
Use the `LABEL` specified for the node, and the label's value as the `LABEL_VALUE`. For example, if a node is labeled with `fusion_node_type=spark_only`, schedule Spark executor pods to run on that node using the property shown below.
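One way to express this is Spark's standard Kubernetes node selector property (a generic Spark-on-Kubernetes setting; the article's original snippet is not reproduced here):

```
spark.kubernetes.node.selector.fusion_node_type=spark_only
```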
To see Fusion's default Spark settings, run `kubectl get configmaps <release-name>-job-launcher`. Some of these settings are also configurable via Helm.

Spark Resource Configurations

| Spark Configuration | Default value | Helm Variable |
|---|---|---|
| `spark.driver.memory` | 3g | |
| `spark.executor.instances` | 2 | `executorInstances` |
| `spark.executor.memory` | 3g | |
| `spark.executor.cores` | 6 | |
| `spark.kubernetes.executor.request.cores` | 3 | |
| `spark.sql.caseSensitive` | true | |
| Spark Configuration | Default value | Helm Variable |
|---|---|---|
| `spark.kubernetes.container.image.pullPolicy` | Always | `image.imagePullPolicy` |
| `spark.kubernetes.container.image.pullSecrets` | [artifactory] | `image.imagePullSecrets` |
| `spark.kubernetes.authenticate.driver.serviceAccountName` | `<name>-job-launcher-spark` | |
| `spark.kubernetes.driver.container.image` | fusion-dev-docker.ci-artifactory.lucidworks.com | `image.repository` |
| `spark.kubernetes.executor.container.image` | fusion-dev-docker.ci-artifactory.lucidworks.com | `image.repository` |
Fusion 5 Upgrades
| Deployment type | Platform |
|---|---|
| Azure Kubernetes Service (AKS) | `aks` |
| Amazon Elastic Kubernetes Service (EKS) | `eks` |
| Google Kubernetes Engine (GKE) | `gke` |
Open the `<platform>_<cluster>_<release>_upgrade_fusion.sh` upgrade script file for editing. Set the `CHART_VERSION` to your target Fusion version, and save your changes. Then run the `<platform>_<cluster>_<release>_upgrade_fusion.sh` script. The `<release>` value is the same as your namespace, unless you overrode the default value using the `-r` option.

Use `kubectl get pods` to see the changes applied to your cluster. It may take several minutes to perform the upgrade, as new Docker images are pulled from DockerHub. To see the versions of running pods, do:
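One generic way to list the image versions in use:

```bash
kubectl get pods -o jsonpath='{..image}' | tr -s '[[:space:]]' '\n' | sort | uniq
```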
Most Fusion services use the `RollingUpdate` update policy; Zookeeper uses `OnDelete` to avoid changing critical stateful pods in the Fusion deployment. To apply changes to Zookeeper after performing the upgrade (uncommon), you need to manually delete the pods. For example:
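A sketch, assuming the Zookeeper pods follow the `<release>-zookeeper-N` naming convention:

```bash
kubectl delete pod f5-zookeeper-0   # repeat for each Zookeeper pod, one at a time
```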
Alternatively, set the `updateStrategy` under the `zookeeper` section in your `"${MY_VALUES}"` file:
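A minimal sketch of that override; the exact placement should match your chart's `zookeeper` values:

```yaml
zookeeper:
  updateStrategy:
    type: "RollingUpdate"
```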
You can also upgrade by running the `setup_f5_<platform>.sh` script that matches your Kubernetes platform, passing the `--upgrade` option. To see what would change without applying it, also pass the `--dry-run` option to the script.
The upgrade script is generated by the `customize_fusion_values.sh` script. It hard-codes the parameters, so you don't need to remember which parameters to pass, which is helpful when working with multiple K8s clusters. Make sure you check the script into version control alongside your custom values YAML files.

Whenever you change the custom values YAML files for your cluster, you need to run the upgrade script to apply the changes. The script calls `helm upgrade` with the correct parameters and `--values` options. If you run `helm upgrade` without passing the custom values YAML files, the deployment reverts to using chart defaults, which you never want to do.

Before running the script, verify that your `kubeconfig` is pointing to the correct cluster and that you're using Helm v3; if not, the upgrade fails. Select the correct `kubeconfig` before running the script.
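For example, a quick pre-flight check might be:

```bash
kubectl config current-context   # confirm you're pointing at the intended cluster
helm version --short             # confirm Helm v3.x
```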