
# GPU Support for Deep Learning Training

In Fusion 5.3 and later, jobs that train deep learning-based models automatically use GPU resources when they are deployed on a GPU-enabled node. Your cloud provider likely requires a custom set of `nodeSelector` and `tolerations` values to schedule these jobs onto its GPU compute. The following sections provide examples using Smart Answers and Milvus on Google Kubernetes Engine (GKE).

### Training Smart Answers on GPU with GKE

To train Smart Answers models on GPUs in GKE, first add GPU resources to your cluster.

Create a new nodepool with a preemptible GPU node that spins down when it is not in use. Give the nodepool a label of `node_pool:gpu`. By default, GKE also adds the taint `nvidia.com/gpu=present:NoSchedule` to GPU nodes, so any pod that should run there needs a matching toleration in your Helm chart values.

The training jobs must also request the GPU with a resource limit of `nvidia.com/gpu: 1` (this value is specific to GKE). Finally, create a second, standard nodepool without GPU resources and give it a label of `node_pool:deploy`. This nodepool hosts the eventual Seldon Core deployment.
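To see how the Helm values below line up with the GPU node, it can help to inspect the node itself, for example with `kubectl get node <node-name> -o yaml`. The following is a trimmed, hypothetical excerpt for a node in a pool named `gpu-nodepool`; the node name is made up and a real node carries many more fields. It shows the three things the values file targets: the nodepool label (matched by `nodeSelector`), the GPU taint (matched by `tolerations`), and the allocatable GPU count (consumed by `resources`).

```yaml
# Hypothetical, trimmed excerpt of a GKE GPU node object
apiVersion: v1
kind: Node
metadata:
  name: gke-example-gpu-nodepool-node   # hypothetical node name
  labels:
    # Set automatically by GKE to the nodepool name; matched by nodeSelector
    cloud.google.com/gke-nodepool: gpu-nodepool
    # Custom label added when creating the nodepool
    node_pool: gpu
spec:
  taints:
    # Added by GKE to GPU nodes; pods need a matching toleration
    - key: nvidia.com/gpu
      value: present
      effect: NoSchedule
status:
  allocatable:
    # GPUs available to pods; consumed by nvidia.com/gpu requests/limits
    nvidia.com/gpu: "1"
```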

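In your custom values YAML file, add the following. The `nodeSelector` entries use the `cloud.google.com/gke-nodepool` label, which GKE sets automatically to each node's nodepool name, so this example assumes the GPU and CPU nodepools are named `gpu-nodepool` and `cpu-nodepool`: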

```yaml wrap  expandable  theme={"dark"}
question-answering:
  nodeSelector:
    default:
      cloud.google.com/gke-nodepool: gpu-nodepool
    supervised:
      seldon:
        cloud.google.com/gke-nodepool: cpu-nodepool
    coldstart:
      seldon:
        cloud.google.com/gke-nodepool: cpu-nodepool
  tolerations:
    default:
      - key: "nvidia.com/gpu"
        operator: "Equal"
        value: "present"
        effect: "NoSchedule"
    supervised:
      seldon: []
    coldstart:
      seldon: []
  resources:
    supervised:
      train:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
    coldstart:
      train:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```
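To make the mapping from values to pods concrete, the following hand-written sketch shows the scheduling fields a supervised training pod would be expected to carry with the values above. It is not output captured from the Fusion chart: the pod name, container name, and image are hypothetical placeholders. For extended resources such as `nvidia.com/gpu`, Kubernetes requires the request to equal the limit when both are set, and the values above satisfy that by setting both to `1`.

```yaml
# Hand-written sketch of a training pod's scheduling fields (not chart output)
apiVersion: v1
kind: Pod
metadata:
  name: smart-answers-train-example   # hypothetical name
spec:
  # From question-answering.nodeSelector.default
  nodeSelector:
    cloud.google.com/gke-nodepool: gpu-nodepool
  # From question-answering.tolerations.default; matches the GKE GPU taint
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Equal"
      value: "present"
      effect: "NoSchedule"
  containers:
    - name: train                     # hypothetical container name
      image: example/train:latest     # hypothetical image
      resources:
        # From question-answering.resources.supervised.train
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```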

<Note>
  This setup deploys all workflow steps onto the GPU node *except* for the Seldon Core deployment. Because the deployment persists after the workflow has completed, assigning it to the GPU node would prevent GKE from spinning the GPU node down, which increases operating expense.
</Note>
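The values above select nodepools by name. If you would rather rely on the custom `node_pool` labels given to the two nodepools, the `nodeSelector` section could be written as follows instead, keeping the `tolerations` and `resources` sections unchanged. This is a sketch that assumes the chart passes these `nodeSelector` maps through to the pods as-is, which is how a standard Kubernetes `nodeSelector` behaves but is not confirmed here for this chart.

```yaml
# Sketch: select nodepools by the custom node_pool labels instead of
# GKE's automatic cloud.google.com/gke-nodepool label.
# Assumes the chart passes these maps through to the pod specs unchanged.
question-answering:
  nodeSelector:
    default:
      node_pool: gpu        # training steps run on the GPU nodepool
    supervised:
      seldon:
        node_pool: deploy   # Seldon Core stays on the non-GPU nodepool
    coldstart:
      seldon:
        node_pool: deploy
```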

### Setting up Milvus on GPU with GKE

Setting up Milvus on GPU also requires a GPU nodepool in your cluster, such as the one described above.

At the end of the `ml-model-service` section of your custom values YAML file, add a section for Milvus as shown below:

```yaml wrap  expandable  theme={"dark"}
ml-model-service:
# ml-model-service yaml settings:
# ...
# followed by the Milvus settings:
  milvus:
    gpu:
      enabled: true
    image:
      repository: milvusdb/milvus
      tag: 0.10.2-gpu-d081520-8a2393
      pullPolicy: "IfNotPresent"
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
    nodeSelector:
      cloud.google.com/gke-nodepool: gpu-nodepool
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Equal"
        value: "present"
        effect: "NoSchedule"
```

The example above assumes the name of the GPU nodepool is `gpu-nodepool`.

<Note>
  The taints/tolerations and resource keys shown in these examples are for a GKE setup. These values may vary depending on your cloud provider.
</Note>
