
6. (Optional) Setup GPU

To use GPUs in Kubernetes and Kubeflow, the following tasks are required.

1. Install NVIDIA Driver

If running nvidia-smi produces output like the following, you can skip this step.

mlops@ubuntu:~$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   32C    P8     4W / 120W |    211MiB /  6078MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   34C    P8     7W / 175W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1644      G   /usr/lib/xorg/Xorg                198MiB |
|    0   N/A  N/A      1893      G   /usr/bin/gnome-shell               10MiB |
|    1   N/A  N/A      1644      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

If the output of nvidia-smi does not look like the above, install the NVIDIA driver that matches your GPU.

If you are not familiar with installing NVIDIA drivers, you can install one with the following commands.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
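
If you want to review which driver Ubuntu recommends before letting autoinstall pick one, a quick check looks like this. This is only a sketch: the driver package name in the second command (nvidia-driver-470 here) is an example, and the right one depends on what the first command reports for your GPU.

# List detected GPUs and the driver package Ubuntu recommends for each
ubuntu-drivers devices

# Optionally install a specific driver package instead of using autoinstall
sudo apt install -y nvidia-driver-470
sudo reboot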

2. Install NVIDIA-Docker.

Let's install NVIDIA-Docker.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2 &&
sudo systemctl restart docker

To check that it is installed correctly, run a Docker container that uses the GPU.

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

If the following message appears, it means that the installation was successful:

mlops@ubuntu:~$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   32C    P8     4W / 120W |    211MiB /  6078MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   34C    P8     6W / 175W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
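
If this test fails, it can help to check the container toolkit directly before debugging Docker itself. As a rough sanity check (assuming the nvidia-docker2 package pulled in libnvidia-container-tools, which provides nvidia-container-cli), ask the library what it can see:

# Report the driver/CUDA version and GPUs visible to the NVIDIA container library
nvidia-container-cli info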

3. Setting NVIDIA-Docker as the Default Container Runtime

Kubernetes uses Docker CE as its container runtime by default. To use NVIDIA GPUs inside Docker containers, you need to register NVIDIA-Docker as a container runtime and set it as the default runtime used when creating pods.

  1. Open the /etc/docker/daemon.json file and make the following modifications (a non-interactive alternative using tee is sketched after this list):

    sudo vi /etc/docker/daemon.json

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
  2. After confirming the file changes, restart Docker.

    sudo systemctl daemon-reload
    sudo service docker restart
  3. Verify that the changes have been applied.

    sudo docker info | grep nvidia

    If you see output like the following, the configuration has been applied successfully.

    mlops@ubuntu:~$ docker info | grep nvidia
    Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
    Default Runtime: nvidia
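
As mentioned in step 1, the same configuration can also be written non-interactively with tee instead of an editor. This is only a sketch: it overwrites /etc/docker/daemon.json, so if the file already contains other settings, merge them by hand instead.

sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl daemon-reload
sudo service docker restart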

4. Install NVIDIA Device Plugin

  1. Create the nvidia-device-plugin daemonset.

    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml
  2. Verify that the nvidia-device-plugin pod is in the Running state.

    kubectl get pod -n kube-system | grep nvidia

    You should see output like the following:

    nvidia-device-plugin-daemonset-nlqh2   1/1     Running   0    1h
  3. Verify that the nodes have been configured to have GPUs available.

    kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

    If you see output like the following, the configuration was successful.
    (The MLOps for ALL tutorial cluster has two GPUs, so the output shows 2. As long as it shows the correct number of GPUs for your cluster, it is fine.)

    NAME     GPU
    ubuntu   2

    If it is not configured, the GPU value will be displayed as <none>.
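
Once the GPU count shows up as allocatable, it can be worth confirming that a pod can actually claim one. The following is a minimal sketch rather than part of the tutorial: the pod name gpu-smoke-test is an arbitrary choice, and the nvidia/cuda:11.0-base image is simply reused from the Docker test above. It requests one nvidia.com/gpu, runs nvidia-smi, and is then deleted.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# Once the pod has completed, its logs should contain the familiar nvidia-smi table
kubectl logs gpu-smoke-test
kubectl delete pod gpu-smoke-test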