Kubernetes GPU Container
JungGwig
2019. 10. 2. 15:59
1. Install the NVIDIA driver on the machine
2019/10/01 - [Machine Learning] - CentOS 7 Nvidia Driver install
2. Deploy the Kubernetes nvidia-device-plugin
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
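Once the plugin pods are up, each GPU node should advertise an nvidia.com/gpu resource. The following is a minimal sketch for checking that, not part of the original procedure. The helper names gpu_nodes and list_gpu_nodes are hypothetical. The sketch assumes kubectl is already configured for the cluster.

```shell
# Read "NAME GPU" rows on stdin (header line first) and print the names of
# nodes whose allocatable nvidia.com/gpu count is a positive integer.
gpu_nodes() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Hypothetical wrapper: query the cluster and filter the result.
# Nodes without the resource show "<none>" in the GPU column and are skipped.
list_gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | gpu_nodes
}
```

Running list_gpu_nodes should print one node name per line. An empty result suggests the plugin has not registered any GPU yet.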
2.1 Troubleshooting
Error: your GPU is too old to support healthchecking
(1) wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
(2) vim nvidia-device-plugin.yml
(3) Add the following to the container entry under spec: template: spec: containers:
        env:
        - name: DP_DISABLE_HEALTHCHECKS
          value: "xids"
(4) kubectl create -f nvidia-device-plugin.yml
3. Verify that a GPU container can be deployed using the gpu-demo Pod
# gpu-demo-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
  - name: gpu-demo
    image: docker.io/nvidia/cuda
    args: ["sh", "-c", "nvidia-smi && tail -f /dev/null"]
    resources:
      limits:
        nvidia.com/gpu: 1
4. Confirm that the pod is running correctly with kubectl logs gpu-demo
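Steps 3 and 4 can be chained as in the sketch below. It assumes the manifest is saved as gpu-demo-pod.yml in the current directory. The 120s timeout is an arbitrary choice, and both function names are hypothetical. The check only looks for the NVIDIA-SMI banner that nvidia-smi prints at the top of its table.

```shell
# Succeed when the log text on stdin contains the nvidia-smi banner.
saw_nvidia_smi() {
  grep -q 'NVIDIA-SMI'
}

# Hypothetical end-to-end check: create the pod, wait until it is Ready,
# then look for nvidia-smi output in its logs.
check_gpu_demo() {
  kubectl create -f gpu-demo-pod.yml &&
  kubectl wait --for=condition=Ready pod/gpu-demo --timeout=120s &&
  kubectl logs gpu-demo | saw_nvidia_smi
}
```

A zero exit status from check_gpu_demo suggests the container could see the GPU. A non-zero status means one of the three steps failed, and kubectl describe pod gpu-demo is a reasonable next place to look.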