Kubernetes GPU Container
Kubernetes · 2019. 10. 2. 15:59
1. Install the NVIDIA driver on the machine
See: 2019/10/01 - [Machine Learning] - CentOS 7 Nvidia Driver install
2. Deploy the Kubernetes nvidia-device-plugin
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
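Once the plugin is running, each GPU node should advertise an nvidia.com/gpu resource. A minimal sketch of that check, assuming kubectl already points at the cluster; the function name list_gpu_nodes is hypothetical and used only for illustration:

```shell
# Print every node with the number of GPUs it reports as allocatable.
# Nodes where the device plugin has not registered a GPU show <none>.
list_gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Run list_gpu_nodes after the DaemonSet pods have started. If a GPU node still shows <none>, the plugin probably failed on that node, which is the case covered in 2.1.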
2.1 Troubleshooting
Error : your GPU is too old to support healthchecking
(1) wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
(2) vim nvidia-device-plugin.yml
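(3) Add the following env entry to the container under spec: template: spec: containers:
    env:
    - name: DP_DISABLE_HEALTHCHECKS
      value: "xids"
(4) kubectl create -f nvidia-device-plugin.yml
    (if the DaemonSet from step 2 already exists, remove it first with kubectl delete -f nvidia-device-plugin.yml, otherwise create fails because the resource already exists)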
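To confirm that the health check error is gone after redeploying, the plugin's own logs can be inspected. This is a sketch under assumptions: the label selector name=nvidia-device-plugin-ds and the kube-system namespace are what I expect this manifest to use, so check them against your copy of nvidia-device-plugin.yml. The function name plugin_logs is hypothetical.

```shell
# Show the last log lines of the device plugin pods.
# ASSUMPTION: the pods run in kube-system and carry the label
# name=nvidia-device-plugin-ds; pass another selector as $1 if yours differs.
plugin_logs() {
  kubectl -n kube-system logs -l "${1:-name=nvidia-device-plugin-ds}" --tail=20
}
```

Calling plugin_logs with no argument uses the assumed label; plugin_logs app=my-plugin would use a different selector.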
3. Verify that a GPU container can be deployed, using the gpu-demo Pod
# gpu-demo-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
  - name: gpu-demo
    image: docker.io/nvidia/cuda
    args: ["sh", "-c", "nvidia-smi && tail -f /dev/null"]
    resources:
      limits:
        nvidia.com/gpu: 1
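Deploying the manifest and waiting for the Pod can be sketched as follows, assuming the manifest is saved as gpu-demo-pod.yml in the current directory. The function name deploy_gpu_demo and the 120 second timeout are illustrative choices, not part of the original steps.

```shell
# Create the demo Pod, then block until it reports Ready or the timeout expires.
# $1 is the manifest path; it defaults to gpu-demo-pod.yml.
deploy_gpu_demo() {
  kubectl create -f "${1:-gpu-demo-pod.yml}" &&
    kubectl wait --for=condition=Ready pod/gpu-demo --timeout=120s
}
```

If the wait times out, kubectl describe pod gpu-demo shows the scheduling events. An "Insufficient nvidia.com/gpu" event usually means no node is advertising a free GPU.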
4. Confirm with kubectl logs gpu-demo that the pod is working correctly
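Because the container runs nvidia-smi on startup, a healthy Pod's log begins with the nvidia-smi table, whose header contains the string NVIDIA-SMI. A small sketch of an automated check built on that; the function name gpu_demo_ok is hypothetical:

```shell
# Succeed (exit 0) only if the Pod log contains the nvidia-smi header,
# which means the container could see the GPU through the driver.
gpu_demo_ok() {
  kubectl logs gpu-demo | grep -q "NVIDIA-SMI"
}
```

For example, gpu_demo_ok && echo "GPU visible in container". When finished, kubectl delete pod gpu-demo releases the GPU; the tail -f /dev/null in the Pod's args keeps it running until then.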