Kubernetes GPU Container

JungGwig 2019. 10. 2. 15:59

 

1. Install the NVIDIA driver on the machine

See the earlier post: 2019/10/01 - [Machine Learning] - CentOS 7 Nvidia Driver install

 

2. Deploy the nvidia-device-plugin to Kubernetes

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
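Once the plugin Pods are running, each GPU node should advertise an nvidia.com/gpu resource. The following is a minimal sketch for checking that, not part of the original procedure. The function name list_gpu_nodes is hypothetical, and it assumes kubectl is already configured for the target cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <gpu-count>" for every node that
# advertises at least one allocatable nvidia.com/gpu resource.
list_gpu_nodes() {
  # Nodes without the resource show "<none>" in the GPU column,
  # so the awk filter keeps only rows with a positive integer count.
  kubectl get nodes --no-headers \
    "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" |
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }'
}
```

Calling list_gpu_nodes a short while after the deployment should list the GPU nodes. Empty output suggests the plugin has not registered any GPU yet, in which case the plugin Pod logs are the next place to look.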

  2.1 Troubleshooting

    Error: your GPU is too old to support healthchecking

    (1) wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml   

    (2) vim nvidia-device-plugin.yml

    (3) Add the code below to the container entry under spec: template: spec: containers:

env:
-  name: DP_DISABLE_HEALTHCHECKS
   value: "xids"

    (4) kubectl create -f nvidia-device-plugin.yml
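If the DaemonSet from step 2 is already deployed, the same environment variable can probably be set in place instead of editing the manifest by hand. The sketch below uses kubectl set env for that. The function name is hypothetical, and the default DaemonSet name and namespace are assumptions about the 1.0.0-beta manifest; check them with kubectl get daemonsets --all-namespaces before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set DP_DISABLE_HEALTHCHECKS=xids on an existing
# device plugin DaemonSet.
#   $1: DaemonSet name (assumed default: nvidia-device-plugin-daemonset)
#   $2: namespace      (assumed default: kube-system)
disable_xid_healthchecks() {
  kubectl set env \
    "daemonset/${1:-nvidia-device-plugin-daemonset}" \
    --namespace "${2:-kube-system}" \
    DP_DISABLE_HEALTHCHECKS=xids
}
```

Because this patches the existing object, it avoids the "already exists" error that kubectl create returns when the DaemonSet from step 2 has not been deleted first.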

 

 

3. Use a gpu-demo Pod to verify that GPU containers can be deployed

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
  -  name: gpu-demo
     image: docker.io/nvidia/cuda
     args: ["sh", "-c", "nvidia-smi && tail -f /dev/null"]
     resources:
       limits:
         nvidia.com/gpu: 1
         
Save the manifest above as gpu-demo-pod.yml and create the Pod:

$ kubectl create -f gpu-demo-pod.yml

4. Confirm that the Pod is running normally with kubectl logs gpu-demo
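The log check can also be scripted. The sketch below waits for the Pod to become Ready and then looks for the "NVIDIA-SMI" banner that nvidia-smi prints at the top of its output. The function name check_gpu_demo and the 120-second timeout are illustrative assumptions, not part of the original steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the Pod to be Ready, then confirm that
# the nvidia-smi banner appears in its logs.
#   $1: Pod name (default: gpu-demo)
check_gpu_demo() {
  pod="${1:-gpu-demo}"
  # The timeout is an arbitrary choice; the first image pull may take longer.
  kubectl wait --for=condition=Ready "pod/${pod}" --timeout=120s || return 1
  if kubectl logs "${pod}" | grep -q "NVIDIA-SMI"; then
    echo "GPU visible inside ${pod}"
  else
    echo "nvidia-smi output not found in logs of ${pod}" >&2
    return 1
  fi
}
```

A non-zero exit status means either that the Pod never became Ready (for example, it stayed Pending because no node offers nvidia.com/gpu) or that nvidia-smi did not produce its usual output inside the container.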