theevann / kubernetes-setup

MLO group setup for kubernetes cluster

Instructions for using the container cluster (Kubernetes, k8s)


Requesting access

Use this form to request access (use Accréditation=MLO).

Kubernetes basics

Please refer to this repository for your basic setup.

Running a job

There are two approaches to running pods on the container cluster:

  • As in Kubernetes basics, use command: [sleep, infinity] and then connect to the pod over ssh to run an experiment.
    • This can be convenient for playing around: you can temporarily spin up as many nodes as you want.
    • But you pay for GPU time you don't use.
  • Use something like command: [run, my, experiment], so that the container runs your experiment directly.
    • This makes debugging slightly harder, but as soon as your job finishes, the pod gets status Completed, and you (Martin) stop paying for it.
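As a minimal sketch of the second approach, here is a pod that runs one experiment and then stops. The pod name `my-experiment` and the entry point `python train.py` are hypothetical, the `ic-registry.epfl.ch/mlo/ml:1.0` image is the example tag used later in this README (assumed to exist), and the GPU request assumes the cluster exposes GPUs under the standard `nvidia.com/gpu` resource name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-experiment          # hypothetical name
spec:
  restartPolicy: Never         # do not restart the container after it exits
  containers:
  - name: experiment
    image: ic-registry.epfl.ch/mlo/ml:1.0   # assumed image; use your own
    command: ["python", "train.py"]          # hypothetical entry point
    resources:
      limits:
        nvidia.com/gpu: 1      # assumed GPU resource name
```

A Pod's default restartPolicy is Always, which would rerun a finished experiment, so `Never` is what lets the pod settle in `Completed`. For the first approach you would use `command: ["sleep", "infinity"]` instead and remove the pod yourself when done (`kubectl delete pod my-experiment`).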

Storage across icclusters (mounting /mlo-container-scratch)

Follow the instructions in Kubernetes basics, and add the following to your container spec:

volumeMounts:
- mountPath: /scratch
  name: mlo-scratch
  subPath: YOUR_USERNAME

and the following to your pod spec:

volumes:
- name: mlo-scratch
  persistentVolumeClaim:
    claimName: mlo-scratch
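To show where the two fragments above go, here is a minimal sketch of a complete pod spec. The pod name `scratch-example` is hypothetical, the image is assumed to be the `ic-registry.epfl.ch/mlo/ml:1.0` example tag, and `YOUR_USERNAME` remains a placeholder for your own username. `volumeMounts` belongs to the container, while `volumes` sits at the pod spec level:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example        # hypothetical name
spec:
  containers:
  - name: ubuntu
    image: ic-registry.epfl.ch/mlo/ml:1.0   # assumed image; use your own
    command: ["sleep", "infinity"]          # keep the pod alive for interactive use
    volumeMounts:
    - mountPath: /scratch      # where the scratch space appears in the container
      name: mlo-scratch        # must match the volume name below
      subPath: YOUR_USERNAME   # mounts only this subdirectory of the volume
  volumes:
  - name: mlo-scratch
    persistentVolumeClaim:
      claimName: mlo-scratch
```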

Storage across icclusters (mounting /mlodata1)

spec:
  volumes:
  - name: mlodata1
    persistentVolumeClaim:
      claimName: pv-mlodata1
  containers:
  - name: ubuntu
    volumeMounts:
    - mountPath: /mlodata1
      name: mlodata1
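If a pod needs both the scratch space and `/mlodata1`, both claims can be listed under `volumes` and mounted side by side in the same container. A minimal sketch, assuming a hypothetical pod name and the `ic-registry.epfl.ch/mlo/ml:1.0` example image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-example        # hypothetical name
spec:
  volumes:
  - name: mlodata1
    persistentVolumeClaim:
      claimName: pv-mlodata1
  - name: mlo-scratch
    persistentVolumeClaim:
      claimName: mlo-scratch
  containers:
  - name: ubuntu
    image: ic-registry.epfl.ch/mlo/ml:1.0   # assumed image; use your own
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /mlodata1
      name: mlodata1
    - mountPath: /scratch
      name: mlo-scratch
      subPath: YOUR_USERNAME   # placeholder for your username
```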

Customizing your own Docker image

Go to https://ic-registry.epfl.ch and log in with your Gaspar credentials.

There is already a group project named mlo. Please ask the owner of the group project to grant you the corresponding permission so that you can push your Docker image to that repository.

Once you have built and tagged the image and have the permission, you can push it to the registry, e.g.,

docker push ic-registry.epfl.ch/mlo/ml:1.0
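After pushing, a pod refers to the image by its full registry path. A minimal sketch, assuming the `mlo/ml:1.0` tag from the command above and a hypothetical pod name. For tags other than `:latest`, Kubernetes defaults to `imagePullPolicy: IfNotPresent`, so if you re-push under the same tag, setting `Always` makes the kubelet check the registry each time the container starts instead of silently reusing a copy cached on the node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-image-example   # hypothetical name
spec:
  containers:
  - name: ml
    image: ic-registry.epfl.ch/mlo/ml:1.0   # the image pushed above
    imagePullPolicy: Always    # re-check the registry on every container start
    command: ["sleep", "infinity"]
```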

Some deployment templates

You can find some provided templates, e.g.,

Some tips

  • By default, a Docker container runs as root, so the files you write to the shared storage are owned by root. You can solve this by changing the default user in your Docker image (example from Tao).
  • To avoid the error sudo: no tty present and no askpass program specified, use sudo -S xxx (the -S option makes sudo read the password from standard input).
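The first tip changes the user inside the Docker image. A different technique, sketched here as an alternative rather than the approach in Tao's example, is to set the user at the Kubernetes level with `securityContext`, if the cluster's policies allow it. The UID/GID value `1000` and the pod name are hypothetical; you would replace the IDs with your own (e.g., the output of `id -u` and `id -g` on a machine where your account exists) so that files written to the shared storage are owned by you:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-example        # hypothetical name
spec:
  securityContext:
    runAsUser: 1000            # hypothetical UID; use your own
    runAsGroup: 1000           # hypothetical GID; use your own
  containers:
  - name: ubuntu
    image: ic-registry.epfl.ch/mlo/ml:1.0   # assumed image; use your own
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /scratch
      name: mlo-scratch
      subPath: YOUR_USERNAME   # placeholder for your username
  volumes:
  - name: mlo-scratch
    persistentVolumeClaim:
      claimName: mlo-scratch
```

A UID with no matching entry in the image's /etc/passwd has no username or home directory inside the container, which some tools expect; baking the user into the image, as in the tip, avoids that.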
