note: This guide accompanies an upcoming blog post and is heavily derived from Langhalsdino's fantastic Kubernetes-GPU-Guide (see Acknowledgements below).
- Why Kubernetes
- Step 1: Create a Paperspace Account
- Step 2: Create a private network
- Step 3: Prepare master node (CPU)
- Step 4: Install Kubernetes on the master node (CPU)
- Step 5: Prepare GPU worker node
- Step 6: Deploy a Jupyter notebook with GPU support
- Step 7: Assign a public IP to the worker node
- Step 8: All done! Woooo!
- Acknowledgements
- License
## Why Kubernetes

If you are a startup or an individual interested in getting a production-grade ML/data science pipeline going, Kubernetes can be extremely valuable. It is without a doubt one of the best tools for orchestrating complex deployments and managing specific hardware interdependencies.
Unlike the web development world, the ML/AI/data science community does not yet have entirely established patterns and best practices. We at Paperspace believe that Kubernetes could play a big part in helping companies get up and running quickly and with the best performance possible.
## Step 1: Create a Paperspace Account

Head over to Paperspace to create your account (it only takes two seconds). You will need to confirm your email and then log back in. Once you are in, you will need to add a credit card on file. This tutorial should only cost about $5 to get going (plus any additional usage after).
## Step 2: Create a private network

You will need a private network for this tutorial, which is currently only available to our "teams" accounts. Shoot an email to support [@] paperspace [dot] com to request a team account (there is no charge).

Once confirmed, head over to the network page to create a private network in your region (note: private networks are region-specific, so you will need to keep everything in the same region).
You might need to refresh the page if the network doesn't show up after about 20 seconds.
## Step 3: Prepare master node (CPU)

OK, so now you have a Paperspace team account and a private network. Now go create your first machine on this private network. On the machine create page, create a Paperspace C3 instance running Ubuntu 18.04, making sure it is on your private network.
Once this machine is created (i.e. it is no longer "provisioning" and has transitioned to the "ready" state) you will be emailed a temporary password.
Go to your Paperspace Console and open the machine. It will ask you for your password; type CTRL+SHIFT+V (on Windows) to paste it. You can change the password if you would like by typing `passwd` and then confirming a new password.
You are now in the web terminal for your master node. First, disable the existing UFW firewall by typing the following:
```bash
sudo ufw disable
```
(note: We do this for testing only; you will want to re-enable it later for security. That said, on Paperspace, before you add a public IP your machines are fully isolated.)
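When you do re-enable the firewall later, you will need to leave the ports Kubernetes uses open. Here is a minimal sketch assuming the standard kubeadm control-plane ports; the exact list depends on your Kubernetes version and networking plugin:

```bash
# Sketch only: re-enable UFW while allowing standard kubeadm control-plane ports.
# Port list is an assumption based on kubeadm defaults -- verify against your setup.
sudo ufw allow 6443/tcp        # Kubernetes API server
sudo ufw allow 2379:2380/tcp   # etcd server client API
sudo ufw allow 10250/tcp       # kubelet API
sudo ufw allow 10251/tcp       # kube-scheduler
sudo ufw allow 10252/tcp       # kube-controller-manager
# Worker nodes would additionally need the NodePort range: 30000:32767/tcp
sudo ufw enable
```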
## Step 4: Install Kubernetes on the master node (CPU)

Now, download and execute the initialization script, which will set up the Kubernetes master. It is very helpful to go through this short (<40 LOC) script to see what it is doing. At a high level, it downloads Kubernetes, Docker, and a few required packages, installs them, and initiates the Kubernetes process using the `kubeadm` tool.
Note: Because we are building our cluster on an isolated private network, we can safely assume that all nodes can talk to one another but are not yet publicly addressable.
```bash
wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/scripts/init-master.sh
chmod +x init-master.sh
sudo ./init-master.sh
```
This will return a join command in the format `kubeadm join xxxx:6443 --token u328wq.xxxxx --discovery-token-ca-cert-hash sha256:xxxxx`. Copy those parameter values for joining new nodes. You can regenerate a new join command at any time with `kubeadm token create --print-join-command`.
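For reference, the core of the script boils down to something like the following. This is a simplified sketch, not the script verbatim -- in particular, the package repo setup is omitted and the choice of flannel (and its manifest URL and pod CIDR) is an assumption based on typical kubeadm setups of this kind:

```bash
# Simplified sketch of what init-master.sh roughly does (not verbatim).
# Install Docker and the Kubernetes tooling (apt repo setup omitted here).
apt-get update && apt-get install -y docker.io kubelet kubeadm kubectl

# Initialize the control plane; flannel expects this pod network CIDR.
kubeadm init --pod-network-cidr=10.244.0.0/16

# Make kubectl usable for the current user.
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config

# Install a pod network plugin (flannel shown here as an assumption).
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```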
## Step 5: Prepare GPU worker node

Yes! We have a Kubernetes master node up and running. The next step is to add a GPU-backed worker node and join it to the cluster. Luckily, we have a script for this too (but again, it is really good practice to go through the script to see what it is doing).
First, create a Paperspace GPU+ instance running Ubuntu 16.04 and make sure it is on your private network. We could use the ML-in-a-box template for this, but really we only need the NVIDIA driver and CUDA installed which our script will download and install for us. This worker node will be used to run GPU-backed Docker containers that are assigned to it by the Kubernetes master node.
Execute the initialization script with the parameter values that you copied from above. (ProTip: for this type of work it is best to open two browser tabs and have two Paperspace terminals running: one for the master and one for the worker.)
```bash
wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/scripts/init-worker.sh
chmod +x init-worker.sh
sudo ./init-worker.sh <ip:port> <token> <ca cert hash>
```
Once it has joined, you will need to reboot the machine. The NVIDIA driver is installed, but it needs a reboot to take effect. If you skip this step, the node will not appear GPU-enabled to our Kubernetes cluster.
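After the reboot, it is worth confirming that the driver is actually loaded before moving on (`nvidia-smi` ships with the driver):

```bash
# On the worker node, after the reboot:
nvidia-smi   # should print the GPU model, driver version, and memory usage
```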
Awesome! We now have a worker and master node joined together to form a bare-bones Kubernetes cluster. You can confirm this on the master node with the following command:
```bash
kubectl get nodes
```
You should see the hostname of your GPU+ Paperspace machine (the worker node) in this list.
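The output will look something like this (the hostnames, ages, and versions here are illustrative, not literal output; yours will differ):

```
NAME            STATUS    ROLES     AGE       VERSION
ps-master       Ready     master    15m       v1.9.x
ps-gpu-worker   Ready     <none>    2m        v1.9.x
```

You can also confirm the GPU is advertised by running `kubectl describe node <worker-hostname>` and looking for a GPU entry under Capacity (the exact resource name varies by Kubernetes version).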
## Step 6: Deploy a Jupyter notebook with GPU support

In Kubernetes language, a deployment is a YAML file that defines an application you would like to run on your cluster. The deployment defines which Docker container is used and what its features/specs are. Additionally, the YAML file contains a service description, which is responsible for assigning an addressable port to the deployment. We are using the Kubernetes `NodePort` service type, which will choose a port to assign to the container and make it available on all worker nodes. For now, all you need to know is that Kubernetes will find our GPU-backed worker node and send the Jupyter notebook to it.
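If you are curious what such a deployment/service pair looks like, here is a trimmed-down sketch of the general shape. This is illustrative only, not the contents of the actual tf-jupyter.yaml from the repo; the container image and the GPU resource key are assumptions (older clusters used `alpha.kubernetes.io/nvidia-gpu`, newer ones use `nvidia.com/gpu`):

```yaml
# Illustrative sketch only -- see tf-jupyter.yaml in the repo for the real thing.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-jupyter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-jupyter
  template:
    metadata:
      labels:
        app: tf-jupyter
    spec:
      containers:
      - name: tf-jupyter
        image: tensorflow/tensorflow:latest-gpu   # assumed image
        ports:
        - containerPort: 8888                     # Jupyter's default port
        resources:
          limits:
            nvidia.com/gpu: 1   # GPU resource key; version-dependent (see note above)
---
apiVersion: v1
kind: Service
metadata:
  name: tf-jupyter
spec:
  type: NodePort        # exposes the port on every worker node
  selector:
    app: tf-jupyter
  ports:
  - port: 8888
    targetPort: 8888
```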
Download the YAML file from this GitHub repo (you could also copy/paste it using Vim, Nano, Emacs, etc.):
```bash
wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/deployments/tf-jupyter.yaml
```
Have Kubernetes deploy it:
```bash
kubectl apply -f tf-jupyter.yaml
```
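You can check that the pod was scheduled onto the GPU worker and find out which port NodePort picked (both are standard kubectl commands):

```bash
# Watch the pod come up; it may take a minute to pull the image.
# The -o wide flag shows which node the pod landed on.
kubectl get pods -o wide

# Find the assigned NodePort, shown as something like 8888:3XXXX/TCP
# in the PORT(S) column.
kubectl get svc
```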
## Step 7: Assign a public IP to the worker node

OK, so this is not a best practice, but it will quickly let us see if everything is working: we will assign a public IP to our worker node. Because everything so far has been done on our private network, nothing is yet publicly accessible from the outside world.
## Step 8: All done! Woooo!

That's it. You have done what very few people have accomplished: stood up a GPU-backed Kubernetes cluster in just a few minutes. Go to your new public IP address and port, and you should now have a Jupyter notebook running!

In Part 2 (coming soon) we will cover adding storage and building out a real ML pipeline:
- Using a small dataset (MNIST?) and TensorFlow Distributed
- Kubernetes Persistent Volumes
- Introducing Helm
- Real dataset (distributed CNN)
## Acknowledgements

Most of the heavy lifting here came from https://github.com/Langhalsdino/Kubernetes-GPU-Guide, which was enormously helpful. A huge shoutout to Langhalsdino!

Additional improvements were suggested by azlyth, specifically around getting some of the networking to work on Paperspace.
## License

This project is licensed under the MIT License. See the LICENSE.md file for details.