Build a Slurm cluster over machines that do not have a shared file system
- You are on the login node of your cluster, with hostname `login-0` and IP address `192.168.100.0`.
- There are several compute nodes with hostnames `compute-1`, `compute-2`, ..., `compute-N`.
- The local network IPs of all the compute nodes are `192.168.100.X`, where X is a number from 1 to N.
TODO
- In this document, all the compute nodes are assumed to have the same system configuration and to be reachable from each other within the local network.
- You have root privileges on all the machines, so you can use `pdsh -l root` to run commands in parallel across multiple machines.
- The working directory, i.e. `$PWD`, is `/home/slurm_install`.
First, we create a directory on all nodes to store the shared data. As an example, we use `/nvme/slurm_share`:

```shell
pdsh -w "192.168.100.[0-N]" -l root 'mkdir -p /nvme/slurm_share'
```
To enable NFS, prepare three files in the working directory:

- `hosts`: the hostname-to-IP mapping, appended to `/etc/hosts` so that you can use hostnames instead of IP addresses when connecting to other machines.
- `exports`: appended to `/etc/exports`; used by each NFS server to specify which directories are accessible from remote hosts.
- `fstab`: appended to `/etc/fstab`; the NFS client configuration that tells each node where to find the exported filesystems.

The contents of these files are as follows:
```text
# /home/slurm_install/hosts
192.168.100.0 login-0
192.168.100.1 compute-1
192.168.100.2 compute-2
...
192.168.100.N compute-N
```
```text
# /home/slurm_install/exports
/nvme/slurm_share login-*(rw,no_root_squash)
/nvme/slurm_share compute-*(rw,no_root_squash)
```
The second column of the fstab file is the mount point of each shared directory on the local node. Here we mount the export of each node at `/nvme/slurm_nfs/${hostname}`, so every node sees every other node's share under a distinct path.
```text
# /home/slurm_install/fstab
192.168.100.0:/nvme/slurm_share /nvme/slurm_nfs/login-0 nfs rw,bg,hard,intr,tcp,nfsvers=4,noatime,actimeo=30 0 0
192.168.100.1:/nvme/slurm_share /nvme/slurm_nfs/compute-1 nfs rw,bg,hard,intr,tcp,nfsvers=4,noatime,actimeo=30 0 0
...
192.168.100.N:/nvme/slurm_share /nvme/slurm_nfs/compute-N nfs rw,bg,hard,intr,tcp,nfsvers=4,noatime,actimeo=30 0 0
```
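Writing the N+1 lines of each file by hand is error-prone. The three fragments above can be generated with a short script; this is a sketch under the IP scheme above, where `N=4` is only an example value:

```shell
# Sketch: generate the hosts, exports, and fstab fragments for N compute
# nodes (run from /home/slurm_install; N=4 is only an example value).
N=4
echo "192.168.100.0 login-0" > hosts
printf '%s\n' \
    "/nvme/slurm_share login-*(rw,no_root_squash)" \
    "/nvme/slurm_share compute-*(rw,no_root_squash)" > exports
opts="rw,bg,hard,intr,tcp,nfsvers=4,noatime,actimeo=30"
echo "192.168.100.0:/nvme/slurm_share /nvme/slurm_nfs/login-0 nfs $opts 0 0" > fstab
for i in $(seq 1 "$N"); do
    echo "192.168.100.$i compute-$i" >> hosts
    echo "192.168.100.$i:/nvme/slurm_share /nvme/slurm_nfs/compute-$i nfs $opts 0 0" >> fstab
done
```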
Then we use the following commands to apply these changes on all nodes:

```shell
# first copy the files from login-0 to the other machines
pdsh -w "192.168.100.[1-N]" -l root 'mkdir -p /home/slurm_install'
pdcp -w "192.168.100.[1-N]" -l root /home/slurm_install/* /home/slurm_install/

# in case we make a mistake, back up the original files on every node first
pdsh -w "192.168.100.[0-N]" -l root 'cp /etc/hosts /etc/hosts.backup'
pdsh -w "192.168.100.[0-N]" -l root 'cp /etc/exports /etc/exports.backup'
pdsh -w "192.168.100.[0-N]" -l root 'cp /etc/fstab /etc/fstab.backup'

# then append the prepared content to `/etc/hosts`, `/etc/exports`, and `/etc/fstab` respectively
pdsh -w "192.168.100.[0-N]" -l root 'cat /home/slurm_install/hosts >> /etc/hosts'
pdsh -w "192.168.100.[0-N]" -l root 'cat /home/slurm_install/exports >> /etc/exports'
pdsh -w "192.168.100.[0-N]" -l root 'cat /home/slurm_install/fstab >> /etc/fstab'
```
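Note that these appends are not idempotent: re-running them duplicates the entries. A guarded variant appends only when a marker line is absent; this is a sketch, and the marker format is an arbitrary choice:

```shell
# Sketch: append a fragment to a system file only once, guarded by a
# marker line (the marker format is an arbitrary choice).
append_once() {
    src=$1; dst=$2
    marker="# slurm_install: $(basename "$src")"
    if ! grep -qxF "$marker" "$dst"; then
        { echo "$marker"; cat "$src"; } >> "$dst"
    fi
}
```

On the cluster, the same guard can be inlined into the `pdsh ... 'cat ... >> ...'` commands above.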
Now we start the NFS server on every node (each node is both a server and a client), create the mount points, and mount the shared directories:

```shell
pdsh -w "192.168.100.[0-N]" -l root 'systemctl enable nfs-server && systemctl start nfs-server && exportfs -ra'
# every node mounts every export, so create all mount points on all nodes
# (replace N with the actual number of compute nodes)
pdsh -w "192.168.100.[0-N]" -l root 'mkdir -p /nvme/slurm_nfs/login-0 /nvme/slurm_nfs/compute-{1..N}'
pdsh -w "192.168.100.[0-N]" -l root 'mount -a'
```
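Instead of hard-coding the hostnames, the mount points can be derived from the fstab fragment itself, since its second field is the mount point. A sketch:

```shell
# Sketch: derive the mount points from an fstab fragment instead of
# hard-coding hostnames (field 2 of each fstab line is the mount point).
make_mount_points() {
    awk '{print $2}' "$1" | xargs mkdir -p
}
```

On the cluster, the same `awk | xargs mkdir -p` pipeline would run via pdsh on all nodes against `/home/slurm_install/fstab`.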
Next, we use mergerfs to combine all the per-node shared directories into a single directory, so that we can access them as if they were on the same machine.
```shell
# first install fuse and mergerfs on every node
# find the latest release of mergerfs here: https://github.com/trapexit/mergerfs/releases
# as an example, we use version 2.40.2 on CentOS 7
wget https://github.com/trapexit/mergerfs/releases/download/2.40.2/mergerfs-2.40.2-1.el7.x86_64.rpm
pdcp -w "192.168.100.[1-N]" -l root mergerfs-2.40.2-1.el7.x86_64.rpm /home/slurm_install/
pdsh -w "192.168.100.[0-N]" -l root 'yum install -y fuse /home/slurm_install/mergerfs-2.40.2-1.el7.x86_64.rpm'
```
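Once mergerfs is installed, the per-node NFS mount points can be pooled under one directory. A sketch of building the colon-separated branch list from the hosts file (the pool path and mount options below are illustrative assumptions, not prescribed by this document):

```shell
# Sketch: build the colon-separated mergerfs branch list (one branch per
# node's NFS mount point) from the hosts file.
branch_list() {
    awk '{print "/nvme/slurm_nfs/" $2}' "$1" | paste -sd: -
}
```

Each node would then pool the branches with something like `mergerfs -o defaults,allow_other "$(branch_list /home/slurm_install/hosts)" /nvme/slurm_pool`, where `/nvme/slurm_pool` is an example mount point.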
TODO