Working with the smew servers
This page provides some basic documentation on how to use the group's smew servers, which are integrated into the server network of the Department of Statistics.
We have two servers, called smew01 and smew02, each with the following specifications:
- 36 CPUs;
- 375GB of RAM;
- 1TB of disk space.
To access these servers, you must first log in to the department's gateway server using the following command (see this page on the Statistics website for more info):
ssh -l username gate.stats.ox.ac.uk
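If you connect frequently, you may find it convenient to add an entry for the gateway to your local ~/.ssh/config file (a standard OpenSSH feature); the host alias stats-gate below is just an illustrative name:
Host stats-gate
    HostName gate.stats.ox.ac.uk
    User username
With this in place, ssh stats-gate is equivalent to the command above.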
Once in the gateway server, you must then log in to one of the Slurm head nodes (servers used to submit jobs to the compute servers using the Slurm workload manager). To access the greytail head node, for example, simply run:
ssh greytail
Your prompt should now have changed from blackcap2{username}% to greytail{username}%.
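From the head node you can also list the partitions available on the swan cluster with sinfo (the -M flag selects the cluster, just as in the srun commands below):
sinfo -M swan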
As of this writing in Feb 2023, there is no shared storage system for the departmental servers, although one is being prepared. This means that to access the storage of either of the smew servers, you must submit a job to run on that server.
If you want to quickly browse the storage on either smew01 or smew02, you can submit a 'debug' job (which has a maximum duration of 30 minutes) as follows:
srun --pty -t 0:30:0 -M swan --partition=smew01-debug bash
(Replace smew01 with smew02 for accessing the latter server.) Once you do this, you may find that your prompt has not changed and remains greytail{username}%. However, if you now run hostname, you should see smew01.cpu.stats.ox.ac.uk, which means that you are now running an interactive Bash session on smew01 and can browse the system's storage as usual.
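When you have finished browsing, end the interactive session and return to the greytail prompt by running:
exit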
If you wish to do some computational work on one of the servers interactively (meaning that you have access to a command-line prompt and can run commands in real time without pre-specifying them in a script), you can submit an interactive job to the interactive-cpu partition with Slurm.
Note: this is available only for smew01. Stuart McRobert from IT set up this workflow in the summer of 2022 for Simon's MSc students. If it would be useful to have an equivalent setup for smew02, he would probably be the person to contact in the IT office.
Interactive jobs have a maximum duration of 12 hours. You may request up to the maximum number of CPUs and RAM available on the server. For example, to submit a 12-hour job using 4 CPUs and 64GB of RAM in total, run the following command:
srun --pty -t 12:00:00 -M swan --partition=interactive-cpu -n 4 --mem=64GB bash
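Once the session starts, you can check where you are running and, assuming the standard Slurm environment variables are set, what was allocated to the job:
hostname
echo $SLURM_JOB_ID $SLURM_NTASKS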
For longer jobs, you can submit a 'batch' job with Slurm using the sbatch command (official documentation here) rather than srun. See this page on the internal Stats IT pages (SSO login required) for a detailed description of how to use Slurm on the department's servers.
As a basic example, here is a Bash script which submits a 5-minute job to smew01, requesting 2 CPUs and 24GB of RAM:
#!/bin/bash
#SBATCH --cluster=swan
#SBATCH --partition=smew01-cpu
#SBATCH --time=0:05:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=24GB
#SBATCH --job-name="test"
#SBATCH --output=test.out
{Bash commands here}
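Assuming you save the script above as, say, test_job.sh (the filename is arbitrary), you can submit it from the head node and monitor it with:
sbatch test_job.sh
squeue -M swan -u username
The job's output will be written to test.out (as set by the --output directive) in the directory from which you submitted the job.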
The lack of a shared storage system and the fact that the compute nodes are not externally accessible via ssh make copying files to/from the smew servers non-trivial.
A shared filesystem accessible from all departmental Linux systems (including smew!) is now available at /vols/bitbucket/. This was set up in December 2022 and is at an 'early access' stage and still under development.
Your personal space in this filesystem is in /vols/bitbucket/username/. You can copy a file to this filesystem from your local machine with a command like the following (rsync is also reported to work):
scp myfile.txt username@gate.stats.ox.ac.uk:/vols/bitbucket/username/
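If you prefer rsync (which, as noted above, is also reported to work), a roughly equivalent command using the same remote host would be:
rsync -av myfile.txt username@gate.stats.ox.ac.uk:/vols/bitbucket/username/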
Copying a file from the 'bitbucket' to your local machine is very similar:
scp username@gate.stats.ox.ac.uk:/vols/bitbucket/username/myfile.txt ./
Finally, you can access the file in this directory from either smew01 or smew02 simply at /vols/bitbucket/username/myfile.txt.
The procedures below are more cumbersome but were needed before the new 'bitbucket' became available. We keep these instructions here for future reference.
Suppose you have a file ~/myfile.txt in your home directory on your local machine (e.g. a laptop). You can copy this file to your Statistics home directory by sending it via scp to the gateway server:
scp ~/myfile.txt username@gate.stats.ox.ac.uk:~/
You should now be able to find your file in the /homes/username directory once you log in to the gateway server, or directly in the storage accessible from your Stats desktop machine.
Once you have the file in your home directory on the Statistics network, you can copy it to smew01 or smew02 by starting an interactive session on the server you wish to copy the file to and then using scp to copy it from your home directory.
First start a 'debug' session:
srun --pty -t 0:30:0 -M swan --partition=smew01-debug bash
Now copy the file:
scp username@greytail.stats.ox.ac.uk:~/myfile.txt ~/
This will place the file in your home directory on smew01 (/homes/username).
Let's now do the reverse operation: copy the file from smew to our Stats home directory and finally to our local machine.
First, while logged in to smew01 or smew02, let's copy the file to the Stats system:
scp myfile.txt greytail.stats.ox.ac.uk:~/
Now log off the smew and Stats servers and return to your local machine. You can copy the file from your Stats internal directory as follows (note that we must now specify the gateway server as the remote host rather than greytail):
scp username@gate.stats.ox.ac.uk:~/myfile.txt ./
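If you need to copy a whole directory rather than a single file, the -r flag makes scp copy recursively; for example, to pull a directory called mydir (a placeholder name) from your Stats home directory:
scp -r username@gate.stats.ox.ac.uk:~/mydir ./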