replace submodule

adamltyson committed Nov 24, 2023
1 parent b601480 commit cef5473
Showing 6 changed files with 219 additions and 1 deletion.
1 change: 0 additions & 1 deletion demo
Submodule demo deleted from 732efd
153 changes: 153 additions & 0 deletions demo/README.md
@@ -0,0 +1,153 @@
## SWC SLURM demo

1. Log in to the gateway

```bash
ssh [email protected]
```

2. Log in to the HPC cluster

```bash
ssh hpc-gw1
```

N.B. You can [set up SSH keys](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2) so you don't have to keep typing in your password.
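
A minimal sketch of the key setup, run from your local machine (the gateway address below is a placeholder; use the address from step 1):
```bash
# Generate a key pair if you don't already have one
ssh-keygen -t ed25519
# Copy the public key to the gateway (replace the placeholder address)
ssh-copy-id username@gateway-address
```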

3. See what nodes are available using `sinfo`.

N.B. If you add this line to your `.bashrc`, typing `sinfol` will print much more information.
```bash
alias sinfol='sinfo --Node --format="%.14N %.4D %5P %.6T %.4c %.10C %4O %8e %.8z %.6m %.8d %.6w %.5f %.6E %.30G"'
```
4. See what jobs are running with `squeue`.
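
For example, to see only your own jobs (a standard SLURM option, not specific to this cluster):
```bash
# List just your jobs; prepend `watch` to refresh the view every few seconds
squeue -u $USER
```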

5. See what data storage is available:
* `cd /ceph/store` - general lab data storage
* `cd /nfs/winstor` - older lab data
* `cd /ceph/neuroinformatics/neuroinformatics` - team storage
* `cd /ceph/zoo` - specific project storage
* `cd /ceph/scratch` - short-term storage, e.g. for intermediate analysis results, not backed up
* `cd /ceph/scratch/neuroinformatics-dropoff` - "dropbox" for others to share data with the team
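
To check how much space is free on any of these filesystems (a generic check, nothing SWC-specific):
```bash
# Show size, used and available space for the filesystem behind a path
df -h /ceph/scratch
```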

6. Start an interactive job in pseudoterminal mode (`--pty`) by requesting a single core from [SLURM](https://slurm.schedmd.com/documentation.html), the job scheduler:

```bash
srun -p cpu --pty bash -i
```
N.B. Don't run anything intensive on the login nodes.
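
If the defaults are not enough, resources can be requested explicitly; the numbers below are only examples:
```bash
# 4 cores, 8 GB of memory and a 2 hour limit on the cpu partition
srun -p cpu -n 4 --mem 8G -t 2:00:00 --pty bash -i
```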

7. Clone the demo repository containing the test scripts
```bash
cd ~/
git clone https://github.com/neuroinformatics-unit/slurm-demo
cd slurm-demo
```
8. Check the list of available modules
```bash
module avail
```
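
You can also filter for a particular module or list what is already loaded:
```bash
# Show modules matching a name, then show currently loaded modules
module avail cuda
module list
```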
9. Load the miniconda module
```bash
module load miniconda
```

10. Create the conda environment
```bash
conda env create -f env.yml
```
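
This can take a few minutes. Once it finishes you can confirm the environment was created:
```bash
# The new slurm_demo environment should appear in this list
conda env list
```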

11. Activate the conda environment and run the Python script
```bash
conda activate slurm_demo
python multiply.py 5 10 --jazzy
```
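
The script is a small [Typer](https://typer.tiangolo.com/) CLI, so it can also describe its own arguments:
```bash
# Show the arguments and options that multiply.py accepts
python multiply.py --help
```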
12. Stop the interactive job
```bash
exit
```
13. Check out the batch script:
```bash
cat batch_example.sh
```

```bash
#!/bin/bash

#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 2G # memory pool for all cores
#SBATCH -n 2 # number of cores
#SBATCH -t 0-0:10 # time (D-HH:MM)
#SBATCH -o slurm_output.out
#SBATCH -e slurm_error.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

module load miniconda
conda activate slurm_demo

for i in {1..5}
do
echo "Multiplying $i by 10"
python multiply.py $i 10 --jazzy
done
```
Run batch job:
```bash
sbatch batch_example.sh
```
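
After submitting, the job can be monitored or cancelled with standard SLURM commands (the job ID below is a placeholder; use the one printed by `sbatch`):
```bash
# Check the job's place in the queue, then read its output once it has run
squeue -u $USER
cat slurm_output.out
# Cancel it if needed
scancel 12345
```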

14. Check out the array script:
```bash
cat array_example.sh
```

```bash
#!/bin/bash

#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 2G # memory pool for all cores
#SBATCH -n 2 # number of cores
#SBATCH -t 0-0:10 # time (D-HH:MM)
#SBATCH -o slurm_array_%A-%a.out
#SBATCH -e slurm_array_%A-%a.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --array=0-9%4

# Array job runs 10 separate jobs, but not more than four at a time.
# This is flexible and the array ID ($SLURM_ARRAY_TASK_ID) can be used in any way.

module load miniconda
conda activate slurm_demo

echo "Multiplying $SLURM_ARRAY_TASK_ID by 10"
python multiply.py $SLURM_ARRAY_TASK_ID 10 --jazzy
```
Run array job:
```bash
sbatch array_example.sh
```
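
Each array task writes its own output file, named from the job ID (`%A`) and task ID (`%a`) set in the `-o`/`-e` lines above. With a placeholder job ID:
```bash
# Inspect the output of task 0, then a summary of every task in the array
cat slurm_array_12345-0.out
sacct -j 12345
```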

15. Start an interactive job with one GPU:
```bash
srun -p gpu --gres=gpu:1 --pty bash -i
```
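
Once the interactive job starts, you can confirm a GPU has been allocated:
```bash
# Show the GPU(s) visible to this job
nvidia-smi
```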

16. Load CUDA
```bash
module load cuda
```
17. Activate the conda environment and check the GPU
```bash
conda activate slurm_demo
python
```

```python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
```

18. For fast I/O, consider copying data to `/tmp` (fast NVMe storage) as part of the run. This is available on all of the gpu-380 and gpu-sr670 nodes.
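
A rough sketch of that staging pattern inside a batch script; the data path and analysis script are placeholders, not part of the demo:
```bash
# Stage input data on the node-local NVMe disk, work on it there, then clean up
cp -r /ceph/scratch/my_dataset /tmp/my_dataset   # placeholder input path
python analysis.py /tmp/my_dataset               # hypothetical analysis script
rm -rf /tmp/my_dataset                           # free the local disk for other users
```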
22 changes: 22 additions & 0 deletions demo/array_example.sh
@@ -0,0 +1,22 @@
#!/bin/bash

#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 2G # memory pool for all cores
#SBATCH -n 2 # number of cores
#SBATCH -t 0-0:10 # time (D-HH:MM)
#SBATCH -o slurm_array_%A-%a.out
#SBATCH -e slurm_array_%A-%a.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --array=0-9%4

# Array job runs 10 separate jobs, but not more than four at a time.
# This is flexible and the array ID ($SLURM_ARRAY_TASK_ID) can be used in any way.

module load miniconda
conda activate slurm_demo

echo "Multiplying $SLURM_ARRAY_TASK_ID by 10"
python multiply.py $SLURM_ARRAY_TASK_ID 10 --jazzy

20 changes: 20 additions & 0 deletions demo/batch_example.sh
@@ -0,0 +1,20 @@
#!/bin/bash

#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 2G # memory pool for all cores
#SBATCH -n 2 # number of cores
#SBATCH -t 0-0:10 # time (D-HH:MM)
#SBATCH -o slurm_output.out
#SBATCH -e slurm_error.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

module load miniconda
conda activate slurm_demo

for i in {1..5}
do
echo "Multiplying $i by 10"
python multiply.py $i 10 --jazzy
done
6 changes: 6 additions & 0 deletions demo/env.yml
@@ -0,0 +1,6 @@
name: slurm_demo
dependencies:
- python=3.10
- pip:
- typer[all]
- tensorflow
18 changes: 18 additions & 0 deletions demo/multiply.py
@@ -0,0 +1,18 @@
import typer


def multiply(first_num: int, second_num: int, jazzy: bool = False):
"""
Multiply two numbers together.
If --jazzy is used, result is announced more grandly.
"""

result = first_num * second_num
if jazzy:
print(f"Behold, the result is: {result}!")
else:
print(result)


if __name__ == "__main__":
typer.run(multiply)
