Commit

back to local disk, still 11 workers and b256
mwalmsley committed Nov 6, 2023
1 parent 666241d commit dece4ab
Showing 2 changed files with 4 additions and 4 deletions.
only_for_me/narval/train.py: 4 changes (2 additions, 2 deletions)
@@ -126,8 +126,8 @@
 num_workers=11,
 random_state=random_state,
 learning_rate=1e-3,
-# cache_dir=os.environ['SLURM_TMPDIR'] + '/cache'
-cache_dir='/tmp/cache'
+cache_dir=os.environ['SLURM_TMPDIR'] + '/cache'
+# cache_dir='/tmp/cache'
 # /tmp for ramdisk (400GB total, vs 4TB total for nvme)
 )

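This change moves cache_dir from /tmp (a 400GB ramdisk, according to the comment in train.py) back to node-local disk under $SLURM_TMPDIR (the 4TB NVMe). The sketch below shows one way to make that choice explicit, with a fallback for when SLURM_TMPDIR is unset (for example, outside a SLURM job). The helper name resolve_cache_dir and its use_ramdisk flag are hypothetical and do not appear in train.py.

```python
import os


def resolve_cache_dir(use_ramdisk: bool = False) -> str:
    """Hypothetical helper: choose where the dataset cache lives.

    use_ramdisk=True  -> /tmp/cache (ramdisk, per the comment in train.py)
    use_ramdisk=False -> $SLURM_TMPDIR/cache (node-local NVMe), falling back
                         to /tmp/cache when SLURM_TMPDIR is not set.
    """
    if use_ramdisk:
        return '/tmp/cache'
    slurm_tmpdir = os.environ.get('SLURM_TMPDIR')
    if slurm_tmpdir is None:
        # Not inside a SLURM job: no node-local scratch directory is defined
        return '/tmp/cache'
    # os.path.join avoids a doubled slash if SLURM_TMPDIR ends with '/'
    return os.path.join(slurm_tmpdir, 'cache')
```

With this helper, the training call could pass cache_dir=resolve_cache_dir(). The committed expression os.environ['SLURM_TMPDIR'] raises KeyError when the variable is missing, whereas os.environ.get lets the same script also run on a login node or a laptop.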
only_for_me/narval/train.sh: 4 changes (2 additions, 2 deletions)
@@ -10,8 +10,8 @@ nvidia-smi

 PYTHON=/home/walml/envs/zoobot39_dev/bin/python

-# mkdir $SLURM_TMPDIR/cache
-mkdir /tmp/cache
+mkdir $SLURM_TMPDIR/cache
+# mkdir /tmp/cache

 # export NCCL_BLOCKING_WAIT=1 #Set this environment variable if you wish to use the NCCL backend for inter-GPU communication.
 # export MASTER_ADDR=$(hostname) #Store the master node’s IP address in the MASTER_ADDR environment variable.
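train.sh creates the cache directory with a plain mkdir, which fails if the directory already exists or its parent is missing (mkdir -p avoids both). Because the choice between the ramdisk and NVMe comes down to capacity (400GB vs 4TB, per the train.py comment), it can also help to log the free space at startup. Here is a minimal Python sketch of both steps. The helper ensure_cache_dir is hypothetical and not part of the repository.

```python
import os
import shutil


def ensure_cache_dir(cache_dir: str) -> int:
    """Hypothetical helper mirroring `mkdir $SLURM_TMPDIR/cache` in train.sh.

    Creates cache_dir and any missing parents without failing if it already
    exists (like `mkdir -p`), then returns the free space in bytes on the
    filesystem that holds it.
    """
    os.makedirs(cache_dir, exist_ok=True)
    return shutil.disk_usage(cache_dir).free
```

For example, ensure_cache_dir(os.path.join(os.environ.get('SLURM_TMPDIR', '/tmp'), 'cache')) / 1e9 gives the free space in GB, which could be printed next to the nvidia-smi output at the start of the job.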
