
[Enhance] Support CPU-only train and test in slurm cluster (#189) #191

Closed
wants to merge 3 commits into from

@GhaSiKey commented Jan 4, 2023

Motivation

Support CPU-only training and testing on a slurm cluster (#189).

Modification

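Relax the slurm argument check so that only a partition is required, i.e. change the condition to `flag = partition is not None` (previously `gpus_per_node` also had to be set).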
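A minimal sketch of the relaxed check, assuming it is pulled out into a standalone helper. `check_slurm_args` is a hypothetical name; the message and the `AssertionError` mirror the existing gridsearch check quoted later in this thread, minus the `highlighted_error` formatting.

```python
from typing import Optional


def check_slurm_args(launcher: str, partition: Optional[str],
                     gpus_per_node: Optional[int]) -> None:
    """Hypothetical helper: validate slurm arguments.

    gpus_per_node is still accepted but no longer required, so a
    CPU-only job (gpus_per_node=None) passes as long as a partition is set.
    """
    if launcher == 'slurm':
        msg = 'If launcher is slurm, partition should not be None'
        # Relaxed condition from this PR: only the partition is mandatory.
        flag = partition is not None
        if not flag:
            raise AssertionError(msg)


# CPU-only slurm job: accepted because a partition is given.
check_slurm_args('slurm', partition='cpu', gpus_per_node=None)
```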

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@ice-tong (Collaborator) commented Jan 4, 2023

The slurm launch command should also be updated for the CPU-only case:

mim/mim/commands/train.py, lines 252 to 257 in ff137cc:

```python
cmd = [
    'srun', '-p', f'{partition}', f'--gres=gpu:{gpus_per_node}',
    f'--ntasks={gpus}', f'--ntasks-per-node={gpus_per_node}',
    f'--cpus-per-task={cpus_per_task}', '--kill-on-bad-exit=1'
] + parsed_srun_args + [PYTHON, '-u', train_script, config
                        ] + common_args
```
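A minimal sketch of how the command above could request GPUs only when `gpus_per_node` is set. `build_srun_cmd` is a hypothetical helper, its `python` parameter stands in for the module's `PYTHON` constant, and dropping the `--ntasks`/`--ntasks-per-node` flags in the CPU-only branch (leaving slurm's defaults) is an assumption, not what the PR implements. It only builds the argument list; whether a given cluster accepts such a job is a separate question.

```python
from typing import List, Optional


def build_srun_cmd(partition: str, gpus: int, gpus_per_node: Optional[int],
                   cpus_per_task: int, parsed_srun_args: List[str],
                   python: str, train_script: str, config: str,
                   common_args: List[str]) -> List[str]:
    """Hypothetical helper: build the srun command for train."""
    cmd = ['srun', '-p', f'{partition}']
    if gpus_per_node:
        # GPU job: one task per GPU, as in the original command.
        cmd += [
            f'--gres=gpu:{gpus_per_node}', f'--ntasks={gpus}',
            f'--ntasks-per-node={gpus_per_node}'
        ]
    # CPU-only job (assumption): no --gres/--ntasks flags, so slurm's
    # default task count applies.
    cmd += [f'--cpus-per-task={cpus_per_task}', '--kill-on-bad-exit=1']
    return cmd + parsed_srun_args + [python, '-u', train_script, config
                                     ] + common_args


cpu_cmd = build_srun_cmd('cpu', 0, None, 4, [], 'python', 'train.py',
                         'cfg.py', [])
```

Guarding on `gpus_per_node` (rather than the total `gpus`) keeps the check tied to the value that is actually interpolated into `--gres`.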

@ice-tong (Collaborator) commented Jan 4, 2023

The gridsearch command also needs to be updated:

```python
if launcher == 'slurm':
    msg = ('If launcher is slurm, '
           'gpus-per-node and partition should not be None')
    flag = (gpus_per_node is not None) and (partition is not None)
    if not flag:
        raise AssertionError(highlighted_error(msg))
```

```
@@ -382,12 +381,19 @@ def gridsearch(
if not has_job_name:
    job_name = osp.splitext(osp.basename(config_path))[0]
    parsed_srun_args.append(f'--job-name={job_name}_train')
cmd = [
if gpus:
```
Review comment (Collaborator), suggested change:

```diff
- if gpus:
+ if gpus_per_node:
```

```
    f'--ntasks={gpus}', f'--ntasks-per-node={gpus_per_node}',
    f'--cpus-per-task={cpus_per_task}', '--kill-on-bad-exit=1'
]
if gpus:
```
Review comment (Collaborator), suggested change:

```diff
- if gpus:
+ if gpus_per_node:
```

```
    f'--cpus-per-task={cpus_per_task}', '--kill-on-bad-exit=1'
] + parsed_srun_args + [PYTHON, '-u', train_script, config
                        ] + common_args
if gpus:
```
Review comment (Collaborator), suggested change:

```diff
- if gpus:
+ if gpus_per_node:
```
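One plausible reading of these suggestions (an interpretation; the review does not state a reason): the `--gres` flag interpolates `gpus_per_node`, so if `gpus` is non-zero while `gpus_per_node` is None, a guard on `gpus` still emits a meaningless `--gres=gpu:None`. The two hypothetical helpers below contrast the guards.

```python
from typing import List, Optional


def gres_args_guarding_gpus(gpus: Optional[int],
                            gpus_per_node: Optional[int]) -> List[str]:
    # Guard on the total GPU count, as in the diff under review.
    if gpus:
        return [f'--gres=gpu:{gpus_per_node}']
    return []


def gres_args_guarding_gpus_per_node(gpus: Optional[int],
                                     gpus_per_node: Optional[int]) -> List[str]:
    # Guard on the value that is actually interpolated into the flag.
    if gpus_per_node:
        return [f'--gres=gpu:{gpus_per_node}']
    return []


# gpus left at a non-zero value, gpus_per_node unset (CPU-only intent).
print(gres_args_guarding_gpus(8, None))           # ['--gres=gpu:None']
print(gres_args_guarding_gpus_per_node(8, None))  # []
```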

@ice-tong (Collaborator) commented Jan 9, 2023

Closing this because the slurm launch cannot run without GPUs. Feel free to reopen if you have any questions.

@ice-tong closed this Jan 9, 2023