Update Polaris example config #1154

Merged 1 commit on May 23, 2023
docs/configs/polaris.py (10 changes: 7 additions & 3 deletions)
@@ -2,7 +2,7 @@
from globus_compute_endpoint.executors import HighThroughputExecutor
from globus_compute_endpoint.strategies import SimpleStrategy
from parsl.addresses import address_by_interface
- from parsl.launchers import SingleNodeLauncher
+ from parsl.launchers import MpiExecLauncher
from parsl.providers import PBSProProvider

# fmt: off
@@ -15,17 +15,21 @@
        'scheduler_options': '#PBS -l filesystems=home:grand:eagle\n#PBS -k doe',
        # ALCF allocation to use
        'account': '',
+       # Un-comment to give each worker exclusive access to a single GPU
Contributor:
Why make GPU pinning opt-in?

I would vote for opt-out for this option because it's rare for apps to fairly share GPUs amongst themselves (I don't know of any), so this default would lead to bad performance.

The counterpoint to my suggestion is that restarting workers or reassigning them to different types is broken with accelerator pinning. (Sorry, I don't recall the number of the issue I made about it.)

Collaborator:
@WardLT If we have a bug here (is this the one -> #722?), I think it would be better to leave this as an opt-in with a note pointing at the bug.

Contributor:
Yep! That's the one.

+       # 'available_accelerators': 4,
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=1,
+           available_accelerators=user_opts['polaris'].get('available_accelerators'),
            strategy=SimpleStrategy(max_idletime=300),
            address=address_by_interface('bond0'),
            provider=PBSProProvider(
-               launcher=SingleNodeLauncher(),
+               launcher=MpiExecLauncher(
+                   bind_cmd="--cpu-bind", overrides="--depth=64 --ppn 1"
+               ),  # Ensures 1 manager per node, with work spread across all 64 cores
                account=user_opts['polaris']['account'],
                queue='preemptable',
                cpus_per_node=32,
(remaining lines of the diff collapsed)
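Note: below is a minimal, hypothetical client-side sketch (not part of this PR) for checking that GPU pinning takes effect once 'available_accelerators' is un-commented. It assumes a running Globus Compute endpoint built from this config and the globus-compute-sdk installed; the endpoint UUID is a placeholder. With accelerator pinning, the worker pool should expose a single device to each worker (via CUDA_VISIBLE_DEVICES on NVIDIA systems such as Polaris), so each task should report one GPU index.

# Sketch: verify GPU pinning from the client side (assumes this endpoint config
# with 'available_accelerators' un-commented; the endpoint UUID is a placeholder).
from globus_compute_sdk import Executor

def which_gpu():
    # With accelerator pinning enabled, each worker should see exactly one
    # device index in CUDA_VISIBLE_DEVICES.
    import os
    return os.environ.get("CUDA_VISIBLE_DEVICES")

endpoint_id = "<polaris-endpoint-uuid>"  # placeholder
with Executor(endpoint_id=endpoint_id) as ex:
    futures = [ex.submit(which_gpu) for _ in range(4)]
    print([f.result() for f in futures])  # each result should be a single GPU index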