Plots show excessive amounts of resources #187
Thanks Andre,
I will have to go and track that down. There are some other weird things going on in that department anyway.
Best wishes,
Huub
-----
Hubertus van Dam
Brookhaven National Laboratory
From: Andre Merzky
Date: Friday, December 22, 2023 at 8:35 AM
To: radical-cybertools/radical.analytics
Subject: Re: [radical-cybertools/radical.analytics] Plots show excessive amounts of resources (Issue #187)
Hi Hub,
when running that config file, I see the following resource description being used in this line: https://github.com/hjjvandam/DeepDriveMD-pipeline/blob/feature/nwchem/deepdrivemd/deepdrivemd.py#L275
{'access_schema': 'local',
'cpus': 1024,
'gpus': 64,
'project': 'CHM136_crusher',
'queue': 'batch',
'resource': 'ornl.crusher',
'walltime': 180}
so that seems to indicate that indeed 1k cores are being allocated. So unfortunately the plotting is correct, the resource allocation is faulty.
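For reference, the numbers in that description are internally consistent with a 16-node allocation. A minimal sketch of the arithmetic, assuming the Crusher node geometry stated in the issue (64 CPU cores and 4 GPUs per node), not values queried from the machine:

```python
# Resource description quoted above, from the DeepDriveMD config.
resource_description = {
    'access_schema': 'local',
    'cpus': 1024,
    'gpus': 64,
    'project': 'CHM136_crusher',
    'queue': 'batch',
    'resource': 'ornl.crusher',
    'walltime': 180,
}

# Per-node geometry as stated in the issue report (an assumption here).
CORES_PER_NODE = 64
GPUS_PER_NODE = 4

# Both the CPU and the GPU request resolve to the same node count.
nodes_from_cpus = resource_description['cpus'] // CORES_PER_NODE
nodes_from_gpus = resource_description['gpus'] // GPUS_PER_NODE

print(nodes_from_cpus, nodes_from_gpus)  # 16 16
```

So the request itself asks for 16 full nodes, which matches the roughly 1000 cores shown in the plots.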
I am running some workflows on Crusher. The stage with the largest number of tasks runs 64 of them, each using 1 CPU core. The performance analysis plots suggest, however, that around 1000 cores were reserved for this workflow. With 64 CPU cores and 4 GPUs per node, you only get that number if the node allocation corresponded to 1 GPU per task, i.e. reserving 16 nodes for 64 single-core tasks. I hope that the code isn't actually doing that and that just the plotting is off.
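The reasoning above can be checked in a few lines. This is a sketch: the "1 GPU per task" mapping is the hypothesis under test, and the per-node geometry is taken from the report, not from the scheduler:

```python
import math

tasks = 64
cores_per_task = 1
gpus_per_task = 1     # hypothesised mapping: one GPU reserved per task
cores_per_node = 64   # Crusher node geometry as stated in the report
gpus_per_node = 4

# If each task pins one GPU, the GPU count (not the core count) drives
# how many nodes must be reserved.
nodes_needed = math.ceil(tasks * gpus_per_task / gpus_per_node)

# Reserving whole nodes then claims all their cores, even though the
# tasks themselves only need one core each.
cores_reserved = nodes_needed * cores_per_node
cores_actually_used = tasks * cores_per_task

print(nodes_needed, cores_reserved, cores_actually_used)  # 16 1024 64
```

Under that hypothesis, 16 nodes and 1024 reserved cores fall out exactly, matching the "around 1000 cores" in the plots.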
The performance data is stored at
I have copied the performance plots into the same directory.
The versions of the RADICAL Cybertools packages are:
The code I am running lives at
in branch feature/nwchem. The job I am running is specified in https://github.com/hjjvandam/DeepDriveMD-pipeline/blob/feature/nwchem/test/bba/molecular_dynamics_workflow_nwchem_test/config.yaml. Please let me know if you need any further information.