Plots show excessive amounts of resources #187
Thanks Andre,
I will have to go and track that down. There are some other weird things going on in that department anyway.
Best wishes,
Huub
-----
Hubertus van Dam
Brookhaven National Laboratory
From: Andre Merzky
Date: Friday, December 22, 2023 at 8:35 AM
To: radical-cybertools/radical.analytics
Subject: Re: [radical-cybertools/radical.analytics] Plots show excessive amounts of resources (Issue #187)
Hi Hub,
when running that config file, I see the following resource description being used in this line: https://github.com/hjjvandam/DeepDriveMD-pipeline/blob/feature/nwchem/deepdrivemd/deepdrivemd.py#L275
{'access_schema': 'local',
'cpus': 1024,
'gpus': 64,
'project': 'CHM136_crusher',
'queue': 'batch',
'resource': 'ornl.crusher',
'walltime': 180}
so that seems to indicate that indeed 1k cores are being allocated. So unfortunately the plotting is correct, the resource allocation is faulty.
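For reference, the numbers in that description are internally consistent with a 16-node allocation. A minimal sketch of the arithmetic, assuming the Crusher node geometry stated in the issue (64 CPU cores and 4 GPUs per node), not values queried from the machine:

```python
# Resource description quoted above, from the DeepDriveMD config.
resource_description = {
    'access_schema': 'local',
    'cpus': 1024,
    'gpus': 64,
    'project': 'CHM136_crusher',
    'queue': 'batch',
    'resource': 'ornl.crusher',
    'walltime': 180,
}

# Per-node geometry as stated in the issue report (an assumption here).
CORES_PER_NODE = 64
GPUS_PER_NODE = 4

# Both the CPU and the GPU request resolve to the same node count.
nodes_from_cpus = resource_description['cpus'] // CORES_PER_NODE
nodes_from_gpus = resource_description['gpus'] // GPUS_PER_NODE

print(nodes_from_cpus, nodes_from_gpus)  # 16 16
```

So the request itself asks for 16 full nodes, which matches the roughly 1000 cores shown in the plots.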
I am running some workflows on Crusher. The stage with the largest number of tasks runs 64 of them, each using 1 CPU core. The performance analysis plots suggest, however, that around 1000 cores were reserved for this workflow. With 64 CPU cores and 4 GPUs per node, you only get that number if the node allocation corresponded to 1 GPU per task, i.e. reserving 16 nodes for 64 single-core tasks. I hope that the code isn't actually doing that and that just the plotting is off.
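The reasoning above can be checked in a few lines. This is a sketch: the "1 GPU per task" mapping is the hypothesis under test, and the per-node geometry is taken from the report, not from the scheduler:

```python
import math

tasks = 64
cores_per_task = 1
gpus_per_task = 1     # hypothesised mapping: one GPU reserved per task
cores_per_node = 64   # Crusher node geometry as stated in the report
gpus_per_node = 4

# If each task pins one GPU, the GPU count (not the core count) drives
# how many nodes must be reserved.
nodes_needed = math.ceil(tasks * gpus_per_task / gpus_per_node)

# Reserving whole nodes then claims all their cores, even though the
# tasks themselves only need one core each.
cores_reserved = nodes_needed * cores_per_node
cores_actually_used = tasks * cores_per_task

print(nodes_needed, cores_reserved, cores_actually_used)  # 16 1024 64
```

Under that hypothesis, 16 nodes and 1024 reserved cores fall out exactly, matching the "around 1000 cores" in the plots.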
The performance data is stored at
I have copied the performance plots into the same directory.
The versions of the RADICAL Cybertools packages are:
The code I am running lives at
in branch feature/nwchem. The job I am running is specified in https://github.com/hjjvandam/DeepDriveMD-pipeline/blob/feature/nwchem/test/bba/molecular_dynamics_workflow_nwchem_test/config.yaml. Please let me know if you need any further information.