running modeling.py in a cluster environment #2

heejongkim · 2021-01-02T08:15:01Z

Hi,

I would like to run the computationally expensive modeling.py across multiple nodes.
However, it seems that modeling.py and its associated scripts are written for a single machine.
Do you have any examples or recommendations for how to accomplish that?
I assume that I need to use MPI, but I'm not sure how to modify the scripts properly to maximize speed, efficiency, and replica exchange across nodes.

Thanks.

best,
hee jong

saltzberg · 2021-01-02T21:37:22Z

Hi Hee Jong,

To run replica exchange in parallel, IMP must be compiled with MPI, for example using mpicxx. To install IMP, use the CMAKE flag -DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx. You can read more about the CMAKE flags in the IMP installation instructions.

One can then use mpirun to initiate a parallel job, e.g.:

mpirun -np 4 python modeling.py

which will perform a single modeling run with four replicas.

Running multiple modeling runs on a cluster requires a submission script specific to that cluster's software and architecture. Once you can successfully run a single parallel replica exchange simulation using the command above, you should be able to use that line in your cluster submission script.
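As a minimal sketch of what "multiple modeling runs" could look like, the helper below builds one mpirun command line per independent run, each with its own output directory. The `--output-dir` and `--num-frames` options are hypothetical and are not confirmed to exist in this repo's modeling.py; the directory names `run1`, `run2`, ... are likewise only illustrative.

```python
import shlex


def build_run_commands(n_runs, n_replicas, n_frames, script="modeling.py"):
    """Return one mpirun argv list per independent modeling run.

    The --output-dir and --num-frames options are hypothetical; replace
    them with whatever arguments modeling.py actually parses.
    """
    commands = []
    for run_id in range(1, n_runs + 1):
        output_dir = f"run{run_id}"  # illustrative naming scheme
        commands.append([
            "mpirun", "-np", str(n_replicas),
            "python", script,
            "--output-dir", output_dir,
            "--num-frames", str(n_frames),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be pasted
    # into a cluster submission script (one job per line, for example).
    for argv in build_run_commands(n_runs=3, n_replicas=4, n_frames=5000):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than executing them keeps the sketch independent of any particular scheduler; each line can then go into its own submission script.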

benmwebb · 2021-01-03T02:09:12Z

> To install IMP, use the CMAKE flag -DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx

This isn't a great idea because it will result in all of IMP being compiled with MPI. Only the IMP::mpi module needs to be compiled with MPI. As long as mpicxx and friends are in your PATH, CMake should do the right thing. Most of the prebuilt IMP binaries (e.g. homebrew, Anaconda, RPM) are built with MPI support.

heejongkim · 2021-01-03T04:13:10Z

Thanks to you both.

@saltzberg I already compiled IMP with the cluster's mpicxx and set it up as an environment module.
What actually confused me is that rnapolii/modeling/run_rnapolii_modeling.sh loops over N and n_steps.
This repo's modeling.py takes those values plus an output path as arguments, so I wanted to know how to edit them properly to meet mpirun's "expectations".

For example, based on the previous RNA Pol II tutorial, I wrote the following SLURM script to submit the modeling job.

#!/usr/bin/bash
#SBATCH --partition=defq
#SBATCH --output=logfiles/%j.out
#SBATCH --error=logfiles/%j.err
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load imp/2.13.0 # automatically loads dependencies and unloads conflicting modules
mpirun --map-by node python modeling.py # instead of -np, use --map-by together with --ntasks-per-node to set the number of MPI processes per node

It would be great if you could help me convert the bash for loop into an mpirun command.
Thank you for your guidance.
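Since each MPI process acts as one replica (as with the -np 4 example earlier in the thread), it may help to check how many replicas a given allocation implies. The sketch below multiplies the node count by the tasks per node, reading SLURM's SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE environment variables. It assumes that mpirun starts one process per allocated task when -np is omitted, which depends on the MPI implementation and how it was built.

```python
import os


def expected_replicas(env=None):
    """Estimate the number of MPI processes (replicas) in a SLURM job.

    Assumes one MPI process per allocated task. Falls back to 1 when a
    variable is missing, e.g. when run outside of SLURM.
    """
    env = os.environ if env is None else env
    nodes = int(env.get("SLURM_JOB_NUM_NODES", "1"))
    tasks_per_node = int(env.get("SLURM_NTASKS_PER_NODE", "1"))
    return nodes * tasks_per_node


if __name__ == "__main__":
    print("Expected number of replicas:", expected_replicas())
```

For the script above (--nodes=8, --ntasks-per-node=48) this gives 8 × 48 = 384 processes, so it is worth confirming that a single run with that many replicas is what you intend.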

@benmwebb Would it cause any serious issues if I set CXX_COMPILER to mpicxx? Because of the complexity of the cluster environment, I preferred to be explicit, so I set it that way and compiled. If I don't, it is sometimes a little difficult to keep track of which compiler and libraries were used for this specific build.
Thank you for your insight.

heejongkim · 2021-01-03T07:05:49Z

So, I just set

global_output_directory="output" in ReplicaExchange0

and

num_frames to a fixed value instead of reading it from sys.argv on the command line

and used my SLURM script to submit the job with mpirun.

I've been watching the log and the queue for an hour, and the job seems to be emitting the expected output without failing.
If you have any other suggestions to improve, please let me know.
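One alternative to hard-coding these values is to give them defaults, so that the same script runs unchanged under mpirun and still accepts overrides from a shell loop. A minimal sketch using argparse follows; the option names `--output-dir` and `--num-frames` and the default of 5000 frames are hypothetical, while the default directory "output" is the value mentioned above.

```python
import argparse


def parse_args(argv=None):
    """Parse optional overrides; every option has a default, so the
    script also works when launched with no arguments at all."""
    parser = argparse.ArgumentParser(
        description="Replica exchange modeling run")
    parser.add_argument("--output-dir", default="output",
                        help="value to pass as global_output_directory")
    parser.add_argument("--num-frames", type=int, default=5000,
                        help="number of frames (hypothetical default)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # In modeling.py these would replace the hard-coded values, e.g.
    # global_output_directory=args.output_dir and num_frames=args.num_frames.
    print(args.output_dir, args.num_frames)
```

With this in place, `mpirun python modeling.py` would use the defaults, and a loop could still pass a different output directory per run.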

Thanks!

heejongkim · 2021-01-03T21:36:01Z

Ah... it seems to be hitting a problem with mpirun.

Same data, same topology, and an almost identical modeling.py (the only change is the sys.argv portion).
Even in a single-node run, mpirun produces the following error, whereas run_rnapolii_modeling.sh starts iterating without problems.

Traceback (most recent call last):
  File "init.py", line 141, in <module>
    max_srb_rot=0.3)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/macros.py", line 723, in execute_macro
    self.root_hier = self.system.build()
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 155, in build
    state.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 260, in build
    mol.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 747, in build
    self, rep, self.coord_finder, rephandler)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/system_tools.py", line 275, in build_representation
    model)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/isd/gmm_tools.py", line 40, in decorate_gmm_from_text
    weight=float(fields[2])
IndexError: list index out of range

Any suggestions and/or insights are very much appreciated.

Thanks.
