running modeling.py in a cluster environment #2

Open
heejongkim opened this issue Jan 2, 2021 · 5 comments
Comments

@heejongkim

Hi,

I would like to run the computationally expensive modeling.py across multiple nodes.
However, it seems that modeling.py and its associated script are written for a single machine.
Do you have any examples or recommendations for how to accomplish that?
I assume that I need to use MPI, but I'm not sure how to modify the script properly to maximize speed and efficiency and to get replica exchange working across nodes.

Thanks.

best,
hee jong

@saltzberg

Hi Hee Jong,

To run replica exchange in parallel, IMP must be compiled with MPI, for example with mpicxx. To install IMP, use the CMAKE flag -DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx. You can read more about using the CMAKE flags for installing IMP here.

One can then use mpirun to initiate a parallel job, e.g.:

mpirun -np 4 python modeling.py

which will perform a single modeling run with four replicas.
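A quick way to confirm that mpirun really started four separate processes is to have each one report its rank before the expensive modeling begins. The sketch below is a minimal, standard-library-only check. It assumes the launcher exports one of the usual environment variable pairs (OMPI_COMM_WORLD_RANK/OMPI_COMM_WORLD_SIZE for Open MPI, PMI_RANK/PMI_SIZE for MPICH-style launchers, SLURM_PROCID/SLURM_NTASKS under srun). The function name is hypothetical and is not part of IMP.

```python
import os

# (rank variable, size variable) pairs exported by common launchers.
# Which pair is present depends on the MPI implementation in use.
_RANK_SIZE_VARS = [
    ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"),  # Open MPI
    ("PMI_RANK", "PMI_SIZE"),                          # MPICH / Hydra
    ("SLURM_PROCID", "SLURM_NTASKS"),                  # SLURM srun
]


def detect_rank_and_size(env=None):
    """Return (rank, size) from launcher variables, or (0, 1) if none are set."""
    env = os.environ if env is None else env
    for rank_var, size_var in _RANK_SIZE_VARS:
        if rank_var in env and size_var in env:
            return int(env[rank_var]), int(env[size_var])
    return 0, 1  # no launcher variables found: looks like a serial run


if __name__ == "__main__":
    rank, size = detect_rank_and_size()
    print(f"process {rank} of {size}")
```

Saved under a hypothetical name such as check_ranks.py and launched with mpirun -np 4 python check_ranks.py, this should report four distinct ranks if the launcher is one of those listed. Seeing "process 0 of 1" four times would suggest either a different launcher or processes that were started independently of each other.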

Running multiple modeling runs on a cluster requires a submission script specific to that cluster's software and architecture. Once you can successfully run a single parallel replica exchange simulation using the command above, you should be able to use that line in your cluster submission script.

@benmwebb
Member

benmwebb commented Jan 3, 2021

To install IMP, use the CMAKE flag -DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx

This isn't a great idea because it will result in all of IMP being compiled with MPI. Only the IMP::mpi module needs to be compiled with MPI. As long as mpicxx and friends are in your PATH, CMake should do the right thing. Most of the prebuilt IMP binaries (e.g. homebrew, Anaconda, RPM) are built with MPI support.
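Before configuring, it may help to check which MPI wrappers are actually visible on PATH, since that is what the advice above relies on. This is a minimal sketch using only the standard library. The list of tool names is an assumption about what "and friends" covers, and the function name is hypothetical.

```python
import shutil


def find_mpi_tools(names=("mpicxx", "mpicc", "mpirun")):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_mpi_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

On a cluster that uses environment modules, running this after the relevant module load shows which installation each wrapper resolves to.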

@heejongkim
Author

Thanks to you both.

@saltzberg I already compiled IMP with the cluster's mpicxx and put it in a module.
What actually confused me is that rnapolii/modeling/run_rnapolii_modeling.sh loops through N and n_steps.
This repo's modeling.py takes that information plus the output path as arguments, so I wanted to make sure I edit those properly to meet mpirun's "expectations".

For example, for the previous RNA Pol II tutorial, I wrote the following SLURM script to submit the modeling job.

#!/usr/bin/bash
#SBATCH --partition=defq
#SBATCH --output=logfiles/%j.out
#SBATCH --error=logfiles/%j.err
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load imp/2.13.0 ## this will automatically load as well as unload dependencies and conflicts
mpirun --map-by node python modeling.py ## instead of using -np, used --map-by coupled with --ntasks-per-node to specify the number of threads per node
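Since the script above omits -np, the number of processes comes from the allocation. The sketch below works out that number from the #SBATCH lines. It assumes mpirun starts one process per allocated task (an assumption about the MPI launcher's default behavior inside a SLURM allocation, worth verifying on the cluster) and that each process acts as one replica, as described earlier in the thread. The function name is hypothetical.

```python
import re


def total_mpi_tasks(script_text):
    """Return nodes * ntasks-per-node as read from the #SBATCH lines."""
    options = {}
    for line in script_text.splitlines():
        match = re.match(r"#SBATCH\s+--([\w-]+)=(\S+)", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return int(options["nodes"]) * int(options["ntasks-per-node"])


# The resource request from the submission script above.
SCRIPT = """\
#!/usr/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48
"""

if __name__ == "__main__":
    print(total_mpi_tasks(SCRIPT))  # prints 384
```

Under those assumptions the job would run 8 × 48 = 384 replicas in a single replica exchange simulation, which may be far more than intended. Passing -np explicitly or requesting fewer tasks would bring the replica count down.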

It would be awesome if you could help me convert the bash for loop into an mpirun command.
Thank you for your guidance.

@benmwebb Would it cause any serious issues if I set CXX_COMPILER to mpicxx? Because of the complexity of the cluster environment, I preferred to be explicit, so I set it that way and compiled. If I don't, it is sometimes a little difficult to keep track of which compiler and libraries were used for this specific instance.
Thank you for your insight.

@heejongkim
Author

So, I just set

global_output_directory="output" in ReplicaExchange0

and changed

num_frames to a fixed value instead of reading it from the command line via sys.argv,

and used my SLURM script to submit the job with mpirun.
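An alternative to hard-coding these values is to keep them as command-line options with defaults, since mpirun passes any arguments after the script name to every process it starts. The following is a minimal sketch using argparse. The option names, the function name, and the default of 1000 frames are all hypothetical placeholders and are not taken from the repository's modeling.py.

```python
import argparse


def parse_args(argv=None):
    """Parse run settings; argv=None means read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Replica exchange modeling run")
    parser.add_argument(
        "--output-dir", default="output",
        help="directory to pass on as global_output_directory")
    parser.add_argument(
        "--num-frames", type=int, default=1000,  # placeholder, not a recommendation
        help="number of frames to pass on as num_frames")
    return parser.parse_args(argv)


# In modeling.py the parsed values would replace the fixed ones, e.g.
#   args = parse_args()
#   global_output_directory=args.output_dir, num_frames = args.num_frames
```

With this in place, the same script could be launched with or without explicit settings, for example mpirun -np 4 python modeling.py --output-dir run1 --num-frames 5000.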

I've been watching the log and the queue for an hour, and it seems to be emitting the expected output and not failing.
If you have any other suggestions for improvement, please let me know.

Thanks!

@heejongkim
Author

Ah... it seems to be hitting a problem with mpirun.

Same data, same topology, and an almost identical modeling.py (the only change is the sys.argv portion).
Even on a single-node run, mpirun produces the following error, while run_rnapolii_modeling.sh starts iterating without problems.

Traceback (most recent call last):
  File "init.py", line 141, in <module>
    max_srb_rot=0.3)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/macros.py", line 723, in execute_macro
    self.root_hier = self.system.build()
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 155, in build
    state.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 260, in build
    mol.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 747, in build
    self, rep, self.coord_finder, rephandler)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/system_tools.py", line 275, in build_representation
    model)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/isd/gmm_tools.py", line 40, in decorate_gmm_from_text
    weight=float(fields[2])
IndexError: list index out of range

Any suggestions and/or insights are very much appreciated.

Thanks.
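The last frame of the traceback above fails on fields[2], which means some line of the GMM text file being read split into fewer than three fields. One thing that may be worth ruling out is a blank or truncated line in that file. The sketch below scans a file for such lines. It assumes the fields are separated by "|" and that lines starting with "#" are comments. Both are assumptions, since only the fields[2] access is visible in the traceback, and the function name is hypothetical.

```python
import sys


def find_short_lines(path, delimiter="|", min_fields=3):
    """Return (line_number, text) for data lines with too few fields."""
    short = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # assumed comment line
            if len(line.split(delimiter)) < min_fields:
                short.append((number, line.rstrip("\n")))
    return short


if __name__ == "__main__":
    # Usage: python check_gmm.py file1.txt file2.txt ...
    for gmm_path in sys.argv[1:]:
        for number, text in find_short_lines(gmm_path):
            print(f"{gmm_path}:{number}: {text!r}")
```

Running it on the GMM files referenced by the topology, both before and after an mpirun attempt, would show whether any of them contains a line that could trigger this IndexError.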
