MPI fails with "not enough slots" #57
Comments
Did you manage to get MPI running?
Hi OxCoto,
My work has been focused on a couple of other major things around openEMS that I'm hoping to announce in the coming months, arguably more important than performance, so I haven't had a chance to look too deeply into MPI, but it's been on my mind. My goal would be to deploy MPI on AWS EFA, which offers tremendous speeds and is ideal for MPI applications. I just noticed their CFD example matches exactly what we're seeing with openEMS (although in our case, we unfortunately peak a lot quicker). It will likely take me quite some time before I start experimenting with MPI and seeing how to set things up with openEMS, but if we see a similar performance boost, that would be fantastic (especially for what I'm working on). So far, I've managed to build a robust multi-node architecture/RF orchestrator that uses distributed computing to speed up openEMS simulations (plural), though that's different from MPI.
If anyone is wondering about the original MPI error: "not enough slots" means MPI doesn't have information about which machines are available to execute an MPI program. An MPI program is not something you can just type and run. The system must first be prepared with a correctly configured MPI environment, including a hostfile, and the program should then be launched via a suitable launcher such as mpirun. None of this has anything to do with openEMS. You may want to follow the MPI Hello World tutorial first to ensure your system or cluster is capable of running MPI applications at all.
Finally, it's worth noting that openEMS's current MPI implementation is extremely suboptimal: it's basically a naive textbook implementation with none of the standard communication-avoidance optimizations common in High Performance Computing. So it's only worthwhile for very large simulations and, in my opinion, doesn't match the use case of most people. For the same reason, it's not a practical substitute for the existing multithreaded engine on a single machine, because the parallelization overhead of MPI is much greater. I hope to eventually contribute a complete rewrite of the MPI engine, but only after I finish the single-node optimizations first.
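For anyone following that advice, here is a minimal sketch of the classic MPI Hello World in C. The file name, hostfile name, and process count below are placeholders, not anything openEMS-specific:

/* hello_mpi.c -- smoke test for the MPI environment.
 * Build: mpicc hello_mpi.c -o hello_mpi
 * Run:   mpirun -np 4 --hostfile myhosts ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
    MPI_Get_processor_name(name, &name_len);

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

If this prints one line per requested process across your machines, the cluster side is fine and any remaining problem is in how openEMS is launched.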
I am attempting to use openEMS mpi and I'm getting the following errors:
##############
Running remote openEMS_MPI in working dir: /tmp/openEMS_MPI_OxYoCbMoLCNW
warning: implicit conversion from numeric to char
warning: called from
RunOpenEMS_MPI at line 90 column 15
RunOpenEMS at line 82 column 9
microstrip_mpi at line 174 column 1
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:
/opt/openEMS/bin/openEMS
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
   processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
   hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
   RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
error: mpirun openEMS failed!
error: called from
RunOpenEMS_MPI at line 97 column 5
RunOpenEMS at line 82 column 9
microstrip_mpi at line 174 column 1
###########################
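As the Open MPI message above spells out, the slot count comes from the launch environment, not from openEMS itself. Before digging into the Octave script, it can help to test that environment directly. A sketch, using the two hostnames from this issue (the file name myhosts and the slot counts are assumptions to adapt):

# myhosts -- one line per machine; "slots=N" caps the processes per host
hydra slots=28
wolfpack slots=28

# should print four hostnames if the slots are visible to Open MPI
mpirun -np 4 --hostfile myhosts hostname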
My source is in Octave (Matlab) format and is shown below:
##############
%
% microstrip transmission line: Z is normal to the substrate, Y is the direction of propagation, and X is the width
% try mpi
close all
clear
clc
% mpi setup
Settings.MPI.Binary = '/opt/openEMS/bin/openEMS';
Settings.MPI.NrProc = 4;
Settings.MPI.Hosts = {'wolfpack'};
........
%% run openEMS
%RunOpenEMS( Sim_Path, Sim_CSX, '--numThreads=4',Settings );
options='';
RunOpenEMS( Sim_Path, Sim_CSX,options,Settings );
...........
####################
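A note on the mpi setup above: if RunOpenEMS_MPI passes the entries of Settings.MPI.Hosts to mpirun as bare --host names (an assumption on my part; I have not checked how it builds the command line), then each entry advertises exactly one slot, per the ":N defaults to 1" rule in the error message, which would explain why only NrProc = 1 works. A hedged sketch of a variant to try:

% mpi setup (sketch): the ':N' suffixes are a guess, assuming the host
% strings are handed through to mpirun's --host argument unchanged
Settings.MPI.Binary = '/opt/openEMS/bin/openEMS';
Settings.MPI.NrProc = 4;
Settings.MPI.Hosts  = {'hydra:2', 'wolfpack:2'};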
Please note that this is running on machine 'hydra' and the remote second machine is 'wolfpack'.
Both hydra and wolfpack are 28-core machines, but the above code works only when Settings.MPI.NrProc = 1.
My machines are 28-core Xeon servers running Ubuntu 22.04 with the latest openEMS version.
I was not able to find the answer after an extensive search.
I'm stumped as to what I'm missing here and it's probably obvious to those with more experience than me.
Thanks in advance!
Phil