MPI fails with "not enough slots" #57

Open
montanaviking opened this issue Apr 24, 2023 · 4 comments
@montanaviking

I am attempting to use openEMS MPI and I'm getting the following errors:
##############
Running remote openEMS_MPI in working dir: /tmp/openEMS_MPI_OxYoCbMoLCNW
warning: implicit conversion from numeric to char
warning: called from
RunOpenEMS_MPI at line 90 column 15
RunOpenEMS at line 82 column 9
microstrip_mpi at line 174 column 1

Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

/opt/openEMS/bin/openEMS

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
error: mpirun openEMS failed!
error: called from
RunOpenEMS_MPI at line 97 column 5
RunOpenEMS at line 82 column 9
microstrip_mpi at line 174 column 1

###########################
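
(For reference, the hostfile form the Open MPI message above describes would look something like the lines below; the hostnames are the ones from this setup, while the file name "myhosts" and the slot counts are only illustrative.)

# myhosts -- example Open MPI hostfile (illustrative)
hydra    slots=28
wolfpack slots=28

Such a file is handed to the launcher with something like "mpirun --hostfile myhosts -n 4 <program>"; alternatively, "--oversubscribe" is the escape hatch the message mentions for ignoring the slot limit altogether.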

My source is in Octave (Matlab) format and is shown below:

##############
%
% microstrip transmission line, Z is normal to substrate Y is the direction of propagation and X is the width
% try mpi

close all
clear
clc

% mpi setup

Settings.MPI.Binary = '/opt/openEMS/bin/openEMS';
Settings.MPI.NrProc = 4;
Settings.MPI.Hosts = {'wolfpack'};

........

%% run openEMS
%RunOpenEMS( Sim_Path, Sim_CSX, '--numThreads=4',Settings );
options='';
RunOpenEMS( Sim_Path, Sim_CSX,options,Settings );

...........

####################
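
(A hedged aside on the Settings.MPI.Hosts line above: if RunOpenEMS_MPI hands these entries to mpirun's --host option -- an assumption about its internals, not verified here -- then Open MPI grants only one slot per bare hostname, which would match the observation below that only NrProc = 1 works. One thing worth trying is the ":N" slot suffix that the error message describes:)

% illustrative sketch only -- assumes the Hosts entries are passed
% through to mpirun's --host option unchanged
Settings.MPI.Binary = '/opt/openEMS/bin/openEMS';
Settings.MPI.NrProc = 4;
Settings.MPI.Hosts  = {'wolfpack:4'};   % ':4' requests 4 slots on wolfpack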

Please note that this is running on machine 'hydra' and the remote second machine is 'wolfpack'.
Both hydra and wolfpack are 28-core machines, but the above code works only when Settings.MPI.NrProc = 1;
My machines are 28-core Xeon servers running Ubuntu 22.04 with the latest openEMS version.

I was not able to find the answer after an extensive search.
I'm stumped as to what I'm missing here and it's probably obvious to those with more experience than me.
Thanks in advance!
Phil

@0xCoto

0xCoto commented Sep 8, 2023

Did you manage to get MPI running?

@montanaviking
Author

Hi 0xCoto,
I am very interested in getting MPI working on openEMS. openEMS speed is mainly limited by RAM speed. I have four servers, each with two Xeons. Two of those have 28 cores and one has 44 cores. Yes, all those cores are great for solving problems such as circuit optimization, but openEMS maxes out with just two or three cores per socket. I'm thinking that using MPI to spread the work over three or four machines would significantly improve the overall throughput. I have 100G Ethernet connections between three of the machines.
Unfortunately, I haven't gotten MPI to work on openEMS yet and haven't had time recently, but I'm still very much interested in solving it and will look into this again soon.
Did you have the same problem as me?
Thanks,
Phil

@0xCoto

0xCoto commented Sep 9, 2023

My work's been focused on a couple of other major things around openEMS that I'm hoping to announce in the coming months, arguably more important than performance, so I haven't had a chance to look too deeply into MPI, but it's been on my mind. My goal would be to deploy MPI on AWS EFA, which offers tremendous speeds and is ideal for MPI applications.

I just noticed their CFD example matches exactly what we're seeing with openEMS (although in our case, we unfortunately peak a lot quicker):

[image: AWS EFA CFD scaling example]

It will likely take me quite some time before I start experimenting with MPI and seeing how to set things up with openEMS, but if we see a similar performance boost, that would be fantastic (especially for what I'm working on).

So far, I've managed to build a robust multi-node architecture/RF orchestrator that utilizes distributed computing to speed up openEMS simulations (plural), though that's different from MPI.

@biergaizi
Contributor

biergaizi commented May 29, 2024

If anyone is wondering about the original MPI error: "not enough slots" means MPI doesn't have information about which machines are available to execute an MPI program.

An MPI program is not something you can just type and run. The system must first be prepared with a correctly configured MPI environment, including a hostfile; the program should then be launched via a suitable launcher such as mpirun (or likwid-mpirun), or via a resource manager like Slurm on an HPC cluster. These, in turn, pass the information about the available systems and clusters to the MPI program.

None of this has anything to do with openEMS. One may want to follow an MPI Hello World tutorial first to ensure your system or cluster is capable of running MPI applications at all.
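
(As a minimal sketch of what such a sanity check can look like -- assuming passwordless SSH between the machines and the same Open MPI version on both; the host names are the ones from this thread and the slot counts are illustrative:)

mpirun --host hydra:2,wolfpack:2 -n 4 hostname

If that prints each hostname twice, process launching across both machines works; an actual MPI Hello World, as suggested above, then verifies the MPI communication itself. The same --host or --hostfile information is what the openEMS run ultimately needs to receive.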

Finally, it's worth noting that openEMS's MPI implementation is currently extremely suboptimal: it's basically a naive textbook implementation with none of the standard communication-avoidance optimizations common in high-performance computing. So it's only worthwhile for very large simulations and, in my opinion, doesn't match the use case of most people. For the same reason, it's not a practical substitute for the existing multithreaded engine on a single machine, because the parallelization overhead of MPI is much greater.

I hope to eventually contribute a complete rewrite of the MPI engine, but only after I finish the single-node optimizations.
