Avoid MPI_Init for Serial I/O when built with MPI #1443
Comments
@ax3l a couple of things:
I don't know how a truly non-conflicting single MPI and Non-MPI library can be deployed. Ideas are welcome or, even better, we can borrow from an existing implementation doing that. Thanks! |
The idea I had was the following: maybe we can put a state in the core implementation classes such as |
@ax3l at the public interface of |
The way I have worked with MPI libraries that allow serial versions is to play with their MACRO |
Thanks, yes the namespace already helps a bit. For now, it's okayish to have to pull |
to narrow it down:
I am not aware of an MPI/serial library that can bypass the above conflict using a single library, unless two separate libraries and appropriate headers are provided. This is the piece I don't know enough about to tackle at the public interface layer. |
Yes, the absolute cleanest way would be a second header, say For practical cases, it's still okay if in the The "serial" functionality, e.g. the constructor, should still be available and can compile against the usual MPI library that ADIOS2 was built with. Yet internally, it switches at runtime to the Basically, just like This will enable several nice cases:
|
@ax3l I see the benefits, but the issue I am bringing up is conflicting definitions at the public layer.
without |
Yes, I understand what you mean. I think that would be harder to do since |
To put in my 2 cents, while I agree that it'd be nice to have adios2 work with and without MPI simultaneously, I think it's technically rather cumbersome to do. It's not a header-only library, so one can't just switch MPI implementations easily (e.g., sizeof(MPI_Comm) isn't even constant across different MPI implementations). So essentially, the compiled code has to be different -- and hence an
One way I see that things could be done would be to compile a big library that essentially combines a serial and a parallel library into one big .so, using different namespaces (or different version trickery or something in the .so). And then one could have a header that depending on USE_MPI does something like
One solution I can think of is an "install both" mode, which would compile both libadios2.so and libadios2_serial.so, and provide "adios2/adios2.h" and "adios2_serial.h", and one can then include one or the other and correspondingly link with one or the other. But I don't really like this, either...
Thinking about this a bit more, though, I think HDF5, if compiled parallel, lets you use it serially without trouble. So maybe there's a point in considering that. But it won't be straightforward to achieve this in adios2, I don't think. The idea would be to only use an MPI_Comm when it's actually required, not for the top level objects. |
Actually the initial point of this issue is just about your last paragraph for exactly these reasons. Keep the interface as is, including MPI headers if they were present, but always expose the "serial" constructor as well and avoid Of course, if we find a gentle way for a true split that would be great for compatibility as well but is imho not as urgent. |
@germasch thanks, yes that's pretty much how I see it. Would be helpful to point at an example on how HDF5 does it. My understanding is that when adios2 is compiled with MPI, the latter becomes a public dependency (any MPI-based library for that matter). @ax3l at this point, a proof-of-concept that passes the linking stage would be helpful. |
I can implement it. A proof-of-concept is very simple:
diff --git a/bindings/CXX11/cxx11/ADIOS.cpp b/bindings/CXX11/cxx11/ADIOS.cpp
index 1d761440..0e93d76c 100644
--- a/bindings/CXX11/cxx11/ADIOS.cpp
+++ b/bindings/CXX11/cxx11/ADIOS.cpp
@@ -25,6 +25,8 @@ ADIOS::ADIOS(MPI_Comm comm, const bool debugMode) : ADIOS("", comm, debugMode)
{
}
+ADIOS::ADIOS(const bool debugMode) : ADIOS("", MPI_COMM_SELF, debugMode) {}
+
#else
ADIOS::ADIOS(const std::string &configFile, const bool debugMode)
: m_ADIOS(std::make_shared<core::ADIOS>(configFile, debugMode, "C++"))
diff --git a/bindings/CXX11/cxx11/ADIOS.h b/bindings/CXX11/cxx11/ADIOS.h
index dff4a599..7f3a8372 100644
--- a/bindings/CXX11/cxx11/ADIOS.h
+++ b/bindings/CXX11/cxx11/ADIOS.h
@@ -79,6 +79,8 @@ public:
*/
ADIOS(const std::string &configFile, const bool debugMode = true);
+#endif
+
/**
* Starting point for non-MPI apps. Creates an ADIOS object
* @param debugMode true: extra user-input debugging information, false: run
@@ -87,8 +89,7 @@ public:
* incorrect
*/
ADIOS(const bool debugMode = true);
-#endif
-
+
/** object inspection true: valid object, false: invalid object */
explicit operator bool() const noexcept;
This will enable an application to use the serial constructor even when ADIOS2 is built with MPI. ADIOS2 will, of course, still pull in MPI, which is not much of a problem when it's provided as a shared library, because that'll happen automatically. However, it won't really work, because once the app is actually using such an ADIOS object, MPI functions will be called, and they'll complain because MPI has not been initialized. So to make this actually work, I see two options:
(The third option would be a more thorough reengineering of ADIOS2 to move MPI calls down to only the layers where they're actually needed, which I think is too big of a change to be realistic). |
@germasch thanks! This basically confirms that it's not feasible for a single library (compiled with MPI) to support such dual behavior. MPI-based libraries wouldn't even link if MPI itself is not linked, unless we introduce conflicting MPI symbols (a hack again), which will lead to the issues @ax3l listed with mpidummy.h in adios1.
BTW, ADIOS2 is already doing the third option. MPI calls are always done privately. The issue is narrowed down to linking the public API handshake with MPI (passing a comm) that any MPI-based library requires.
The cleanest solution would be to keep the serial and parallel implementations completely separate, which most projects I am aware of keep track of at compile time via a MACRO, as it's not feasible to do at run time once a library is compiled with MPI. To make things worse, MPI distributions are further split between openmpi and mpich based distributions that are incompatible.
@ax3l while I share the same wishlist, I believe this is not feasible with a single library. If you know a better (working) way please let us know. |
I don't think you understood me right. What the patch shows is that providing that API is easily doable. But that simple change doesn't actually make it usable without calling MPI_Init. What I said, though, is that it's possible to make that work quite easily, but it will require some (relatively minimal) wrapping of MPI inside of adios2. |
Apologies if I wasn't clear enough, by feasible I mean working: compiled/linked/running/tested, not only the code interface. My understanding from seeing the code, is that it won't link with serial code that doesn't provide a hack to MPI_Comm symbols. Nevertheless, work at runtime. |
To explain my primary idea with a small testable example: My goal would be to get serial tools like Here is a quick hack that starts this, but we should do this more systematically in all places:
diff --git a/source/adios2/core/IO.cpp b/source/adios2/core/IO.cpp
index 238c8c93..e6a939a4 100644
--- a/source/adios2/core/IO.cpp
+++ b/source/adios2/core/IO.cpp
@@ -438,7 +438,18 @@ Engine &IO::Open(const std::string &name, const Mode mode,
}
MPI_Comm mpiComm;
- MPI_Comm_dup(mpiComm_orig, &mpiComm);
+ int flag;
+ MPI_Initialized(&flag);
+ if (flag)
+ {
+ MPI_Comm_dup(mpiComm_orig, &mpiComm);
+ // m_NeedMPICommFree = true;
+ }
+ else
+ {
+ mpiComm = mpiComm_orig;
+ // m_NeedMPICommFree = false;
+ }
std::shared_ptr<Engine> engine;
const bool isDefaultEngine = m_EngineType.empty() ? true : false;
std::string engineTypeLC = m_EngineType;
diff --git a/source/adios2/toolkit/format/bp3/BP3Base.cpp b/source/adios2/toolkit/format/bp3/BP3Base.cpp
index 3f484cb4..cf419d48 100644
--- a/source/adios2/toolkit/format/bp3/BP3Base.cpp
+++ b/source/adios2/toolkit/format/bp3/BP3Base.cpp
@@ -47,8 +47,14 @@ const std::map<int, std::string> BP3Base::m_TransformTypesToNames = {
BP3Base::BP3Base(MPI_Comm mpiComm, const bool debugMode)
: m_MPIComm(mpiComm), m_DebugMode(debugMode)
{
- MPI_Comm_rank(m_MPIComm, &m_RankMPI);
- MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+ int flag;
+ MPI_Initialized(&flag);
+ m_RankMPI = 0;
+ m_SizeMPI = 1;
+ if (flag) {
+ MPI_Comm_rank(m_MPIComm, &m_RankMPI);
+ MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+ }
m_Profiler.IsActive = true; // default
}
diff --git a/source/adios2/toolkit/transport/Transport.cpp b/source/adios2/toolkit/transport/Transport.cpp
index 83e8e8ca..ce8b5f4b 100644
--- a/source/adios2/toolkit/transport/Transport.cpp
+++ b/source/adios2/toolkit/transport/Transport.cpp
@@ -20,8 +20,14 @@ Transport::Transport(const std::string type, const std::string library,
MPI_Comm mpiComm, const bool debugMode)
: m_Type(type), m_Library(library), m_MPIComm(mpiComm), m_DebugMode(debugMode)
{
- MPI_Comm_rank(m_MPIComm, &m_RankMPI);
- MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+ int flag;
+ MPI_Initialized(&flag);
+ m_RankMPI = 0;
+ m_SizeMPI = 1;
+ if (flag) {
+ MPI_Comm_rank(m_MPIComm, &m_RankMPI);
+ MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+ }
}
void Transport::IWrite(const char *buffer, size_t size, Status &status,
diff --git a/source/utils/bpls/bpls.cpp b/source/utils/bpls/bpls.cpp
index c2054cdc..1914fcd0 100644
--- a/source/utils/bpls/bpls.cpp
+++ b/source/utils/bpls/bpls.cpp
@@ -2934,7 +2934,7 @@ char *mystrndup(const char *s, size_t n)
int main(int argc, char *argv[])
{
#ifdef ADIOS2_HAVE_MPI
- MPI_Init(&argc, &argv);
+// MPI_Init(&argc, &argv);
#endif
int retval = 1;
try
@@ -2947,7 +2947,7 @@ int main(int argc, char *argv[])
std::cout << e.what() << std::endl;
}
#ifdef ADIOS2_HAVE_MPI
- MPI_Finalize();
+// MPI_Finalize();
#endif
return retval;
}
We can avoid the recurring |
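The check that recurs in the patch above could be factored into a few small wrappers. The following is a minimal sketch under stated assumptions, not code from ADIOS2: the namespace `sketch`, the macro `SKETCH_HAVE_MPI`, the alias `CommHandle`, and the helpers `MpiIsInitialized`, `CommRank`, `CommSize` and `CommDup` are all hypothetical names. Only `MPI_Initialized`, `MPI_Comm_rank`, `MPI_Comm_size` and `MPI_Comm_dup` are real MPI calls.

```cpp
// Hypothetical helpers, not ADIOS2 API. Define SKETCH_HAVE_MPI and link
// against MPI to exercise the MPI path; without it the sketch still builds
// and always takes the serial path.
#ifdef SKETCH_HAVE_MPI
#include <mpi.h>
using CommHandle = MPI_Comm;
#else
using CommHandle = int; // placeholder type so the sketch builds without MPI
#endif

namespace sketch
{

// True once the application has called MPI_Init. MPI_Initialized keeps
// reporting true after MPI_Finalize; that case is not handled here.
inline bool MpiIsInitialized()
{
#ifdef SKETCH_HAVE_MPI
    int flag = 0;
    MPI_Initialized(&flag);
    return flag != 0;
#else
    return false;
#endif
}

// Rank in comm, or 0 when MPI was never initialized.
inline int CommRank(CommHandle comm)
{
    int rank = 0;
#ifdef SKETCH_HAVE_MPI
    if (MpiIsInitialized())
    {
        MPI_Comm_rank(comm, &rank);
    }
#else
    (void)comm;
#endif
    return rank;
}

// Size of comm, or 1 when MPI was never initialized.
inline int CommSize(CommHandle comm)
{
    int size = 1;
#ifdef SKETCH_HAVE_MPI
    if (MpiIsInitialized())
    {
        MPI_Comm_size(comm, &size);
    }
#else
    (void)comm;
#endif
    return size;
}

// Result of a duplication attempt; needsFree tells the caller whether the
// handle must later be released with MPI_Comm_free.
struct DupResult
{
    CommHandle comm;
    bool needsFree;
};

// Duplicates comm only when MPI is running; otherwise passes it through.
inline DupResult CommDup(CommHandle comm)
{
    DupResult result{comm, false};
#ifdef SKETCH_HAVE_MPI
    if (MpiIsInitialized())
    {
        MPI_Comm_dup(comm, &result.comm);
        result.needsFree = true;
    }
#endif
    return result;
}

} // namespace sketch
```

With wrappers of this kind, constructors such as the ones touched in the patch would call `sketch::CommRank(m_MPIComm)` instead of repeating the `MPI_Initialized` test at every call site. The `needsFree` flag plays the role of the commented-out `m_NeedMPICommFree` member in the patch.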
@ax3l I understand how this is done privately...my concern is very simple: would it link with serial code? the MPI_Comm symbol gets resolved from MPI when adios2 is compiled against it, but the same MPI_Comm symbols will be missing when linking adios2 to serial code. Seems I am not getting the message somehow and showing code only won't guarantee this is possible at all. If you're correct and |
I understand that and it's still needed to link against MPI although it will then not be used in communication APIs (besides checking once for |
FWIW, |
After some discussion with @wfgodoy, we propose to solve what @ax3l is asking for: write a complete abstract class for communication and have two implementations of it, one with real MPI and one with dummy MPI. The appropriate communicator object will be created in the ADIOS constructor depending on whether the ADIOS constructor is called with a (real) MPI comm or without.
This will ensure that
- we have one library with both parallel and serial running capability.
- apps must be linked with MPI even though they never use it.
- serial apps linked with this library will run on login nodes that have no MPI support
It removes the possibility to call ADIOS without any comm and Open with a comm, but that was never a pretty option anyway. We are glad it will be gone.
How do you feel about this?
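To make the proposal concrete, here is a minimal sketch of what such an abstraction might look like. It is an illustration under stated assumptions, not the design that was eventually merged: the names `sketch::Comm`, `CommDummy`, `CommMPI`, `ADIOSLike` and the macro `SKETCH_HAVE_MPI` are hypothetical, and the real interface would need many more operations (gather, broadcast, and so on).

```cpp
#include <memory>
#include <string>

#ifdef SKETCH_HAVE_MPI // hypothetical macro; define it and link MPI to enable
#include <mpi.h>
#endif

namespace sketch
{

// Abstract communicator: the core would only ever talk to this interface.
class Comm
{
public:
    virtual ~Comm() = default;
    virtual int Rank() const = 0;
    virtual int Size() const = 0;
    virtual void Barrier() const = 0;
    virtual std::unique_ptr<Comm> Duplicate() const = 0;
    virtual std::string Kind() const = 0;
};

// Serial implementation: never calls into MPI, so MPI_Init is not required.
class CommDummy final : public Comm
{
public:
    int Rank() const override { return 0; }
    int Size() const override { return 1; }
    void Barrier() const override {}
    std::unique_ptr<Comm> Duplicate() const override
    {
        return std::unique_ptr<Comm>(new CommDummy);
    }
    std::string Kind() const override { return "serial"; }
};

#ifdef SKETCH_HAVE_MPI
// MPI implementation: owns a duplicate of the user's communicator.
// Assumes MPI is initialized and outlives this object.
class CommMPI final : public Comm
{
public:
    explicit CommMPI(MPI_Comm comm) { MPI_Comm_dup(comm, &m_Comm); }
    ~CommMPI() override { MPI_Comm_free(&m_Comm); }
    CommMPI(const CommMPI &) = delete;
    CommMPI &operator=(const CommMPI &) = delete;
    int Rank() const override
    {
        int r = 0;
        MPI_Comm_rank(m_Comm, &r);
        return r;
    }
    int Size() const override
    {
        int s = 1;
        MPI_Comm_size(m_Comm, &s);
        return s;
    }
    void Barrier() const override { MPI_Barrier(m_Comm); }
    std::unique_ptr<Comm> Duplicate() const override
    {
        return std::unique_ptr<Comm>(new CommMPI(m_Comm));
    }
    std::string Kind() const override { return "mpi"; }

private:
    MPI_Comm m_Comm;
};
#endif

// Stand-in for the ADIOS top-level object: the constructor that is called
// decides which communicator implementation is used at run time.
class ADIOSLike
{
public:
    ADIOSLike() : m_Comm(new CommDummy) {}
#ifdef SKETCH_HAVE_MPI
    explicit ADIOSLike(MPI_Comm comm) : m_Comm(new CommMPI(comm)) {}
#endif
    const Comm &GetComm() const { return *m_Comm; }

private:
    std::unique_ptr<Comm> m_Comm;
};

} // namespace sketch
```

In this sketch a serial tool would construct `sketch::ADIOSLike adios;` and never touch MPI at run time, while a parallel application would pass its communicator to the other constructor. The library binary would still be linked against MPI in both cases, which matches the second bullet above.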
On Fri, May 24, 2019 at 10:15 AM Kai Germaschewski wrote:
FWIW, MPI_Comm is not a symbol, it'd never be unresolved no matter what you link to or not. But yes, there would be actual unresolved symbols, like MPI_Comm_dup if not linking with MPI, even if those functions would never actually end up being called at run time. But that's an issue for building an executable, which is already resolved via the current mechanisms (e.g., cmake / adios2-config). @ax3l's patch is exactly what I mean, except, as he says, this should be done via wrappers rather than throughout the code.
|
Sounds good to me, great! I do not fully understand the last paragraph, since we can still keep a serial version without a communicator in the public interface, imho. But probably I am just missing something that I haven't used (constructing without a comm and opening with a comm?). |
I think we have to narrow down some concepts we are mixing after talking to @pnorbert :
|
Demonstrator on Titan seems to confirm my mumbling:
#include <mpi.h>
int main(int argc, char* argv[])
{
    int flag;
    MPI_Initialized(&flag); // ok to be called
    if(argc > 1)
    {
        MPI_Init(&argc, &argv); // not okay :)
        MPI_Finalize();
    }
    return 0;
}
> pgc++ main.cpp -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpich
> ./a.out
> ./a.out 1
[Fri May 24 11:21:49 2019] [unknown] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(249).......: channel initialization failed
MPID_Init(638).......: PMI2 init failed: 1
Aborted
Yay! :) |
See PR #1466 for a proposed solution. |
@williamfgc @pnorbert can we close this now as fixed? |
@chuckatkins I'd say it's mostly fixed with some caveats, so I'd like to see #1480 merged (or at least discussed), too. |
As #1480 has been merged, as far as I'm concerned, this issue can be closed, @chuckatkins. |
Looks good to me as well, really happy this works! Thanks a lot as well to @germasch for contributing much of the implementation! :) |
Hi,
we discussed this offline before and we are currently at the point where we would need the feature again. When compiling ADIOS2 with MPI features, it would be wonderful if the serial constructors would still be exposed.
ADIOS2/bindings/CXX11/cxx11/ADIOS.h
Lines 67 to 80 in 02f246e
This would allow users to use a cluster module with MPI support and build small serial tools against it, e.g. for meta-data reads, without the need to call those with mpirun (and without MPI_Init, ideally).
Is it possible to expose the MPI functionality as a true superset of the serial functionality? Also from a package management point of view, this would make a lot of sense for compatibility.
Regarding the current implementation in source/adios2/ADIOSMPI.h, I am a bit afraid we run into usability issues (also seen in practical examples with ADIOS1: ornladios/ADIOS#183). Instead of mocking the MPI APIs, could we potentially just use those as a wrapper and an internal state (MPI-parallel or serial) to switch the implementation at runtime?