Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid MPI_Init for Serial I/O when built with MPI #1443

Closed
ax3l opened this issue May 22, 2019 · 30 comments
Closed

Avoid MPI_Init for Serial I/O when built with MPI #1443

ax3l opened this issue May 22, 2019 · 30 comments

Comments

@ax3l
Copy link
Contributor

ax3l commented May 22, 2019

Hi,

we discussed this offline before and we are currently at the point where we would need the feature again. When compiling ADIOS2 with MPI features, it would be wonderful if the serial constructors would still be exposed.

ADIOS(const std::string &configFile = "", MPI_Comm comm = MPI_COMM_SELF,
const bool debugMode = true);
#else
/**
* Starting point for non-MPI serial apps. Creates an ADIOS object allowing
* a runtime config file.
* @param configFile runtime config file
* @param debugMode true: extra user-input debugging information, false: run
* without checking user-input (stable workflows)
* @exception std::invalid_argument in debugMode = true if user input is
* incorrect
*/
ADIOS(const std::string &configFile, const bool debugMode = true);

This would allow users to use a cluster module with MPI support and build small serial tools against it, e.g. for meta-data reads, without the need to call those with mpirun (and without MPI_Init, ideally).

Is it possible to expose the MPI functionality as a true superset of the serial functionality? Also from a package management point of view, this would make a lot of sense for compatibility.

Regarding the current implementation in source/adios2/ADIOSMPI.h, I am a bit afraid we run into usability issues (also seen in practical examples with ADIOS1: ornladios/ADIOS#183). Instead of mocking the MPI APIs, could we potentially just use those as a wrapper and an internal state (MPI-parallel or serial) to switch the implementation at runtime?

@williamfgc
Copy link
Contributor

@ax3l a couple of things:

  1. in ADIOS2 the way we currently write portable code is using the macro #ifdef ADIOS2_HAVE_MPI
#ifdef ADIOS2_HAVE_MPI
    adios2::ADIOS adios(comm);
#else 
    adios2::ADIOS adios;
#endif
  1. source/adios2/ADIOSMPI.h is completely private, never exposed to the application. In ADIOS1.x mpidummy.h was exposed while in ADIOS2 is private and inside the namespace adios2

I don't know how a truly non-conflicting single MPI and Non-MPI library can be deployed. Ideas are welcome or, even better, we can borrow from an existing implementation doing that. Thanks!

@ax3l
Copy link
Contributor Author

ax3l commented May 22, 2019

The idea I had was the following: maybe we can put a state in the core implementation classes such as core::ADIOS, Engine::Engine, Transport::Transport, etc. that stores if MPI_Initialized is false and the communicator MPI_COMM_SELF was passed. Then, instead of calling MPI_Comm_rank/size and the other MPI functions seen in source/adios2/ADIOSMPI.h directly we can add a small wrapper that selects the MPI implementation from the dummy or the real ones based on this state (at runtime rather than compile time)? That would allow to avoid calling MPI_Init/MPI_Finalize, e.g. in bpls and would likewise help our libs downstream.

@williamfgc
Copy link
Contributor

williamfgc commented May 23, 2019

@ax3l at the public interface of adios2::ADIOS the MPI_COMM_SELF definition must come from the user. I understand what you mean for the above internal objects. Not sure if you can pass MPI_COMM_SELF or any MPI defintion without inclusion of mpi.h unless a new definition is passed which might conflict with actual MPI definitions if somewhere in the pipeline when linking. That was the adios1.x issue when mixing MPI code with the non-MPI library, thus we decided to completely separate interfaces at compile time and keep mpidummy.h inside namespace adios2. It would be helpful if we can have a simple example as I don't have a clear understanding how this would work at the public interface level. Thanks!

@williamfgc
Copy link
Contributor

The way I have worked with MPI libraries that allow serial versions is to play with their MACRO XXX_USE_MPI or XXX_HAVE_MPI I am currently doing that with MFEM and VTK. I am personally open to more convenient ways.

@ax3l
Copy link
Contributor Author

ax3l commented May 23, 2019

Thanks, yes the namespace already helps a bit.
The workflow that is currently not working is the following: I would like to call serial ADIOS functionality from an ADIOS library that is build with MPI support without having to call MPI_Init first. Basically what you already work-around in bpls is what affects us as well. That work-around is fine for a final app but not so fine for me as a middleware library, initializing (and finalizing) MPI for serial I/O without the user expecting it.

For now, it's okayish to have to pull <mpi.h> since it was available at compile-time as well and CMake configs can cover that (or macros as you said, but this will currently change the public API and ABI for ADIOS:ADIOS making it binary incompatible). Of course, two separate headers could make this even more clear but that's just a nice-to-have. But avoiding MPI_Init for serial functionality would be great. ADIOS::ADIOS already does this well but we should make it consequently at all following calls to MPI, too.

@williamfgc
Copy link
Contributor

williamfgc commented May 23, 2019

to narrow it down:

#include <adios2.h>

int main()
{
// here MPI_Comm in ADIOS.h or ADIOS.cpp (if forward declared in the ADIOS class) 
// must come from somewhere at the linking stage....
// without MPI_Init (or mpi.h) any non-MPI definition of MPI_Comm 
// will conflict if MPI is linked
adios2::ADIOS adios;
}

I am not aware of a MPI/serial library that can by-pass the above conflict using a single library. Unless two separate libraries and appropriate headers are provided. This is the piece I don't know enough about to tackle at the public interface layer.

@ax3l
Copy link
Contributor Author

ax3l commented May 23, 2019

Yes, the absolute cleanest way would be a second header, say <adios2-serial.h> and a separate library artifact with no MPI-related functionality.

For practical cases, it's still okay if in the USE_MPI case the <mpi.h> header is still transitively pulled and the MPI lib still needs to be linked, e.g. since it's available anyway from a HPC system's module and will be covered by adios-config/ADIOS2Config.cmake without the user noticing much.

The "serial" functionality, e.g. the constructor, should be still available and can compile against the usual MPI library that ADIOS2 was build with. Yet internally, it switches at runtime to the adios::helper::mpi functionality if MPI_COMM_SELF is passed (from the internal constructor) and MPI_Finalize is found to be false as well.

Basically, just like bpsl is written but without the #ifdef ADIOS2_HAVE_MPI branches.

This will enable several nice cases:

  • serial tools can be build with the same module combination on a HPC system as a main application
  • mpirun is avoided where this is allowed and possible
  • Debian/Suse/... binary packages such as adios2-serial adios2-openmpi adios2-mpich are not fully incompatible as providers for serial tools, instead, a serial tool can depend on any of those
  • when linking a dependency that pulls serial functionality of ADIOS2 with a dependency that pulls mpi-enabled functionality of ADIOS2, those ADIOS2 libs and symbols are not incompatible and symbol hiding in wrapper libraries can be avoided for end users

@williamfgc
Copy link
Contributor

@ax3l I see the benefits, but the issue I am bringing up is conflicting definitions at the public layer.
Let me clarify, the ADIOS class has a member MPI_Comm which must be defined when linking (when adios2 was compiled with mpi, it came from mpi.h):

class  ADIOS
{
public:
MPI_Comm m_Comm;

}

without mpi.h users will need to provide a conflicting definition which work work. Can MPI_Comm be defined without a MPI_Init? That's the piece that I am missing.

@ax3l
Copy link
Contributor Author

ax3l commented May 23, 2019

Yes, I understand what you mean. I think that would be harder to do since MPI_Comm and its memory layout is not defined in the MPI standard and thus the ADIOS public ABI of the class cannot be fully defined. Maybe passing a not typesafe void* would be an option if one wants to avoid two ADIOS classes and two headers.
(There is an effort to at least make MPICH flavors ABI compatible.)

@germasch
Copy link
Contributor

To put in my 2 cents, while I agree that it'd be nice to have adios2 work with and without MPI simultaneously, I think it's technically rather cumbersome to do.

It's not a header-only library, so one can't just switch MPI implementations easily (e.g., sizeof(MPI_Comm) isn't even constant across different MPI implementations). So essentially, the compiled code has to be different -- and hence an ADIOS object created from the serial constructor will be different from an ADIOS object from a constructor that takes an MPI communicator. That's not really possible, unless you make them different classes by putting them into different namespaces.

One way I see that things could be done would be to compile a big library that essentially combines a serial and a parallel library into one big .so, using different namespaces (or different version trickery or something in the .so). And then one could have a header that depending on USE_MPI does something like using ADIOS = serial::ADIOS or using ADIOS = parallel::ADIOS. That seems quite hacky to me, and not easy to implement, and that library will still pull in the MPI library, anyway, if though it may not use it.

One solution I can think of is an "install both" mode, which would compile both libadios2.so and libadios2_serial.so, and provides `adios2/adios2.h" and "adios2_serial.h", and one can then include one or the other and correspondingly link with one or the other. But I don't really like this, either...

Thinking about this a bit more, though, I think HDF5 if compiled parallel let's you use it serially without trouble. So maybe there's a point in considering that. But it won't be straightforward to achieve this in adios2, I don't think. The idea would be to only use an MPI_Comm when it's actually required, not for the top level objects.

@ax3l
Copy link
Contributor Author

ax3l commented May 23, 2019

Actually the initial point of this issue is just about your last paragraph for exactly these reasons. Keep the interface as is, including MPI headers if they were present, but always expose the "serial" constructor as well and avoid MPI_ calls if MPI is found uninitialized, so we do not need to artificially initialize MPI. That way, we would still not get a compatible ABI but can at least use the same modules in a working environment instead of having a serial and a parallely installed ADIOS.

Of course, if we find a gentle way for a true split that would be great for compatibility as well but is imho not as urgent.

@williamfgc
Copy link
Contributor

williamfgc commented May 23, 2019

@germasch thanks, yes that's pretty much how I see it. Would be helpful to point at an example on how HDF5 does it. My understanding is that when adios2 is compiled with MPI, the latter becomes a public dependency (any MPI-based library for that matter). @ax3l at this point, a proof-of-concept that passes the linking stage would be helpful.

@germasch
Copy link
Contributor

I can implement it. A simple proof-of-concept is very simple:

diff --git a/bindings/CXX11/cxx11/ADIOS.cpp b/bindings/CXX11/cxx11/ADIOS.cpp
index 1d761440..0e93d76c 100644
--- a/bindings/CXX11/cxx11/ADIOS.cpp
+++ b/bindings/CXX11/cxx11/ADIOS.cpp
@@ -25,6 +25,8 @@ ADIOS::ADIOS(MPI_Comm comm, const bool debugMode) : ADIOS("", comm, debugMode)
 {
 }

+ADIOS::ADIOS(const bool debugMode) : ADIOS("", MPI_COMM_SELF, debugMode) {}
+
 #else
 ADIOS::ADIOS(const std::string &configFile, const bool debugMode)
 : m_ADIOS(std::make_shared<core::ADIOS>(configFile, debugMode, "C++"))
diff --git a/bindings/CXX11/cxx11/ADIOS.h b/bindings/CXX11/cxx11/ADIOS.h
index dff4a599..7f3a8372 100644
--- a/bindings/CXX11/cxx11/ADIOS.h
+++ b/bindings/CXX11/cxx11/ADIOS.h
@@ -79,6 +79,8 @@ public:
      */
     ADIOS(const std::string &configFile, const bool debugMode = true);

+#endif
+
     /**
      * Starting point for non-MPI apps. Creates an ADIOS object
      * @param debugMode true: extra user-input debugging information, false: run
@@ -87,8 +89,7 @@ public:
      * incorrect
      */
     ADIOS(const bool debugMode = true);
-#endif
-
+
     /** object inspection true: valid object, false: invalid object */
     explicit operator bool() const noexcept;

This will enable an application to use the serial constructor even when ADIOS2 is built with MPI. ADIOS2 will, of course, still pull in MPI, which is not much of a problem when it's provided as a shared library, because that'll happen automatically.

However, it won't really work, because once the app is actually using such an ADIOS object, MPI functions will be called, and they'll complain because MPI has not been initialized.

So to make this actually work, I see two options:

  • have ADIOS2 itself call MPI_Init in such a case (I think that's just too hacky, though).
  • in ADIOS2, wrap a bunch of MPI functions always, ie, MPI_Comm_dup, MPI_Comm_rank/size, MPI_Bcast, maybe more, so that they'll not call down into MPI when it's not initialized.

(The third option would be a more thorough reengineering of ADIOS2 to move down MPI calls down to only the layers where they're actually needed, which I think is too big of a change to be realistic).

@williamfgc
Copy link
Contributor

@germasch thanks! This basically confirms that it's not feasible that a single library (compiled with MPI) can support such dual behavior. MPI-based libraries wouldn't even link if MPI itself is not linked, unless we introduce conflicting MPI symbols (a hack again) which will lead to @ax3l issue list with mpidummy.h in adios1.

BTW, ADIOS2 is already doing the third option. MPI calls are always done privately. The issue is narrowed down to linking the public API handshake with MPI (passing a comm) that any MPI-based library requires. The cleanest solution would be to keep the serial and parallel implementations completely separately, which most projects I am aware of keep track at compile time via a MACRO, as it's not feasible to do at run time once a library is compiled with MPI. To make things worse MPI distributions are further split between openmpi and mpich based distributions that are incompatible.

@ax3l while I share the same wishlist, I believe this is not feasible with a single library. If you know a better (working) way please let us know.

@germasch
Copy link
Contributor

I don't think you understood me right. What the patch shows is the providing that API is easily doable. But that simple change doesn't actually make it usable without calling MPI_Init. What I said, though, is that it's possible to make that work quite easily, but it will require some (relatively minimal) wrapping of MPI inside of adios2.

@williamfgc
Copy link
Contributor

williamfgc commented May 24, 2019

Apologies if I wasn't clear enough, by feasible I mean working: compiled/linked/running/tested, not only the code interface. My understanding from seeing the code, is that it won't link with serial code that doesn't provide a hack to MPI_Comm symbols. Nevertheless, work at runtime.

@ax3l
Copy link
Contributor Author

ax3l commented May 24, 2019

To explain my primary idea with a small testable example: My goal would be to get serial tools like bpls and bpdump running in the sense that it does not require the MPI_Init/MPI_Finalize anymore if ADIOS2 is build with MPI.

Here is a quick hack that starts this, but we should do this more systematic in all places:

diff --git a/source/adios2/core/IO.cpp b/source/adios2/core/IO.cpp
index 238c8c93..e6a939a4 100644
--- a/source/adios2/core/IO.cpp
+++ b/source/adios2/core/IO.cpp
@@ -438,7 +438,18 @@ Engine &IO::Open(const std::string &name, const Mode mode,
     }
 
     MPI_Comm mpiComm;
-    MPI_Comm_dup(mpiComm_orig, &mpiComm);
+    int flag;
+    MPI_Initialized(&flag);
+    if (flag)
+    {
+        MPI_Comm_dup(mpiComm_orig, &mpiComm);
+        // m_NeedMPICommFree = true;
+    }
+    else
+    {
+        mpiComm = mpiComm_orig;
+        // m_NeedMPICommFree = false;
+    }
     std::shared_ptr<Engine> engine;
     const bool isDefaultEngine = m_EngineType.empty() ? true : false;
     std::string engineTypeLC = m_EngineType;
diff --git a/source/adios2/toolkit/format/bp3/BP3Base.cpp b/source/adios2/toolkit/format/bp3/BP3Base.cpp
index 3f484cb4..cf419d48 100644
--- a/source/adios2/toolkit/format/bp3/BP3Base.cpp
+++ b/source/adios2/toolkit/format/bp3/BP3Base.cpp
@@ -47,8 +47,14 @@ const std::map<int, std::string> BP3Base::m_TransformTypesToNames = {
 BP3Base::BP3Base(MPI_Comm mpiComm, const bool debugMode)
 : m_MPIComm(mpiComm), m_DebugMode(debugMode)
 {
-    MPI_Comm_rank(m_MPIComm, &m_RankMPI);
-    MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+    int flag;
+    MPI_Initialized(&flag);
+    m_RankMPI = 0;
+    m_SizeMPI = 1;
+    if (flag) {
+        MPI_Comm_rank(m_MPIComm, &m_RankMPI);
+        MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+    }
     m_Profiler.IsActive = true; // default
 }
 
diff --git a/source/adios2/toolkit/transport/Transport.cpp b/source/adios2/toolkit/transport/Transport.cpp
index 83e8e8ca..ce8b5f4b 100644
--- a/source/adios2/toolkit/transport/Transport.cpp
+++ b/source/adios2/toolkit/transport/Transport.cpp
@@ -20,8 +20,14 @@ Transport::Transport(const std::string type, const std::string library,
                      MPI_Comm mpiComm, const bool debugMode)
 : m_Type(type), m_Library(library), m_MPIComm(mpiComm), m_DebugMode(debugMode)
 {
-    MPI_Comm_rank(m_MPIComm, &m_RankMPI);
-    MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+    int flag;
+    MPI_Initialized(&flag);
+    m_RankMPI = 0;
+    m_SizeMPI = 1;
+    if (flag) {
+        MPI_Comm_rank(m_MPIComm, &m_RankMPI);
+        MPI_Comm_size(m_MPIComm, &m_SizeMPI);
+    }
 }
 
 void Transport::IWrite(const char *buffer, size_t size, Status &status,
diff --git a/source/utils/bpls/bpls.cpp b/source/utils/bpls/bpls.cpp
index c2054cdc..1914fcd0 100644
--- a/source/utils/bpls/bpls.cpp
+++ b/source/utils/bpls/bpls.cpp
@@ -2934,7 +2934,7 @@ char *mystrndup(const char *s, size_t n)
 int main(int argc, char *argv[])
 {
 #ifdef ADIOS2_HAVE_MPI
-    MPI_Init(&argc, &argv);
+//    MPI_Init(&argc, &argv);
 #endif
     int retval = 1;
     try
@@ -2947,7 +2947,7 @@ int main(int argc, char *argv[])
         std::cout << e.what() << std::endl;
     }
 #ifdef ADIOS2_HAVE_MPI
-    MPI_Finalize();
+//    MPI_Finalize();
 #endif
     return retval;
 }

We can avoid the recurring MPI_Initialized(&flag); check with a small wrapper around the MPI libs that are already identified in mpidummy.h

@williamfgc
Copy link
Contributor

williamfgc commented May 24, 2019

@ax3l I understand how this is done privately...my concern is very simple: would it link with serial code? the MPI_Comm symbol gets resolved from MPI when adios2 is compiled against it, but the same MPI_Comm symbols will be missing when linking adios2 to serial code. Seems I am not getting the message somehow and showing code only won't guarantee this is possible at all. If you're correct and bpls can in fact run (without the adios1 hack), I am all ears.

@ax3l
Copy link
Contributor Author

ax3l commented May 24, 2019

I understand that and it's still needed to link against MPI although it will then not be used in communication APIs (besides checking once for MPI_Initialized). My primary concern is really just to avoid MPI_Init, it's fine if a user still has to link an unused MPI, because that will allow to use the same (parallel) adios module/package for both serial python hackery, middleware and parallel tools. The additionally (unused but needed) MPI linkage is passed by adios-config/ADIOS2Config.cmake anyway, so this does not change the current state.

@germasch
Copy link
Contributor

FWIW, MPI_Comm is not a symbol, it'd never be unresolved now matter what you link to or not. But yes, there would be actual unresolved symbols, likeMPI_Comm_dup if not linking with MPI, even if those functions would never actually end up being called at run time. But that's an issue for building an executable, which is already resolved via the current mechanisms (e.g., cmake / adios2-config). @ax3l's patch is exactly what I mean, except, as he says, this should be done via wrappers rather than throughout the code.

@pnorbert
Copy link
Contributor

pnorbert commented May 24, 2019 via email

@ax3l
Copy link
Contributor Author

ax3l commented May 24, 2019

Sounds good to me, great!

I do not fully understand the last paragraph, since we can still keep a serial version without a communicator in the public interface, imho. But probably I am just missing something that I haven't used (constructing without a comm and opening with a comm?).

@williamfgc
Copy link
Contributor

williamfgc commented May 24, 2019

I think we have to narrow down some concepts we are mixing after talking to @pnorbert :

  1. Serial: @ax3l by serial you mean code linked against adios2 and mpi, but running without MPI_Init() , I think we agree that truly serial code (gcc/g++ without -lmpi) would never work with a library compiled against MPI publicly.

  2. Cray login nodes: would 1) still run on login nodes since the don't accept MPI binaries? (this is a question, I told @pnorbert I'd test on Titan)

  3. @germasch MPI_Comm would be replaced by the corresponding opaque pointer implementation alias (ompi_communicator_t*). My point is MPI needs to be publicly linked against the binary due to its public nature in adios2 to follow 1).

@ax3l
Copy link
Contributor Author

ax3l commented May 24, 2019

  1. yes and yes :)

  2. Yes, Cray is exactly one of the systems where 1) will help with, since MPI_Init is the call that fails, as far as I remember: https://github.com/PDC-support/mpi-lab-exercises So an "mpi-linked" binary itself is no trouble.

Demonstrator on Titan seems to confirm my mumbling:

#include <mpi.h>

int main(int argc, char* argv[])
{
    int flag;
    MPI_Initialized(&flag); // ok to be called
    if(argc > 1)
    {
        MPI_Init(&argc, &argv); // not okay :)
        MPI_Finalize();
    }
    return 0;
}
> pgc++ main.cpp -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpich
> ./a.out
> ./a.out 1
[Fri May 24 11:21:49 2019] [unknown] Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(537): 
MPID_Init(249).......: channel initialization failed
MPID_Init(638).......:  PMI2 init failed: 1 
Aborted

Yay! :)

@williamfgc
Copy link
Contributor

@ax3l great! Thanks for providing this, that addresses one my concerns! Yes, then what @pnorbert proposed should be feasible for bpls and Python code on Titan. We should find a better way to name the mpi-linked without MPI_Init() special executable/script. Thanks!

@ax3l ax3l changed the title Expose Serial Constructor even with MPI Avoid MPI_Init for Serial I/O when built with MPI May 29, 2019
@germasch
Copy link
Contributor

germasch commented Jun 3, 2019

See PR #1466 for a proposed solution.

@chuckatkins
Copy link
Contributor

@williamfgc @pnorbert can we close this now as fixed?

@germasch
Copy link
Contributor

@chuckatkins I'd say it's mostly fixed with some caveats, so I'd like to see #1480 merged (or at least discussed), too.

@germasch
Copy link
Contributor

germasch commented Jul 2, 2019

As #1480 has been merged, as far as i'm concerned, this issue can be closed, @chuckatkins .

@ax3l
Copy link
Contributor Author

ax3l commented Jul 3, 2019

Looks good to me as well, really happy this works! Thanks a lot as well to @germasch for contributing much of the implementation! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants