Hi ADIOS team,

over the last few weeks I came across an MPI design issue in ADIOS1 that is quite troublesome downstream when using the serial (`_nompi`) API.
The problem is the split into two libraries that define the same public API and the same symbols (and ABI) but with different functionality. The two libraries are incompatible with each other and cannot be linked at the same time without getting into trouble.
Currently, for the nompi interface, ADIOS just "mocks" MPI via `mpidummy.h`, mapping MPI calls to regular file system, memory access and copy operations. This is fine if one knows from the beginning that the final app will use either MPI or serial I/O only. But that is not a compile-time decision; it is a user decision at runtime. As a result, a shipped "parallel" ADIOS library can still be used through the "serial" API (by passing `MPI_COMM_NULL`), but the final app must then be started in an `MPI_Init` context. Urgh.
Why not just link the _nompi lib?
The `_nompi` library and the regular "parallel" library cannot be linked at the same time (short of manual handling with `dlopen`), since they (unnecessarily) define the same public API and ABI with different implementations. This forces every downstream project to keep the "separation of libraries", so the choice always remains a compile-time decision, which is very inconvenient for package managers, high-level libraries, dependencies, etc.
There is no reason not to link an "MPI-powered" library (with an additional serial API) in a serial context as well. The library should simply never call `MPI_` functionality unless it is passed a valid communicator of some kind.
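To illustrate the kind of guard meant here, a minimal sketch follows. It is not ADIOS code: `demo_comm_t`, `DEMO_COMM_NULL`, `demo_parallel_bcast`, `demo_bcast` and `demo_rank` are hypothetical stand-ins so the example compiles without an MPI installation. In a real build they would correspond to `MPI_Comm`, `MPI_COMM_NULL` and calls such as `MPI_Bcast` / `MPI_Comm_rank`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (assumptions, not ADIOS or MPI names) so the
 * sketch builds without MPI. Real code would use MPI_Comm / MPI_COMM_NULL. */
typedef int demo_comm_t;
#define DEMO_COMM_NULL ((demo_comm_t)-1)

/* Counts how often the "parallel" path is taken; only for demonstration. */
static int demo_mpi_calls = 0;

/* Stand-in for a real collective such as MPI_Bcast. It must only ever be
 * reached with a valid communicator. */
static int demo_parallel_bcast(void *buf, size_t nbytes, demo_comm_t comm)
{
    (void)buf; (void)nbytes; (void)comm;
    ++demo_mpi_calls;
    return 0;
}

/* Broadcast helper with an explicit serial branch: with a null communicator
 * there is exactly one participant and buf already holds the data. */
int demo_bcast(void *buf, size_t nbytes, demo_comm_t comm)
{
    assert(buf != NULL || nbytes == 0);
    if (comm == DEMO_COMM_NULL)
        return 0; /* serial: no MPI function is called */
    return demo_parallel_bcast(buf, nbytes, comm);
}

/* Rank query following the same pattern: a serial caller is always rank 0. */
int demo_rank(demo_comm_t comm, int *rank)
{
    assert(rank != NULL);
    if (comm == DEMO_COMM_NULL) {
        *rank = 0;
        return 0;
    }
    ++demo_mpi_calls; /* a real build would call MPI_Comm_rank here */
    *rank = 0;        /* placeholder value for the sketch */
    return 0;
}
```

With this shape the serial branch is an ordinary runtime check inside one library, so a serial app can link the parallel build and never reach an `MPI_` call.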
What ADIOS is basically doing is shipping an MPI mock library. Mocking functionality is great for development, debugging and testing but causes the symbol issues explained above in production.
Proposed Refactoring
Could you maybe either:
- provide a truly serial API that is guaranteed to never call any `MPI_` functions (and does not receive a communicator at all), or
- add distinct symbol prefixes/suffixes to the `_nompi` libraries so that they can be linked at the same time as the parallel library (we can then encapsulate your mock via distinct intermediate link steps), or
- add proper `MPI_COMM_NULL` handling to all functionality in at least `src/read/read_bp.c` and `src/core/bp_utils.c`: here you can simply reuse the functionality already implemented in `mpidummy.h`, but call it explicitly under `if (comm == MPI_COMM_NULL)` and not via MPI-function mocking.
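As a rough sketch of the symbol-prefixing option: the same implementation template can be stamped out under a variant-specific prefix, so both variants can live in one link without clashing. All names below (`DEMO_CAT`, `DEMO_DEFINE_NPROCS`, `demo_mpi_`, `demo_nompi_`, `demo_nprocs`) are hypothetical and not part of ADIOS, and the return value `4` is just a placeholder for "whatever the parallel build would report".

```c
#include <assert.h>

/* Token-pasting helpers: DEMO_CAT(demo_nompi_, nprocs) -> demo_nompi_nprocs.
 * The extra level lets the prefix itself be a macro. */
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

/* One implementation template, emitted under a variant-specific prefix.
 * A real library would instead compile its sources twice with a different
 * prefix macro defined on the command line. */
#define DEMO_DEFINE_NPROCS(prefix, value) \
    int DEMO_CAT(prefix, nprocs)(void) { return (value); }

DEMO_DEFINE_NPROCS(demo_mpi_, 4)   /* "parallel" variant, placeholder value */
DEMO_DEFINE_NPROCS(demo_nompi_, 1) /* "serial" variant: always one process */

/* Because the symbols are distinct, a downstream wrapper can pick the
 * variant at runtime instead of at compile time. */
int demo_nprocs(int use_mpi)
{
    return use_mpi ? demo_mpi_nprocs() : demo_nompi_nprocs();
}
```

The runtime dispatcher at the end is the part downstream projects would own: it turns the MPI-or-serial choice back into a user decision.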
How to reproduce the issue
1. compile a parallel ADIOS library
2. write a serial example
3. link the parallel ADIOS lib (yes, see the issue with `_nompi` above; both libs are incompatible)
4. execute the serial example (that does not call `MPI_Init`)
cc @pnorbert @jychoi-hpc @isosc