Providing a clean, compatible NOMPI API #183

Open
ax3l opened this issue Jun 14, 2018 · 0 comments

Hi ADIOS team,

Over the last few weeks I came across an MPI design issue in ADIOS1 that is quite troublesome downstream when using the serial (_nompi) API.

The problem is the separation into two libraries that define the same public API (and ABI) with the same symbols but different functionality. The two libraries are incompatible with each other and cannot be linked at the same time without getting into trouble.

Currently, in the nompi interface, ADIOS just "mocks" MPI via mpidummy.h, mapping it to regular file system access, memory access and copy operations. This is nice if one knows from the beginning that the final app will be either MPI-only or serial-only. But that is not a compile-time decision; it is a user decision at runtime. The consequence is that a shipped "parallel" ADIOS library can still use the "serial" API (by passing MPI_COMM_NULL), but the final app must be started in an MPI_Init context. Urgh.

Why not just link the _nompi lib?

The _nompi library and the regular "parallel" library cannot be linked at the same time (outside of manual handling with dlopen), since they (unnecessarily) define the same public API and ABI but with different implementations. This forces every downstream project to keep the "separation of libraries", so the choice always stays a compile-time decision, which is super inconvenient for package managers, high-level libraries, dependencies, etc.

There is no reason not to link an "MPI-powered" library (with an additional serial API) in a serial context as well. The library should simply never call "MPI_" functionality unless it is passed a valid communicator of some kind.
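To illustrate what "never call MPI_ unless a valid communicator is passed" could look like, here is a minimal sketch. It is not ADIOS code: sketch_comm, SKETCH_COMM_NULL and all sketch_* functions are hypothetical stand-ins (for MPI_Comm, MPI_COMM_NULL and collectives such as MPI_Bcast or MPI_Gather) so that the example compiles without an MPI installation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins so this sketch builds without MPI.
   In a real library these would be MPI_Comm and MPI_COMM_NULL from <mpi.h>. */
typedef int sketch_comm;
#define SKETCH_COMM_NULL 0

/* Counts how often the "parallel" path is taken (for demonstration only). */
int sketch_parallel_calls = 0;

/* Placeholder for a real collective such as MPI_Bcast. */
static int sketch_parallel_bcast(void *buf, size_t nbytes, sketch_comm comm)
{
    (void)buf; (void)nbytes; (void)comm;
    ++sketch_parallel_calls;
    return 0;
}

/* Broadcast helper: with a null communicator there is only one process,
   so the data is already in place and no MPI-like call is made. */
int sketch_bcast(void *buf, size_t nbytes, sketch_comm comm)
{
    assert(buf != NULL);
    if (comm == SKETCH_COMM_NULL) {
        return 0;                       /* explicit serial branch */
    }
    return sketch_parallel_bcast(buf, nbytes, comm);
}

/* Gather helper: the serial branch is a plain memory copy. */
int sketch_gather(const void *send, void *recv, size_t nbytes, sketch_comm comm)
{
    assert(send != NULL && recv != NULL);
    if (comm == SKETCH_COMM_NULL) {
        memcpy(recv, send, nbytes);     /* explicit serial branch */
        return 0;
    }
    ++sketch_parallel_calls;            /* stand-in for a call like MPI_Gather */
    return 0;
}
```

With this shape, a serial caller that passes the null communicator never reaches the parallel path, so nothing depends on MPI_Init having been called.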

What ADIOS is basically doing is shipping an MPI mock library. Mocking functionality is great for development, debugging and testing but causes the symbol issues explained above in production.

Proposed Refactoring

Could you maybe either:

  • provide a truly serial API that is guaranteed to never call any MPI_ functions (and does not receive a communicator at all) or
  • add distinct symbol prefixing/suffixing for the _nompi libraries so that they can be linked at the same time as the parallel library (we can then encapsulate your mock via distinct intermediate link steps) or
  • add proper MPI_COMM_NULL handling to all functionality in at least src/read/read_bp.c and src/core/bp_utils.c: here you can simply reuse the functionality already implemented in mpidummy.h, but call it explicitly in an if (comm == MPI_COMM_NULL) branch rather than via MPI function mocking.
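For the second option (distinct symbol prefixes), a rough sketch of the idea follows. All names here are hypothetical (SKETCH_PREFIX, sketch_nompi_*, sketch_mpi_*) and are not existing ADIOS macros or symbols; the two "libraries" are squeezed into one translation unit purely to show that prefixed variants can coexist and be selected at runtime.

```c
#include <assert.h>

/* Two-level concatenation so that SKETCH_PREFIX is expanded before pasting. */
#define SKETCH_CONCAT_(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_(a, b)

/* "Serial library": the same source-level name, built with a nompi prefix. */
#define SKETCH_PREFIX sketch_nompi_
const char *SKETCH_CONCAT(SKETCH_PREFIX, backend)(void) { return "serial"; }
#undef SKETCH_PREFIX

/* "Parallel library": the same source-level name, built with an mpi prefix. */
#define SKETCH_PREFIX sketch_mpi_
const char *SKETCH_CONCAT(SKETCH_PREFIX, backend)(void) { return "parallel"; }
#undef SKETCH_PREFIX

/* Downstream wrapper: because the symbols differ, both variants can be
   linked together and the choice becomes a runtime decision. */
const char *sketch_backend(int use_mpi)
{
    assert(use_mpi == 0 || use_mpi == 1);
    return use_mpi ? sketch_mpi_backend() : sketch_nompi_backend();
}
```

In a real build the prefix would presumably be set once per library (for example via a compile definition), and downstream code would hide the dispatch behind its own intermediate layer.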

How to reproduce the issue

  • compile a parallel ADIOS library
  • write a serial example
  • link the parallel ADIOS lib (yes, see the issue with _nompi above; both libs are incompatible)
  • execute the serial example (that does not call MPI_Init)
  • crash in calls to MPI_ functionality

cc @pnorbert @jychoi-hpc @isosc
