Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADIOS1: SerialIOTest needs MPI with Parallel Lib #252

Closed
ax3l opened this issue Jun 12, 2018 · 8 comments
Closed

ADIOS1: SerialIOTest needs MPI with Parallel Lib #252

ax3l opened this issue Jun 12, 2018 · 8 comments
Assignees
Labels
backend: ADIOS1 bug help wanted third party third party libraries that are shipped and/or linked

Comments

@ax3l
Copy link
Member

ax3l commented Jun 12, 2018

Even with #228 there are way more calls to MPI_ functions when linking a parallel ADIOS1 lib. That leads to the need to actually start serial programs, such as SerialIOTest, with mpirun/exec.

That's sad. Is the MPI-moc library the only way to call the serial ADIOS API? If so, a lot more refactoring inside ADIOS src/core/bp_utils.c (incomplete below) and a little more inside src/read/read_bp.c would be needed:

diff --git a/src/core/bp_utils.c b/src/core/bp_utils.c
index 1ac8990a..94ef1532 100644
--- a/src/core/bp_utils.c
+++ b/src/core/bp_utils.c
@@ -70,6 +70,78 @@ inline BP_FILE * GET_BP_FILE (const ADIOS_FILE * fp)
 void * bp_read_data_from_buffer(struct adios_bp_buffer_struct_v1 *b, enum ADIOS_DATATYPES type, int nelems);
 int bp_parse_characteristics (struct adios_bp_buffer_struct_v1 * b, struct adios_index_var_struct_v1 ** root, uint64_t j);
 
+/* dummies: for building serial apps against a parallel ADIOS lib */
+#if defined(__APPLE__) || defined(__WIN32__) || defined(__CYGWIN__) 
+#    define lseek64 lseek
+#    define open64  open
+#endif
+
+static char mpierrmsg_dummy[MPI_MAX_ERROR_STRING];
+
+static int typesize_dummy(MPI_Datatype type)
+{
+    switch( type )
+    {
+      case MPI_INT : return sizeof( int );
+                     break;
+      case MPI_CHAR: return sizeof( char );
+                     break;
+      case MPI_REAL : return sizeof( float );
+                     break;
+      case MPI_DOUBLE : return sizeof( double );
+                     break;
+      case MPI_UNSIGNED_LONG_LONG : return sizeof( unsigned long long );
+                     break;
+      case MPI_UINT64_T : return sizeof( uint64_t );
+                     break;
+      default      : return 1 ;
+    }
+    return 1;
+}
+
+int MPI_File_open_dummy(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh) 
+{
+    *fh = open64 (filename, amode);
+    if (*fh == -1) {
+        snprintf(mpierrmsg_dummy, MPI_MAX_ERROR_STRING, "File not found: %s", filename);
+        return -1;
+    }
+    return MPI_SUCCESS;
+}
+
+int MPI_File_close_dummy(MPI_File *fh) { return close(*fh); }
+
+int MPI_File_get_size_dummy(MPI_File fh, MPI_Offset *size) {
+    uint64_t curpos = lseek64(fh, 0, SEEK_CUR); // get the current seek pos
+    uint64_t endpos = lseek64(fh, 0, SEEK_END); // go to end, returned is the size in bytes
+    lseek64(fh, curpos, SEEK_SET);             // go back where we were
+    *size = (MPI_Offset) endpos;
+    //printf("MPI_File_get_size: fh=%d, size=%lld\n", fh, *size);
+    return MPI_SUCCESS;
+}
+
+int MPI_File_read_dummy(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
+{
+    // FIXME: int count can read only 2GB (*datatype size) array at max
+    uint64_t bytes_to_read = count * typesize_dummy(datatype);
+    uint64_t bytes_read;
+    bytes_read = read (fh, buf, bytes_to_read);
+    if (bytes_read != bytes_to_read) {
+        snprintf(mpierrmsg_dummy, MPI_MAX_ERROR_STRING, "could not read %" PRIu64 " bytes. read only: %" PRIu64 "\n", bytes_to_read, bytes_read);
+        return -2;
+    }
+    *status = bytes_read;
+    //printf("MPI_File_read: fh=%d, count=%d, typesize=%d, bytes read=%lld\n", fh, count, datatype, *status);
+    return MPI_SUCCESS;
+}
+
+int MPI_File_seek_dummy(MPI_File fh, MPI_Offset offset, int whence)
+{
+    uint64_t off = (uint64_t) offset;
+    lseek64 (fh, off, whence);
+    //printf("MPI_File_seek: fh=%d, offset=%lld, whence=%d\n", fh, off, whence);
+    return MPI_SUCCESS;
+}
 
 
 void bp_alloc_aligned (struct adios_bp_buffer_struct_v1 * b, uint64_t size)
@@ -233,18 +305,23 @@ static int bp_read_open (const char * filename,
           MPI_Comm comm,
           BP_FILE * fh)
 {
-    int  err;
-    int  rank;
+    int  err = MPI_SUCCESS;
+    int  rank = 0;
 
-    MPI_Comm_rank (comm, &rank);
+    if (comm != MPI_COMM_NULL)
+        MPI_Comm_rank (comm, &rank);
 
     // variable definition
     MPI_Offset  file_size;
 
     // open a file by the multiple processors within the same
     // communicator
-    err = MPI_File_open (comm, (char *) filename, MPI_MODE_RDONLY,
-            (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
+    if (comm == MPI_COMM_NULL)
+        err = MPI_File_open_dummy (comm, (char *) filename, MPI_MODE_RDONLY,
+                (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
+    else
+        err = MPI_File_open (comm, (char *) filename, MPI_MODE_RDONLY,
+                (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
     if (err != MPI_SUCCESS) {
         char e [MPI_MAX_ERROR_STRING];
         int len = 0;
@@ -254,7 +331,10 @@ static int bp_read_open (const char * filename,
         return adios_flag_no;
     }
 
-    MPI_File_get_size (fh->mpi_fh, &file_size);
+    if (comm == MPI_COMM_NULL)
+        MPI_File_get_size_dummy (fh->mpi_fh, &file_size);
+    else
+        MPI_File_get_size (fh->mpi_fh, &file_size);
     fh->b->file_size = file_size;
     fh->mfooter.file_size = file_size;
 
@@ -265,26 +345,43 @@ static int bp_read_open_rootonly (const char * filename,
           MPI_Comm comm,
           BP_FILE * fh)
 {
-    int  err;
-    int  rank;
+    int  err = MPI_SUCCESS;
+    int  rank = 0;
 
-    MPI_Comm_rank (comm, &rank);
+    if (comm != MPI_COMM_NULL)
+        MPI_Comm_rank (comm, &rank);
 
     // variable definition
     MPI_Offset  file_size = 0;
 
     if (rank == 0)
     {
-        // open a file by one processor then broadcast the size and the success
-        err = MPI_File_open (MPI_COMM_SELF, (char *) filename, MPI_MODE_RDONLY,
-                (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
-        if (err == MPI_SUCCESS) {
-            MPI_File_get_size (fh->mpi_fh, &file_size);
-            err = 0;
+        if (comm == MPI_COMM_NULL)
+        {
+            // open a file by one processor then broadcast the size and the success
+            err = MPI_File_open_dummy (MPI_COMM_SELF, (char *) filename, MPI_MODE_RDONLY,
+                    (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
+            if (err == MPI_SUCCESS) {
+                MPI_File_get_size_dummy (fh->mpi_fh, &file_size);
+                err = 0;
+            }
+        }
+        else
+        {
+            // open a file by one processor then broadcast the size and the success
+            err = MPI_File_open (MPI_COMM_SELF, (char *) filename, MPI_MODE_RDONLY,
+                    (MPI_Info) MPI_INFO_NULL, &(fh->mpi_fh));
+            if (err == MPI_SUCCESS) {
+                MPI_File_get_size (fh->mpi_fh, &file_size);
+                err = 0;
+            }
         }
     }
-    MPI_Bcast (&err, 1, MPI_INT, 0, comm);
-    MPI_Bcast (&file_size, 1, MPI_UNSIGNED_LONG_LONG, 0, comm);
+    if (comm != MPI_COMM_NULL)
+    {
+        MPI_Bcast (&err, 1, MPI_INT, 0, comm);
+        MPI_Bcast (&file_size, 1, MPI_UNSIGNED_LONG_LONG, 0, comm);
+    }
     fh->b->file_size = file_size;
     fh->mfooter.file_size = file_size;
     if (err)
@@ -305,9 +402,10 @@ int bp_open (const char * fname,
              MPI_Comm comm,
              BP_FILE * fh)
 {
-    int rank;
+    int rank = 0;
 
-    MPI_Comm_rank (comm, &rank);
+    if (comm != MPI_COMM_NULL)
+        MPI_Comm_rank (comm, &rank);
 
     adios_buffer_struct_init (fh->b);
 
@@ -326,7 +424,8 @@ int bp_open (const char * fname,
     }
 
     /* Broadcast to all other processors */
-    MPI_Bcast (&fh->mfooter, sizeof (struct bp_minifooter), MPI_BYTE, 0, comm);
+    if (comm != MPI_COMM_NULL)
+        MPI_Bcast (&fh->mfooter, sizeof (struct bp_minifooter), MPI_BYTE, 0, comm);
 
     if (fh->mfooter.pgs_index_offset > 0)
     {
@@ -336,7 +435,12 @@ int bp_open (const char * fname,
          * secure for future if a metadata+data BP file will ever have subfiles.
          */
         if (rank == 0)
-            MPI_File_close(&fh->mpi_fh);
+        {
+            if (comm == MPI_COMM_NULL)
+                MPI_File_close_dummy(&fh->mpi_fh);
+            else
+                MPI_File_close(&fh->mpi_fh);
+        }
         if (bp_read_open (fname, comm, fh))
         {
             return -1;
@@ -357,7 +461,8 @@ int bp_open (const char * fname,
         }
     }
 
-    MPI_Barrier (comm);
+    if (comm != MPI_COMM_NULL)
+        MPI_Barrier (comm);
     // Broadcast the index which may be bigger than 2GB, so do it in chunks
     uint64_t bytes_sent = 0;
     int32_t to_send = 0;
@@ -373,8 +478,8 @@ int bp_open (const char * fname,
         {
             to_send = footer_size - bytes_sent;
         }
-
-        MPI_Bcast (fh->b->buff + bytes_sent, to_send, MPI_BYTE, 0, comm);
+        if (comm != MPI_COMM_NULL)
+            MPI_Bcast (fh->b->buff + bytes_sent, to_send, MPI_BYTE, 0, comm);
         bytes_sent += to_send;
     }
 
@@ -604,8 +709,13 @@ int bp_close (BP_FILE * fh)
     MPI_File mpi_fh = fh->mpi_fh;
 
     adios_errno = 0;
-    if (fh->mpi_fh != MPI_FILE_NULL)
-        MPI_File_close (&mpi_fh);
+    if (mpi_fh != MPI_FILE_NULL)
+    {
+        if (fh->comm == MPI_COMM_NULL)
+            MPI_File_close_dummy (&mpi_fh);
+        else
+            MPI_File_close (&mpi_fh);
+    }
 
     close_all_BP_subfiles (fh);
 
@@ -820,8 +930,16 @@ int bp_read_minifooter (BP_FILE * bp_struct)
         memset (b->buff, 0, MINIFOOTER_SIZE);
         b->offset = 0;
     }
-    MPI_File_seek (bp_struct->mpi_fh, (MPI_Offset) attrs_end, MPI_SEEK_SET);
-    MPI_File_read (bp_struct->mpi_fh, b->buff, MINIFOOTER_SIZE, MPI_BYTE, &status);
+    if (bp_struct->comm == MPI_FILE_NULL)
+    {
+        MPI_File_seek_dummy (bp_struct->mpi_fh, (MPI_Offset) attrs_end, MPI_SEEK_SET);
+        MPI_File_read_dummy (bp_struct->mpi_fh, b->buff, MINIFOOTER_SIZE, MPI_BYTE, &status);
+    }
+    else
+    {
+        MPI_File_seek (bp_struct->mpi_fh, (MPI_Offset) attrs_end, MPI_SEEK_SET);
+        MPI_File_read (bp_struct->mpi_fh, b->buff, MINIFOOTER_SIZE, MPI_BYTE, &status);
+    }
 
     /*memset (&mh->pgs_index_offset, 0, MINIFOOTER_SIZE);
     memcpy (&mh->pgs_index_offset, b->buff, MINIFOOTER_SIZE);*/
@@ -900,9 +1018,14 @@ int bp_read_minifooter (BP_FILE * bp_struct)
     /* It will be sent to all processes */
     uint64_t footer_size = mh->file_size - mh->pgs_index_offset;
     bp_realloc_aligned (b, footer_size);
-    MPI_File_seek (bp_struct->mpi_fh,
-            (MPI_Offset)  mh->pgs_index_offset,
-            MPI_SEEK_SET);
+    if (bp_struct->comm == MPI_FILE_NULL)
+        MPI_File_seek_dummy (bp_struct->mpi_fh,
+                (MPI_Offset)  mh->pgs_index_offset,
+                MPI_SEEK_SET);
+    else
+        MPI_File_seek (bp_struct->mpi_fh,
+                (MPI_Offset)  mh->pgs_index_offset,
+                MPI_SEEK_SET);
 
     // if we need to read > 2 GB, need to do it in parts
     // since count is limited to MAX_MPIWRITE_SIZE (signed 32-bit max).
@@ -921,7 +1044,10 @@ int bp_read_minifooter (BP_FILE * bp_struct)
             to_read = footer_size - bytes_read;
         }
 
-        err = MPI_File_read (bp_struct->mpi_fh, b->buff+bytes_read, to_read, MPI_BYTE, &status); 
+        if (bp_struct->comm == MPI_FILE_NULL)
+            err = MPI_File_read_dummy (bp_struct->mpi_fh, b->buff+bytes_read, to_read, MPI_BYTE, &status); 
+        else
+            err = MPI_File_read (bp_struct->mpi_fh, b->buff+bytes_read, to_read, MPI_BYTE, &status); 
         if (err != MPI_SUCCESS) 
         {
             char e [MPI_MAX_ERROR_STRING];
diff --git a/src/read/read_bp.c b/src/read/read_bp.c
index e3c9a42f..2b4a59f5 100644
--- a/src/read/read_bp.c
+++ b/src/read/read_bp.c
@@ -1721,14 +1721,15 @@ ADIOS_FILE * adios_read_bp_open (const char * fname, MPI_Comm comm, enum ADIOS_L
 
 ADIOS_FILE * adios_read_bp_open_file (const char * fname, MPI_Comm comm)
 {
-    int rank;
+    int rank = 0;
     BP_PROC * p;
     BP_FILE * fh;
     ADIOS_FILE * fp;
 
     log_debug ("adios_read_bp_open_file\n");
 
-    MPI_Comm_rank (comm, &rank);
+    if (comm != MPI_COMM_NULL)
+        MPI_Comm_rank (comm, &rank);
 
     fh = BP_FILE_alloc (fname, comm);
@ax3l ax3l added bug help wanted wontfix third party third party libraries that are shipped and/or linked backend: ADIOS1 labels Jun 12, 2018
@ax3l
Copy link
Member Author

ax3l commented Jun 13, 2018

@C0nsultant I think there is a second solution that we can try.
If we separate the ADIOS1 serial and parallel backend translation units well, we could link ADIOS1 twice. We then create a serial and parallel adios library artifact that we then link into the final library.

With that workflow, we can use the always provided _nompi libs with the mock properly as ADIOS intended it to be used.

@ax3l ax3l removed the wontfix label Jun 13, 2018
@ax3l ax3l self-assigned this Jun 13, 2018
@ax3l
Copy link
Member Author

ax3l commented Jun 13, 2018

Oh wow, that draft of CMake magic looks messy...

@ax3l
Copy link
Member Author

ax3l commented Jun 13, 2018

@C0nsultant is it possible to get rid of MPI types and includes in include/openPMD/IO/ADIOS/ADIOS1IOHandler.hpp the same way as in include/openPMD/IO/HDF5/HDF5IOHandler.hpp?

They should not be needed there and can be set to e.g. MPI_COMM_NULL in the implementation, we have still the ParallelADIOS1IOHandler.hpp for MPI comm passing.

@C0nsultant
Copy link
Member

C0nsultant commented Jun 13, 2018

All the MPI crux is in there precisely because ADIOS designed their _NOMPI to use the mock provided in <adios_mpi.h>.
If your propsed split works, we should be able to pull it out of the interface into the implementation by hard-coding MPI_COMM_NULL indstead of relying on ADIOS1IOHandlerImpl::m_mpiComm. This will induce modifications in both ADIOS1IOHandlerImpl and ParallelADIOS1IOHandlerImpl.

I'll give it a try.

@ax3l
Copy link
Member Author

ax3l commented Jun 13, 2018

Yes, for the first part, but I think that can be hidden inside ADIOS1IOHandler.cpp for the serial mock.

@C0nsultant
Copy link
Member

That should be the way to go. It will probably lead to code duplication in ParallelADIOS1IOHandlerImpl, though. Currently both Impls rely on the fact that the interface is the same, no matter if _NOMPI. If the relevant variables are not available in the interface, we'll have to duplicate somewhere.

@ax3l
Copy link
Member Author

ax3l commented Jun 13, 2018

Yes, don't let us optimize on that coincident and keep it separate as in HDF5, even if a bit code is doubled then. This will allow me to separate the translation units better for linking against the static and the parallel ADIOS1 libs.

@ax3l
Copy link
Member Author

ax3l commented Jun 14, 2018

Opened a upstream issue in ornladios/ADIOS#183

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: ADIOS1 bug help wanted third party third party libraries that are shipped and/or linked
Projects
None yet
Development

No branches or pull requests

2 participants