Bug writing empty data sets? #217
Comments
Any ideas?
So, reading that gkyl issue, it isn't really clear how we might try to reproduce it. Do you think this happens when appending to an existing file? The issue sounds like it might happen without a restart (at the beginning of a run). Are you using steps? ADIOS2 is designed more to optimize writing the same set of variables repeatedly, whereas it looks like you are generating a new name for your variable on every step, which may trigger some issues.
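To illustrate the distinction raised above, here is a toy Python sketch (the MockWriter class is a hypothetical stand-in, not the ADIOS API) contrasting a fresh variable name per step with reusing one variable across many steps:

```python
# Illustrative sketch only: show how per-step variable names inflate
# the set of defined variables, versus one variable written repeatedly.

class MockWriter:
    def __init__(self):
        self.variables = {}  # variable name -> number of writes

    def put(self, name, data):
        self.variables[name] = self.variables.get(name, 0) + 1

w_new_names = MockWriter()
w_steps = MockWriter()

for step in range(100):
    # Pattern the thread suspects: a fresh name on every step.
    w_new_names.put(f"TimeMesh{step}", [step * 0.1])
    # Pattern ADIOS2 is optimized for: one variable, many steps.
    w_steps.put("TimeMesh", [step * 0.1])

print(len(w_new_names.variables))  # 100 distinct variable entries
print(len(w_steps.variables))      # 1 variable, written 100 times
```

The first pattern makes the output's metadata grow with the number of steps, which is the kind of usage that can surface corner-case bugs.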
Can you please share such a dataset with us? At OLCF or NERSC if it is too big to share otherwise. Are you certain the application does not call Put() on a variable that has the defined shape {0,1}? Are you saying ADIOS inserts these variables into the output without you asking for it?
Yes @eisenhauer, it's hard to reproduce this. I've tried to do so on my laptop but have not been able to. But it continues to happen sometimes on clusters (it's happened on three clusters now: one at MIT, one at Princeton, and NERSC's Cori). It does seem to happen when appending to an existing file. I mean, I've never seen the first dataset we put in the file be empty or have a different size than we expected; it's always been the second or later datasets we append. I don't think we are using steps. We are simply
@pnorbert I'm attaching two data files in which this happened.
I don't think we are calling Put(). See the steps listed in the previous message to eisenhauer. It is certainly placing datasets that shouldn't be there sometimes, specifically when it adds those empty TimeMesh and Data variables.
For completeness, here's the link to the section of the gkyl code in charge of writing these datasets.
@pnorbert if I recall correctly, adios1.x files had the timers embedded (I don't remember if it was a hidden attribute or variable). Perhaps it's related. In adios2 we dump the JSON file for profiling, but that's an ON/OFF switch.
at https://github.com/ammarhakim/gkyl/blob/82ae19b5882e12a02056d109ae3d3a2eafbf6b1a/DataStruct/DynVector.lua#L342 My bet is that the
Forget my question about Put(). I did not realize this was adios 1.x. The Lua code shows me that you indeed call
Using bpdump, I tried to reconstruct what happened in the gk40-* run:
Up to 1500, there is one Data entry (i.e. one
Then between 1501 and 1532, there are two entries for each frame. For 1501 the local arrays have different sizes, (1,1) + (2,1), while for the rest they are the same, (1,1) + (1,1).
At step 1533 everything goes back to "normal". Do you agree with my theory? Is this what happened?
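The duplicated-entries pattern described above can be checked mechanically. This Python sketch scans (frame, size) records like those recovered with bpdump and flags frames that carry more than one Data entry; the records below are made up to mirror the observations in this comment:

```python
# Illustrative sketch: detect frames with duplicated Data entries.
# The record list is fabricated to match the bpdump reconstruction:
# one entry per frame up to 1500, two entries for 1501-1532, then
# back to one entry at 1533.

from collections import Counter

records = (
    [(f, (1, 1)) for f in range(1499, 1501)]            # normal frames
    + [(1501, (1, 1)), (1501, (2, 1))]                  # 1501: two entries, different sizes
    + [(f, s) for f in range(1502, 1533)                # 1502-1532: duplicated (1,1)+(1,1)
       for s in [(1, 1), (1, 1)]]
    + [(1533, (1, 1))]                                  # back to normal
)

counts = Counter(frame for frame, _ in records)
duplicated = sorted(f for f, n in counts.items() if n > 1)
print(duplicated[0], duplicated[-1], len(duplicated))   # 1501 1532 32
```

A scan like this over the real file's index would confirm (or refute) the theory without eyeballing the bpdump output.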
A few comments on the IO:
@pnorbert @williamfgc thank you very much for your comments. I was able to get back to this problem and found a potential solution. First, I found a way to reproduce the appearance of "scalar" datasets: if I just call our code (gkyl) in a loop, with each iteration restarting from the previous one and taking just a couple of steps, then a "scalar" dataset appears after 30-60 iterations. Out of curiosity I decided to change our code for creating variables, in lines 364 and 366 of the DynVector.lua file. Previously we were passing an empty string (i.e. "") for
Some additional comments/questions:
@manauref: It is entirely fine to define and write a global array from one process. By definition, any subset of the processes that opened an output can define and write a global array, including a single process. There is no difference between local and global arrays at write time other than the extra metadata of global dimensions and offsets. The original problem is due to handling local arrays at read time when appending more local arrays to existing steps after a restart. If you can avoid that by having global arrays, please do so.
For the other points:
1.a. This is an old fear from other file formats that get corrupted on abort. adios1 bp version 3 is safe as long as the app does not die during the write-out (the call to advance_step() or close()). For that very rare case, there is the
1.b. Changing global dimensions over steps is allowed but it is harder to read back. bpls shows
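To make the write-time distinction above concrete, here is a small Python sketch of the metadata each kind of array carries. The function names and dictionary layout are hypothetical, not the ADIOS API; the point is only that a global array adds a global shape and per-writer offset on top of the local count:

```python
# Illustrative sketch: a global array carries Shape/Start/Count
# metadata, while a local array carries only its Count. A single
# process can own the whole global array by using offset (0, 0)
# and a count equal to the global shape.

def define_global(name, shape, start, count):
    return {"name": name, "shape": shape, "start": start, "count": count}

def define_local(name, count):
    return {"name": name, "count": count}

N = 1000  # hypothetical number of buffered time samples

g = define_global("TimeMesh", shape=(N, 1), start=(0, 0), count=(N, 1))
l = define_local("TimeMesh", count=(N, 1))

# The only write-time difference is the extra global metadata:
print(sorted(set(g) - set(l)))  # ['shape', 'start']
```

Since the read-time append bug discussed here involves local arrays, switching the single-writer output to a global array sidesteps it at the cost of these two extra fields.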
We have an application (Gkeyll) which is outputting a scalar time trace to a single file during a time dependent simulation. We buffer a certain amount of data and periodically flush it out into an ADIOS file.
We've noticed that after restarting the simulation (because the first simulation ran out of wallclock time, for example), ADIOS writes some empty datasets to the file. This makes postprocessing either difficult or impossible (at present, in our workflow).
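A hedged sketch of one application-side guard, assuming the empty datasets come from flushing a buffer that holds no samples right after a restart: skip the write entirely when the buffer is empty, so no zero-length dataset lands in the file. All class and method names here are hypothetical stand-ins, not the gkyl or ADIOS API:

```python
# Illustrative sketch: a buffered time-trace writer that refuses to
# emit a dataset when nothing has been collected since the last flush.

class TimeTraceBuffer:
    def __init__(self):
        self.times, self.values = [], []

    def append(self, t, v):
        self.times.append(t)
        self.values.append(v)

    def flush(self, writer):
        if not self.times:           # nothing buffered: write no dataset
            return False
        writer.put("TimeMesh", list(self.times))
        writer.put("Data", list(self.values))
        self.times.clear()
        self.values.clear()
        return True

class ListWriter:                    # stand-in recording (name, length)
    def __init__(self):
        self.datasets = []

    def put(self, name, data):
        self.datasets.append((name, len(data)))

w = ListWriter()
buf = TimeTraceBuffer()
print(buf.flush(w))                  # False: empty buffer, nothing written
buf.append(0.1, 1.0)
print(buf.flush(w))                  # True
print(w.datasets)                    # [('TimeMesh', 1), ('Data', 1)]
```

This does not address whatever ADIOS itself does on append after restart, but it guarantees the application never asks for a zero-length TimeMesh/Data pair.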
There's a description of this issue in the gkyl repo: ammarhakim/gkyl#41
Is this something you are familiar with? Is it an ADIOS bug, an issue with the file systems (e.g. NERSC's Cori), or improper use on our side?