new I/O bottleneck as a result of reformatting #48
Yes, this is something I've experienced with dask (a tool underlying xarray). dask generally seems to perform better with fewer, larger files. The size of each file doesn't matter much (probably up to some threshold that I haven't hit yet), because dask doesn't read data into memory when a "read" call is made. It just works out how to point to the files, and that can take a long time when there are tons of files.

That said, it may be most flexible to keep distributing the product as more, smaller files rather than fewer, larger ones, so that users can 1) download only the files they need and 2) not have to worry about accidentally loading a huge file into memory. What does everyone else think?

In that case, we could recommend that eccov4py users 'reformat' the data into whatever layout benefits their workflow most. For instance, users could load and re-save all 2D variables in a single file, or one file per year, or however they want it. They could also save it to zarr, or whatever file format they prefer and are familiar with.

What do you think? I'm not in any decision-making position and I'm not in charge of the file formats. I'm just providing (hopefully useful?!) suggestions :) I would be happy to provide some suggested lines like the ones above for the README that @gaelforget is recommending; just let me know if you'd like that.
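As an illustration of the "reformat it yourself" idea, here is a minimal xarray sketch. It assumes the many small files (e.g. one variable and one month per file) have already been opened as a list of `xarray.Dataset` objects sharing a `time` coordinate. `consolidate_by_year` is a hypothetical helper, not part of eccov4py or xarray, and the variable names `SSH`/`OBP` and the array shapes in the demo are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr


def consolidate_by_year(datasets):
    """Hypothetical helper: combine many small datasets (e.g. one variable
    and one month each) into a single dataset, then split it into one
    dataset per calendar year."""
    # combine_by_coords concatenates each variable along "time" (inferred
    # from the differing time coordinates) and then merges the variables.
    combined = xr.combine_by_coords(datasets)
    # groupby("time.year") yields (year, sub-dataset) pairs.
    return {int(year): ds for year, ds in combined.groupby("time.year")}


# Tiny synthetic example: two illustrative 2D fields, one dataset per
# variable per month, covering 24 months.
times = pd.date_range("1992-01-01", periods=24, freq="MS")
small = [
    xr.Dataset(
        {name: (("time", "j", "i"), np.zeros((1, 2, 3)))},
        coords={"time": [t]},
    )
    for name in ("SSH", "OBP")
    for t in times
]

by_year = consolidate_by_year(small)
for year, ds in by_year.items():
    print(year, dict(ds.sizes), sorted(ds.data_vars))
```

With files on disk, `xr.open_mfdataset(paths, combine="by_coords")` (which requires dask) performs the combining step lazily, and each yearly dataset can then be written out with `ds.to_netcdf(...)` or, if the zarr package is installed, `ds.to_zarr(...)`.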
I don't think scenario (2) will happen because "read_nctiles" is flexible in
@gaelforget I submitted a pull request to the gcmfaces git repository for an updated read_nctiles.m that should improve performance when reading V4r4 files. It now takes about the same time to read V4r3 or V4r4 files. See the pull request at MITgcm/gcmfaces#12.
Will take a look as soon as possible & report back after I've had a chance to test on standard analyses for r2, r3, and r4. Hopefully by next week (but ...) Thanks!!!
On a different channel, @hongandyan noted that
I have not tried it, but this looks like a major setback and inconvenience for users!
I see this as a separate issue from #40, but it's not unrelated, because it stems from the same reformatting, which is looking more and more like it's creating major problems.
The only simple solution now might be for the ECCO team at JPL & UT to add another folder with v4r4, etc., in the original nctiles formatting and file layout used in earlier releases, and then add guidelines in the READMEs to let users know which version might work best depending on whether they use ECCO-v4.py, gcmfaces, or other known software.
Linking @ifenty, @owang01, and @timothyas here, as they likely know who is responsible for ECCO files at JPL & UT under the relevant NASA grant (not sure I even have a copy...)