
Troubleshoot ADCP dataset #17

Open
pramod-thupaki opened this issue Aug 28, 2020 · 2 comments

pramod-thupaki (Contributor) commented Aug 28, 2020

Some ADCP files** are not being imported into the ERDDAP dataset. Initial tests by @n-a-t-e suggest that this is due to insufficient memory while the ERDDAP dataset is being constructed.

Other points:

  • The ADCP dataset is the largest of the IOS datasets
  • Dividing the millar1* file into 3 smaller files seems to get around the problem, though it's not clear why. Splitting the file is not an ideal solution; ERDDAP ought to be able to handle much larger files. (A sketch of one way to do the split follows the file list below.)

** Problem file(s):
millar1_20171007_20181015_0018m.adcp.L1.nc
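
For reference, here's one way to do that kind of split (a sketch only - it assumes Python with xarray installed, a time dimension literally named "time", and hypothetical output file names; it may not match how the file was actually divided):

```python
# Sketch: split a NetCDF file into smaller files along the time dimension.
# Assumes xarray/netCDF4 are installed and the time dimension is named "time";
# the part count and output names are hypothetical.
import numpy as np
import xarray as xr

ds = xr.open_dataset("millar1_20171007_20181015_0018m.adcp.L1.nc")
n_parts = 3  # 3 smaller files reportedly got around the problem
for i, idx in enumerate(np.array_split(np.arange(ds.sizes["time"]), n_parts)):
    ds.isel(time=idx).to_netcdf(f"millar1_part{i + 1}.adcp.L1.nc")
ds.close()
```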

pramod-thupaki added the bug (Something isn't working) label on Aug 28, 2020
sjbruce commented Aug 28, 2020

Last year at the Ann Arbor Code Sprint I asked Bob about the size of files in ERDDAP - whether it's better to use a few large files or many smaller ones. His answer was that it's better to use many smaller files; according to him, this is true whether the files are local or remote (and especially when retrieving them from a remote location like Amazon S3).

ERDDAP does some internal indexing to map which files contain which data, so many smaller files actually end up being more efficient.

From the release notes for the most recent version (2.02):

If it is convenient, it's still always a good idea to split huge tabular data files into several smaller files based on some criteria like stationID and/or time. ERDDAP will often only have to open one of the small files in response to a user's request, and thus be able to respond much faster.

Also, you might want to try bumping the memory available to ERDDAP up to 8GB (or higher); that might help as well - when in doubt, throw more RAM at the problem!
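
For ERDDAP running under Tomcat, that usually means editing tomcat/bin/setenv.sh - something along these lines (a sketch; the exact flags, values, and path depend on your deployment):

```sh
# tomcat/bin/setenv.sh - sketch only; values depend on the deployment.
# -Xms/-Xmx set the JVM's initial and maximum heap (8GB here, per the
# suggestion above).
export JAVA_OPTS='-server -Djava.awt.headless=true -Xms8G -Xmx8G'
```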

Found this thread on the ERDDAP Google Group that goes into more detail on why to split up large files: https://groups.google.com/g/erddap/c/OaX7JjV18pg?pli=1

n-a-t-e (Member) commented Sep 8, 2020

This was a strange one, as the problematic file isn't really that big, and also isn't the largest one in the dataset. I tried bumping the RAM to 12GB and ERDDAP still crashed. But splitting up the file does make it work, so we can do that for now.
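
One thing worth checking is the uncompressed size: a compressed NetCDF file can expand many times over when read, so a file that looks small on disk can still blow the heap. A quick comparison (a sketch, assuming Python with xarray; the path is just the file named above):

```python
# Sketch: compare the problem file's on-disk size to its uncompressed
# in-memory size. Heavy compression would explain crashes on a file
# that "isn't really that big".
import os
import xarray as xr

path = "millar1_20171007_20181015_0018m.adcp.L1.nc"
ds = xr.open_dataset(path)
print(f"on disk:   {os.path.getsize(path) / 1e9:.2f} GB")
print(f"in memory: {ds.nbytes / 1e9:.2f} GB")
print(dict(ds.sizes))  # dimension lengths, e.g. time, depth
ds.close()
```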
