Npz sniffing: do not read the whole file #17672
Merged
I noticed that uploading https://data.qiime2.org/2024.2/common/silva-138-99-nb-classifier.qza fails because of high memory usage (18 GB for the file in question). You can reproduce this on usegalaxy.org.
The reason is that the npz sniffer reads the whole file (npz files are just zip archives).
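A cheaper check is possible because an `.npz` is a zip archive whose members are `.npy` files, and every `.npy` file starts with the 6-byte magic string `\x93NUMPY`. The sketch below is not the change made in this PR. It is a hypothetical helper (`looks_like_npz` is an illustrative name) that reads only the zip central directory and the first few bytes of each member, so the array data is never loaded.

```python
import zipfile

# Every .npy member starts with this magic string (NumPy .npy format).
NPY_MAGIC = b"\x93NUMPY"


def looks_like_npz(path):
    """Hypothetical sniffer: True if `path` is a zip whose members are all .npy files.

    Only the zip directory and the first bytes of each member are read;
    the arrays themselves are never loaded into memory.
    """
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            if not names:
                return False
            for name in names:
                # Other zip-based formats (e.g. a .qza) contain non-.npy members.
                if not name.endswith(".npy"):
                    return False
                with zf.open(name) as member:
                    # Read just the magic bytes, not the whole member.
                    if member.read(len(NPY_MAGIC)) != NPY_MAGIC:
                        return False
    except (zipfile.BadZipFile, OSError):
        return False
    return True
```

Because the check stops at the first non-`.npy` member, an unrelated zip archive is rejected after inspecting only its member list.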
I also tried passing `mmap_mode` to `load` (https://numpy.org/doc/stable/reference/generated/numpy.load.html), but this changed nothing. By the way, I used memray, which showed me the problem in seconds.
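A likely explanation for `mmap_mode` having no effect: for zip-based `.npz` files, `np.load` returns a lazy `NpzFile` and, as far as I can tell from current NumPy versions, does not use `mmap_mode` for them. Indexing a member (`npz[name]`) then reads that whole member into a regular in-memory `ndarray`. The hypothetical `inspect_npz` helper below is only a small sketch to observe this behaviour; it is not Galaxy code.

```python
import os
import tempfile

import numpy as np


def inspect_npz(path):
    """Hypothetical helper: return member names and whether a member came back memory-mapped."""
    # For an .npz, np.load returns a lazy NpzFile; mmap_mode does not apply here.
    with np.load(path, mmap_mode="r") as npz:
        names = sorted(npz.files)  # names come from the zip directory only
        first = npz[names[0]]  # this reads the whole member into memory
        return names, isinstance(first, np.memmap)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.npz")
        np.savez(path, a=np.arange(10), b=np.ones((3, 3)))
        print(inspect_npz(path))
```

So a sniffer that indexes every member ends up loading all arrays, regardless of `mmap_mode`.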
I could also backport this further if needed (#11957).