-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in RT of parquet detection #278
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #278 +/- ##
=======================================
Coverage 67.92% 67.92%
=======================================
Files 41 41
Lines 2516 2516
=======================================
Hits 1709 1709
Misses 807 807
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Typing: #274 |
virtualizarr/readers/kerchunk.py
Outdated
@@ -38,7 +38,7 @@ def open_virtual_dataset( | |||
fs = _FsspecFSFromFilepath(filepath=filepath, reader_options=reader_options) | |||
|
|||
# The kerchunk .parquet storage format isn't actually a parquet, but a directory that contains named parquets for each group/variable. | |||
if fs.filepath.endswith("ref.parquet"): | |||
if fs.filepath.endswith(".parquet"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to fsspec/kerchunk#519 (comment), you could also check for the presence of f"{fs.filepath}/.zmetadata"
, which should avoid trying to open actual parquet files – the only downside is a additional request, but since we already check magic bytes elsewhere I don't think that's a concern:
if fs.filepath.endswith(".parquet"): | |
if fs.filepath.endswith(".parquet") and fs.fs.isfile(f"{fs.filepath}/.zmetadata"): |
(or define isfile
/ exists
on _FsspecFSFromFilepath
and use that instead of fs.fs.isfile
)
Also, do we want to support .parq
, as well, or do we expect people to always end the filename with .parquet
? If we do want to support .parq
, you can use fs.filepath.endswith((".parquet", ".parq"))
(not that I'd need that myself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @keewis!
Also, do we want to support .parq, as well, or do we expect people to always end the filename with .parquet? If we do want to support .parq, you can use fs.filepath.endswith((".parquet", ".parq")) (not that I'd need that myself).
I don't have really strong opinions on this. Adding more file name guessing doesn't feel great to me. I could see people naming files with a .pqt
suffix since it's really just a directory name we're looking at. Maybe we just stick with .parquet
and mention in the docs and/or in the ValueError?
cc @TomNicholas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much of an opinion. Explicit is better than guessing, but clear documentation and error message either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both! I added a bit to the error message.
Co-authored-by: Justus Magin <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Tom Nicholas <[email protected]>
No description provided.