Fix coo_map when only one match from regex pattern #525

Merged (3 commits) on Nov 22, 2024

@aaTman commented Oct 30, 2024

I ran into an issue when using coo_map in MultiZarrToZarr. As far as I can tell, it is caused by the code using .groups()[0] instead of .group() when the regex returns only one match in the string.

To fix this, I kept the original behaviour but added a check for an empty .groups() tuple. When I hit this bug, I was using the following code, which raised an IndexError because the tuple had length 0:

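import glob
import re

from kerchunk.combine import MultiZarrToZarr

# The pattern has no capturing group, so match.groups() is an empty tuple.
pattern = re.compile(r"[A-Za-z]\d\d(?![^ ]*[\\\/])", re.IGNORECASE)
# self.directory comes from the enclosing class (not shown).
file_list = glob.glob(f"{self.directory}/*")
mzz = MultiZarrToZarr(
    file_list,
    coo_map={"member": pattern},
    concat_dims=["member", "step", "time"],
    identical_dims=["latitude", "longitude"],
)
multi_kerchunk = mzz.translate()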

Most recently, the string (along with similar ones) that returned only one match for the regex was:

/var/folders/vz/txd62qzn76g9f6cxg_8_76cw0000gn/T/tmp_kwkysl7/pres_msl_2002110200_p03_01.json

My goal was to subset "p03" and the other ensemble members from the GEFSv12 Retrospective data.
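
For reference, the behaviour can be reproduced outside kerchunk with only the standard library. This is a minimal sketch that assumes the pattern is applied to the file path with search(); the pattern and path are copied from above, and the variable names are illustrative only.

```python
import re

# Pattern and path copied from the report; note the pattern has no capturing group.
pattern = re.compile(r"[A-Za-z]\d\d(?![^ ]*[\\\/])", re.IGNORECASE)
path = (
    "/var/folders/vz/txd62qzn76g9f6cxg_8_76cw0000gn/T/"
    "tmp_kwkysl7/pres_msl_2002110200_p03_01.json"
)

# Assumption: the pattern is applied with search(), not match() or fullmatch().
match = pattern.search(path)

whole = match.group()        # the full matched text
subgroups = match.groups()   # one entry per capturing group; empty for this pattern

# Indexing the empty tuple is what raises the IndexError.
try:
    subgroups[0]
    raised = False
except IndexError:
    raised = True

print(whole, subgroups, raised)
```

The match itself succeeds; it is only the lookup into the empty groups() tuple that fails.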

This seems to have completely fixed the issue, though I'm happy to discuss or try other edge cases.
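
The fallback described above can be sketched as follows. This is not the code from the merged commit: first_group_or_match is a hypothetical helper written for illustration, and returning None when nothing matches is an assumption of this sketch rather than kerchunk's behaviour.

```python
import re


def first_group_or_match(pattern, text):
    """Hypothetical helper: prefer the first capturing group, else the whole match."""
    match = pattern.search(text)
    if match is None:
        # Assumption for this sketch only: signal "no match" with None.
        return None
    groups = match.groups()
    # Keep the original behaviour when groups exist; fall back to group() otherwise.
    return groups[0] if groups else match.group()


name = "pres_msl_2002110200_p03_01.json"

# Pattern without a capturing group (as in the report above).
no_group = re.compile(r"[A-Za-z]\d\d(?![^ ]*[\\\/])", re.IGNORECASE)
# Illustrative pattern with one capturing group.
with_group = re.compile(r"_([A-Za-z]\d\d)_")

member_a = first_group_or_match(no_group, name)
member_b = first_group_or_match(with_group, name)
print(member_a, member_b)
```

Both patterns yield the same member label, so existing patterns that rely on a capturing group keep working.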

@martindurant

Your interpretation appears to be right, and I am surprised that you can have a group() without anything being output by groups().

@martindurant

I think these changes should fix the datatree issue, as well as requiring the current version of xarray: https://github.com/fsspec/kerchunk/pull/523/files#diff-65c089582c22ac69bff6e677f8c5dcf10b650de2fac7193ed15b013c7866925d

@martindurant martindurant merged commit 56ef473 into fsspec:main Nov 22, 2024