generic top-level file-system #732

Open
brl0 opened this issue Aug 23, 2021 · 1 comment

brl0 commented Aug 23, 2021

I wanted to get feedback about an idea based on @martindurant's proposal in #41.

Here is his original comment:

Like open/open_files, which find the file-system of interest and use it with parameters, could implement a generic top-level file-system which finds the correct instance and use it depending on the protocol implicit in the path given. This would become the primary user-facing API.

This can also be the place to implement globs or recursive where multiple files make sense.

Should allow copy/move/(sync?) between file-systems.

In short, I am thinking a relatively simple generic fsspec filesystem could be created based on UPath by @andrewfulton9 with @Quansight.

This might provide a clean approach to managing storage options for inter-filesystem operations. Each path object could be instantiated with its own options, which I think is more straightforward than passing around lots of *_kwargs style dicts (though that style could still be supported). This might also make it easier to implement requests like #588, which #723 is attempting and which currently seems somewhat challenging.
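To make the idea of options bound to path objects concrete, here is a minimal sketch using only the standard library. It is not fsspec or UPath code: `DictFileSystem`, `REGISTRY`, `get_filesystem`, `BoundPath`, `copy`, and the `mema`/`memb` protocols are all hypothetical names invented for illustration, and the `token` option is a placeholder.

```python
class DictFileSystem:
    """Toy in-memory backend standing in for a real filesystem (hypothetical)."""

    def __init__(self, **storage_options):
        self.storage_options = storage_options
        self.store = {}

    def read_bytes(self, path):
        return self.store[path]

    def write_bytes(self, path, data):
        self.store[path] = data


# Hypothetical registry mapping a protocol name to a backend class.
REGISTRY = {"mema": DictFileSystem, "memb": DictFileSystem}
_INSTANCES = {}


def get_filesystem(protocol, **storage_options):
    """Return one cached backend per (protocol, options) pair.

    Assumes option values are hashable, which is enough for this sketch.
    """
    key = (protocol, tuple(sorted(storage_options.items())))
    if key not in _INSTANCES:
        _INSTANCES[key] = REGISTRY[protocol](**storage_options)
    return _INSTANCES[key]


class BoundPath:
    """A path that carries its own filesystem, so callers pass no kwargs."""

    def __init__(self, url, **storage_options):
        self.protocol, _, self.path = url.partition("://")
        self.fs = get_filesystem(self.protocol, **storage_options)


def copy(src, dst):
    """Generic copy: each side already knows which backend to use."""
    dst.fs.write_bytes(dst.path, src.fs.read_bytes(src.path))


src = BoundPath("mema://bucket/a.txt", token="secret-a")
dst = BoundPath("memb://other/b.txt", token="secret-b")
src.fs.write_bytes(src.path, b"hello")
copy(src, dst)
```

Note that `copy` takes no `*_kwargs` arguments at all; the credentials for each side travel with the path objects.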

Since UPath inherits from pathlib, it is also conceivable that this generic filesystem could at some point optionally work as a replacement for the current local filesystem implementation.

Another possible benefit is that this might provide a simple way to integrate a base test suite as suggested by @TomAugspurger in #650, see UPath's BaseTests. I think running a base set of tests like these, although possibly reorganized a bit as in #651, directly on the upstream file systems could go a long way to achieving consistency and compatibility across the various implementations.

It might also be worth mentioning the possibility of making this async compatible at some point, maybe with something like aiopath or aiofiles, although I have not tested or used these.

There are a couple of current issues that may be a bit of a hindrance, although I am confident these issues are fixable. To my knowledge (based on very limited and simple testing), it seems that UPath currently does not support chained URLs. Also, some filesystems may not yet work properly without some effort.
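For readers unfamiliar with chained URLs: fsspec lets several protocols be joined with `::`, for example a caching layer in front of a remote store. The sketch below only illustrates the shape of such a URL with plain string handling. `split_chain` is a hypothetical helper, not how fsspec parses chains internally, and the bucket and file names are made up.

```python
def split_chain(url):
    """Split a '::'-chained URL into (protocol, path) links, outermost first."""
    links = []
    for part in url.split("::"):
        # A bare link such as "simplecache" has no "://", so its path is "".
        protocol, _, path = part.partition("://")
        links.append((protocol, path))
    return links


cached = split_chain("simplecache::s3://bucket/key.csv")
zipped = split_chain("zip://inner.txt::s3://bucket/archive.zip")
```

A path object that supports chaining would need to keep every link, plus the options for each, rather than a single protocol and path.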

Currently, UPath includes some subclassed implementations based on the core implementation. I wonder if it would be possible to reduce UPath to a single implementation, perhaps with tighter integration between the projects? I suspect that any generic top-level file-system would face similar issues and find opportunities to make upstream adjustments to ensure compatibility.

For some additional background, the discussion related to UPath in #434 is worth noting here (which I will quote from slightly out of context), in particular this comment by @martindurant:

Since it needs no further dependencies, it might well be hosted within fsspec

And this subsequent comment by @andrewfulton9:

I am definitely open to merging it into fsspec

I'm eager for any thoughts, issues, concerns, etc about this idea. If there is interest, and time permitting, I'd be willing to take a swing at an initial POC of the generic filesystem, although that should by no means dissuade anybody who has interest in making such an attempt.

Finally, thanks to all that have contributed or supported these awesome projects, which make my job easier and more fun.

@martindurant
Member

That's a very decent suggestion - especially since it doesn't preclude the str+kwargs or default instances version that I was describing before. In the case of Upaths, the kwargs are bound in the instance, so you don't need to worry about it.

In short: yes, I'd be very happy to see a module with generic functions that work on Upaths, whichever filesystems they may be defined with.

I am keen for fsspec not to have dependencies, as mentioned above. I don't know how much work it would be to merge Upath in here, versus making a module that cannot import without Upath, versus writing generic functions that work for str and Upaths/pathlib.Path (if they are importable and so can be instantiated).
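The last option, generic functions that accept either str or path objects without a hard dependency, could look roughly like the sketch below. `split_protocol` is a hypothetical helper. The guarded import assumes the UPath package exposes `upath.UPath`, and the sketch assumes `str()` of a path object yields its full URL; neither is checked here.

```python
from pathlib import PurePath, PurePosixPath

try:
    # Optional dependency: this module must still import when it is missing.
    from upath import UPath
except ImportError:
    UPath = None

# Accept pathlib objects always, and UPath objects only when importable.
_PATH_TYPES = (PurePath,) if UPath is None else (PurePath, UPath)


def split_protocol(path):
    """Return (protocol, path) for a str or path object; default is 'file'."""
    if isinstance(path, str):
        text = path
    elif isinstance(path, _PATH_TYPES):
        text = str(path)
    else:
        raise TypeError(f"unsupported path type: {type(path).__name__}")
    protocol, sep, rest = text.partition("://")
    return (protocol, rest) if sep else ("file", text)


remote = split_protocol("s3://bucket/key.csv")
local = split_protocol(PurePosixPath("/tmp/data.csv"))
```

With this shape, the same generic function serves plain strings today and richer path objects whenever they are installed.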
