Url-level glob? #756

Open
SeguinBe opened this issue Sep 15, 2021 · 1 comment
Comments

@SeguinBe

SeguinBe commented Sep 15, 2021

Hello,

I may be misunderstanding the package, but I cannot find an easy way to perform a glob directly on a URL.

I see that with fsspec.get_fs_token_paths I can directly get a path expansion, for instance:

>>> fsspec.get_fs_token_paths('gs://my-bucket/test/*.jsonl')
(<gcsfs.core.GCSFileSystem at 0x7f65ec653af0>,
 '90ad04da79e6e943b0f4d3dfba-------------33462f993faa9252',
 ['my-bucket/test/001.jsonl', 'my-bucket/test/002.jsonl'])

But then, to get expanded URLs, I need to reattach the protocol myself (by parsing the original URL) or detect whether the path is a local file. All of that is already done somewhere in get_fs_token_paths, so I was wondering whether there is an elegant way to directly obtain ['gs://my-bucket/test/001.jsonl', 'gs://my-bucket/test/002.jsonl'], ideally in a way that also handles absolute and local paths.
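For reference, the manual workaround described above could look roughly like the sketch below. `reattach_protocol` and `glob_urls` are hypothetical names, not part of fsspec. The sketch assumes a simple, unchained URL (no `::`) and leaves paths without a protocol untouched so that local paths pass through.

```python
def reattach_protocol(url, paths):
    """Prefix each stripped path with the protocol of the original URL.

    Hypothetical helper, not part of fsspec. Only simple, unchained URLs
    such as "gs://bucket/key" are handled.
    """
    protocol, sep, _ = url.partition("://")
    if not sep:
        # No protocol in the original URL: treat the paths as local ones.
        return list(paths)
    prefix = f"{protocol}://"
    # Some filesystems may return paths that still carry the protocol;
    # leave those unchanged rather than prefixing them twice.
    return [p if p.startswith(prefix) else prefix + p for p in paths]


def glob_urls(url):
    """Expand a glob URL and return full URLs (requires fsspec)."""
    import fsspec  # imported lazily so the helper above has no dependency

    _fs, _token, paths = fsspec.get_fs_token_paths(url)
    return reattach_protocol(url, paths)
```

With the listing above, `reattach_protocol('gs://my-bucket/test/*.jsonl', paths)` gives back the two `gs://` URLs. It works, but it repeats the URL parsing that fsspec has already done internally.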

@SeguinBe SeguinBe changed the title Top-level glob? Url-level glob? Sep 15, 2021
@martindurant
Member

As you say, all the pieces are there, but there is this inconsistency whereby file systems refer to paths using their internal representation, normally without the protocol part.
#744 introduces _unstrip_protocol, but this is for the simple unchained case. Probably all filesystems should know how to recreate a complete URL from a path.
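As a rough illustration of what "recreating a complete URL from a path" could mean for the unchained case, here is a minimal sketch. It is not the implementation from #744. `unstrip` is a hypothetical function, and its `protocol` argument stands in for a filesystem's `protocol` attribute, which may be a single string or a tuple of aliases. Picking the first alias is an assumption.

```python
def unstrip(protocol, path):
    """Rebuild a URL from a protocol value and a stripped path.

    Hypothetical helper. `protocol` mirrors a filesystem's `protocol`
    attribute: either a string or a tuple of aliases. The first alias is
    used here, which is an assumption rather than documented behaviour.
    """
    proto = protocol if isinstance(protocol, str) else protocol[0]
    prefix = f"{proto}://"
    if path.startswith(prefix):
        # The path already carries the protocol; do not add it twice.
        return path
    return prefix + path
```

One consequence of working from the filesystem rather than the original URL is that the result may use a different alias than the one the user typed, and a local path would come back with an explicit protocol prefix.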

This also falls under the "we should have a top-level API that picks file systems by URL" discussion (#747, #732).
