It might be because of my lack of comprehension of the package, but I cannot find an easy way to perform a glob directly on a URL.
I see that with fsspec.get_fs_token_paths I can directly get a path expansion, for instance:
>>> fsspec.get_fs_token_paths('gs://my-bucket/test/*.jsonl')
(<gcsfs.core.GCSFileSystem at 0x7f65ec653af0>,
'90ad04da79e6e943b0f4d3dfba-------------33462f993faa9252',
['my-bucket/test/001.jsonl', 'my-bucket/test/002.jsonl'])
But then, in order to get expanded URLs, I need to reattach the protocol (by parsing the original URL) or detect whether it is a local file. All of that is already done somewhere in get_fs_token_paths, so I was wondering if there is an elegant way to obtain directly: ['gs://my-bucket/test/001.jsonl', 'gs://my-bucket/test/002.jsonl']. Ideally in a way that handles absolute and local paths as well.
As you say, all the pieces are there, but there is this inconsistency whereby file systems refer to paths using their internal representation, normally without the protocol part. #744 introduces _unstrip_protocol, but this is for the simple unchained case. Probably all filesystems should know how to recreate a complete URL from a path.
This also falls under the "we should have a top-level API that picks file systems by URL" discussion (#747, #732).
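In the meantime, the manual workaround described in the question can be sketched in a few lines. This is only an illustration for the simple unchained case: `expand_urls` is a hypothetical helper name, not part of fsspec, and it assumes the paths come back in the stripped form shown in the question.

```python
def expand_urls(url, paths):
    """Re-attach the protocol of `url` to each stripped path in `paths`.

    Hypothetical helper, not an fsspec API. Only the simple, unchained
    case is handled.
    """
    # Chained URLs such as "simplecache::gs://..." are out of scope here.
    if "::" in url:
        raise ValueError("chained URLs are not handled by this sketch")

    protocol, sep, _ = url.partition("://")
    if not sep:
        # No protocol in the original URL: treat it as a local path
        # and return the expanded paths unchanged.
        return list(paths)

    prefix = protocol + "://"
    # Leave paths alone if they already carry the protocol.
    return [p if p.startswith(prefix) else prefix + p for p in paths]


# Intended usage with the call from the question (needs fsspec and gcsfs):
#   fs, token, paths = fsspec.get_fs_token_paths(url)
#   urls = expand_urls(url, paths)
```

With the paths from the question, `expand_urls('gs://my-bucket/test/*.jsonl', paths)` gives back the two `gs://` URLs, and a plain local glob passes through untouched.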