Question regarding call of stat() for parent dir #36

Open
tfelbr opened this issue Sep 8, 2023 · 2 comments

Comments


tfelbr commented Sep 8, 2023

Hello,
while using sshfs in fsspec.open_files(), I discovered that stat() is called on the parent directory of the requested files, even when it is already clear that this path must be a directory. This is most certainly not a problem in most cases, but the sftp server I have to use behaves somewhat strangely here: I get a permission error when calling stat() on these directories.

With the default sftp implementation from fsspec there is no issue at all, so it seems to me that this should be possible without a call to stat(). Is there any way to achieve this with this library as well? I would really like to use it because it performs better than the default sftp implementation. Thank you!

Member

efiop commented Sep 8, 2023

Hi @Bizarious. Sounds like a bug. Could you pinpoint the specific line in the code? If you are getting a permission error, I suppose you have a traceback for it lying around as well?

Author

tfelbr commented Sep 22, 2023

Sorry for the late reply; some external circumstances prevented me from responding.

First of all, thank you for the answer! I think a bit more context would be helpful as well:

I'm using fsspec.open_files() with a url that looks like this one:

ssh://user:password@sftp_host/root/path/*.zip
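To make explicit which part of this URL is the parent directory and which part is the glob pattern, here is a minimal sketch using only the Python standard library. It is an illustration of the pieces, not a description of how fsspec parses the URL internally; the variable names are my own.

```python
import posixpath
from urllib.parse import urlsplit

url = "ssh://user:password@sftp_host/root/path/*.zip"

# Split the URL into protocol, credentials, host and remote path.
parts = urlsplit(url)

# Separate the remote path into the directory and the glob pattern.
parent_dir, pattern = posixpath.split(parts.path)

print(parts.scheme, parts.hostname)  # protocol and host
print(parent_dir)                    # the directory that ends up being stat'ed
print(pattern)                       # the pattern matched inside that directory
```

The directory part, /root/path, is the path on which the permission error occurs.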

Now it seems that the filesystem calls stat() on the directory path (/root/path in the example above), even though that should not be necessary. The relevant part of the trace looks like this:

  File ".../lib/python3.10/site-packages/fsspec_sync/sync.py", line 128, in fsspec_sync
    source_open_files: OpenFiles = fsspec.open_files(
  File ".../lib/python3.10/site-packages/fsspec/core.py", line 282, in open_files
    fs, fs_token, paths = get_fs_token_paths(
  File ".../lib/python3.10/site-packages/fsspec/core.py", line 641, in get_fs_token_paths
    paths = [f for f in sorted(fs.glob(paths)) if not fs.isdir(f)]
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 775, in _glob
    allpaths = await self._find(
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 841, in _find
    if withdirs and path != "" and await self._isdir(path):
  File ".../lib/python3.10/site-packages/fsspec/asyn.py", line 652, in _isdir
    return (await self._info(path))["type"] == "directory"
  File ".../lib/python3.10/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File ".../lib/python3.10/site-packages/sshfs/spec.py", line 141, in _info
    attributes = await channel.stat(path)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 4573, in stat
    return await self._handler.stat(path, flags)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2695, in stat
    return cast(SFTPAttrs,  await self._make_request(
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2454, in _make_request
    result = self._packet_handlers[resptype](self, resp)
  File ".../lib/python3.10/site-packages/asyncssh/sftp.py", line 2470, in _process_status
    raise exc
asyncssh.sftp.SFTPPermissionDenied: Permission denied.

Using the debugger, I discovered that fsspec splits the path in the _glob() function and calls _find() on the directory, /root/path/ in our case. _find() then calls _isdir() on that path, which in turn calls _info() of the ssh filesystem. That results in a call to stat() on this directory, which raises a permission error in my case. The relevant line in sshfs is line 141 in sshfs/spec.py.
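The call chain described above can be modelled with a small, self-contained sketch. This is a simplified toy, not the actual fsspec or sshfs code: the ToyRemote and ToyFileSystem classes are hypothetical, the built-in PermissionError stands in for asyncssh's SFTPPermissionDenied, and only the method names and the isdir check are taken from the traceback.

```python
import asyncio
import fnmatch
import posixpath


class ToyRemote:
    """Hypothetical server that allows listing a directory but denies stat() on it."""

    def __init__(self, files, stat_denied):
        self.files = set(files)
        self.stat_denied = set(stat_denied)
        self.stat_calls = []  # records every path that was stat'ed

    async def stat(self, path):
        self.stat_calls.append(path)
        if path in self.stat_denied:
            raise PermissionError(f"Permission denied: {path}")
        kind = "file" if path in self.files else "directory"
        return {"name": path, "type": kind}

    async def listdir(self, path):
        return sorted(f for f in self.files if posixpath.dirname(f) == path)


class ToyFileSystem:
    """Simplified model of the glob -> find -> isdir -> info chain."""

    def __init__(self, remote):
        self.remote = remote

    async def _info(self, path):
        return await self.remote.stat(path)

    async def _isdir(self, path):
        return (await self._info(path))["type"] == "directory"

    async def _find(self, path, withdirs=False):
        out = await self.remote.listdir(path)
        # Same shape as the check in the traceback: the root itself is stat'ed.
        if withdirs and path != "" and await self._isdir(path):
            out.append(path)
        return sorted(out)

    async def _glob(self, pattern):
        root = posixpath.dirname(pattern)  # "/root/path" for "/root/path/*.zip"
        allpaths = await self._find(root, withdirs=True)
        return [p for p in allpaths if fnmatch.fnmatchcase(p, pattern)]


remote = ToyRemote(
    files=["/root/path/a.zip", "/root/path/b.txt"],
    stat_denied=["/root/path"],
)
fs = ToyFileSystem(remote)

# Listing the directory works on this toy server ...
listing = asyncio.run(remote.listdir("/root/path"))

# ... but the glob fails because of the stat() on the parent directory.
try:
    matches = asyncio.run(fs._glob("/root/path/*.zip"))
    outcome = f"glob ok: {matches}"
except PermissionError as exc:
    outcome = f"glob failed: {exc}"

print(listing)
print(outcome)
```

In this model the only stat() call is the one on /root/path, and it is the one that fails, even though the listing alone would have been enough to match the pattern.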

Of course, we are talking about the async implementations of glob() and find() here, but I compared them to the synchronous ones and they look mostly the same, especially the call to isdir().

I am not sure whether anything can be done inside the ssh implementation, but as I already mentioned, the default sftp implementation does not have this problem. Please let me know what you think and whether I missed anything! Thanks :)
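One possible direction, purely as a sketch and not a tested fix: treat a permission error from stat() as inconclusive and fall back to a directory listing. The helper isdir_with_fallback and the two callables passed to it are hypothetical and not part of sshfs or fsspec, and the built-in PermissionError again stands in for asyncssh's SFTPPermissionDenied.

```python
import asyncio


async def isdir_with_fallback(info, listdir, path):
    """Return True if `path` looks like a directory.

    `info` and `listdir` are hypothetical async callables standing in for
    the filesystem's stat-based lookup and its directory listing.
    """
    try:
        return (await info(path))["type"] == "directory"
    except PermissionError:
        # stat() was denied; a successful listing still proves a directory.
        try:
            await listdir(path)
        except (PermissionError, NotADirectoryError, FileNotFoundError):
            return False
        return True


async def denied_info(path):
    # Models a server that refuses stat() on this path.
    raise PermissionError(path)


async def ok_listdir(path):
    # Models a server that still allows listing the same path.
    return ["a.zip"]


result = asyncio.run(isdir_with_fallback(denied_info, ok_listdir, "/root/path"))
print(result)
```

Whether such a fallback belongs in sshfs, in fsspec, or nowhere at all is an open question; it costs an extra round trip in the denied case and only covers servers that permit listing.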
