Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to properly close (SSH) file systems? #1682

Open
mxmlnkn opened this issue Sep 19, 2024 · 4 comments
Open

How to properly close (SSH) file systems? #1682

mxmlnkn opened this issue Sep 19, 2024 · 4 comments

Comments

@mxmlnkn
Copy link
Contributor

mxmlnkn commented Sep 19, 2024

I did open an SSH file system and used it successfully:

o = fsspec.open("ssh://[email protected]")
o.fs.listdir("/")

Then, I lost the SSH connection because I lost WLAN connectivity because I entered suspend mode on my notebook. After that, I got this error on o.fs.listdir:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.12/site-packages/fsspec/spec.py", line 1593, in listdir
    return self.ls(path, detail=detail, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.12/site-packages/fsspec/implementations/sftp.py", line 127, in ls
    stats = [self._decode_stat(stat, path) for stat in self.ftp.listdir_iter(path)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/paramiko/sftp_client.py", line 278, in listdir_iter
    t, msg = self._request(CMD_OPENDIR, path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/paramiko/sftp_client.py", line 821, in _request
    num = self._async_request(type(None), t, *arg)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/paramiko/sftp_client.py", line 846, in _async_request
    self._send_packet(t, msg)
  File "/usr/lib/python3/dist-packages/paramiko/sftp.py", line 198, in _send_packet
    self._write_all(out)
  File "/usr/lib/python3/dist-packages/paramiko/sftp.py", line 162, in _write_all
    n = self.sock.send(out)
        ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed

I was not able to get around this error without closing and restarting the Python interactive interpreter itself. Note that this error takes a while. Directly after loosing connectivity, or enabling VPN, the listdir call hangs multiple minutes before I loose patience. I tried unsuccessfully:

  • o.close()
  • o.fs.close() # no attribute "close"
  • Simply reopening the same SSH connection: fsspec.open("ssh://[email protected]").fs.listdir('/')
  • o.fs.client.close(). This fixes the hang, but now I get the Socket is closed error, which I also got after ~30 minutes (timeout?), with no way to reopen the same connection again.
  • quit ipython3 and reopen it: works fine

I am especially stumped as to why the last one did not work. It seems that SSH connections are somehow cached ad some layer.

I also looked into the paramiko API specification and found this:

close()

Close this SSHClient and its underlying Transport.

This should be called anytime you are done using the client object.

Warning

Paramiko registers garbage collection hooks that will try to automatically close connections for you, but this is not presently reliable. Failure to explicitly close your client after use may lead to end-of-process hangs!

The fsspec SFTP implementation never calls this close method on the Client. As far as I can see, the client is only opened, never closed.

I would expect all other remote file systems to have similar issues, i.e., how can I ensure that the connection is properly closed?

Edit: I noticed that there is some kind of caching going on. This would explain why simply trying to reopen the SSH mount did not work.

@martindurant
Copy link
Member

The .client object has a close() as you say, but I see no way to check if the connection is still live. In any case, calling connect() should first close any client that might already be there, and that would give you a way forward here.

Edit: I noticed that there is some kind of caching going on.

I don't think that condition does anything anymore. But actually, instances in fsspec are generally cached: if you instantiate with exactly the same arguments as previously, you get back the same object as before. You can pass skip_instance_cache=True to disable this, or .clear_instance_cache to discard stored instances (which will del them)

@erikvanzijst
Copy link

Running into the same issue. What's the intended way to clean up after a filesystem instance is no longer needed?

Is it the responsibility of fs implementations to anticipate having to reconnect automatically to their remote servers when a connection has timed out? If so, I don't see any implementations actually do that.

Or should long running applications explicitly forego the fs instance cache (skip_instance_cache=True) each time an instance is retrieved?

The latter works, but as fsspec does not seem to have anything in the way of automatic cleanup, old fs instances will leak file descriptors that are never explicitly closed.

Having applications hardwire calls to sftp.client.close() is not appropriate as that breaks the protocol abstraction fsspec is meant to provide.

Is this an oversight in the fsspec API design, or are am I using it wrong?

@mxmlnkn
Copy link
Contributor Author

mxmlnkn commented Nov 20, 2024

@erikvanzijst I also saw your comment on the paramiko performance issue. If you have not found it already, I want to mention that, in the end, I have switched to using asyncssh, which I found much more well-maintained and also has fsspec bindings here: https://github.com/fsspec/sshfs I don't remember testing the connection loss issue with asyncssh though ... It might have the same issue, but at least the performance should be better.

@erikvanzijst
Copy link

Thanks for the suggestion @mxmlnkn. I've gone and switched. The performance of fsspec/sshfs is much more consistent and generally much higher than Paramiko.

I did notice sshfs does not appear to implement any read-ahead buffering when downloading, penalizing many small reads (like iterating over the lines of a remote txt file), but wrapping it with a BufferedReader resolves that.

This also appears to better deal with cleanup. When reading a file using with fsspec.open(url) as f: the close on the way out correctly closes the connection and the async event loop thread it created. Whereas with Paramiko I too was seeing the weird destructor issue which didn't instill any further confidence:

Exception ignored in: <function SFTPFile.__del__ at 0x7f5ca0b46b90>
Traceback (most recent call last):
  File "./paramiko/paramiko/sftp_file.py", line 76, in __del__
  File "./paramiko/paramiko/sftp_file.py", line 102, in _close
  File "./paramiko/paramiko/sftp_client.py", line 875, in _async_request
  File "./paramiko/paramiko/message.py", line 279, in add_string
TypeError: 'NoneType' object is not callable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants