
[Request] Timeout handling. #84

Open
kokoko3k opened this issue Apr 15, 2020 · 4 comments
Labels
feature New feature request/PR

Comments

@kokoko3k

First, I'd like to thank you for bindfs.
My use case is probably a bit different from the intended one: I use it to mirror problematic network shares, such as SMB and NFS ones, which live in kernel space.
Since bindfs lives in user space, I can gracefully recover when the share does not answer.
I can just kill the bindfs process, unmount it and remount it, so that the application does not hang forever.
I made a script and a service that take care of that, see here:
https://bbs.archlinux.org/viewtopic.php?pid=1898333#p1898333

If you have time, could you consider adding a timeout parameter to the bindfs mount options that restarts the bindfs mount, or just kills it, when the "mirrored" path does not answer?

Thank you again!

@mpartel mpartel added the feature New feature request/PR label Apr 25, 2020
@mpartel
Owner

mpartel commented Apr 25, 2020

Hi and sorry for taking a while to reply.

Ideally the network filesystems would have their own timeouts, but if they commonly don't, then this seems like a reasonable feature. Unfortunately it's not a trivial feature and I have less free time than I used to, so I can't make promises on when and if I'll do this. Pull requests are welcome.

There are a few ways this could be implemented:

(1) There could be a watchdog thread that gets activated for the duration of each operation. It'd kill and restart bindfs if an operation takes too long. This would be pretty much equivalent to your script.
(2) Bindfs could forward I/O operations to a subprocess and kill and restart it if it times out.

I'm leaning towards (2). While (1) might seem simpler and have less perf overhead, I'm not sure to what extent it's a source of bugs and confusion for users that the mount point can temporarily look empty. (2) would also allow for automatic retries.

(I'll also note that if a network FS really hangs forever, then I'd highly suspect that killing the caller won't stop the hang from effectively leaking kernel memory and/or a file descriptor and/or a zombie PID. Again it'd be better to implement a timeout in the network FS.)

@kokoko3k
Author

Hello,
I can understand that a network filesystem may hang forever because it tries hard not to interrupt the workflow, and that can even be desirable on an unreliable network, but as far as I could see, neither NFS nor CIFS provides a way to report an error instead of retrying indefinitely.
To be honest, I did not even try to request such a feature. I think that if they never implemented it, they would not do so now; maybe it's a "restriction" defined in the protocol itself, I don't know.

It is true that seeing a mounted share as empty could scare the user, but the timeout behaviour could just be disabled by default, so that the user who activates it knows what is going on.

I really don't know how to write C code, sorry. Thanks a lot for considering my proposal.

@mpartel
Owner

mpartel commented Apr 25, 2020

Looking at https://manpages.debian.org/testing/cifs-utils/mount.cifs.8.en.html
I see options like

handletimeout
soft
noresilienthandles
nopersistenthandles
echo_interval

some of which are on by default.

These (especially 'soft') look like they should cause timeouts instead of hangs. So if those are all set appropriately and CIFS still hangs, it seems like a bug in CIFS.

Unless the server (or server cluster?) has an internal issue but stays alive enough to respond to the client's echo request.

@kokoko3k
Author

kokoko3k commented Apr 25, 2020

"soft" is on by default and it has never worked for me in years. I doubt it is intended to make the client time out; maybe when they write "not hang", they mean that you can send SIGINT to the app and it exits without needing a SIGKILL (?)
Also, echo_interval is set to 60 seconds, but it does not trigger any timeout either.
The other options I can see all refer to how the server should behave when the client reconnects, but my problem comes before that :)
