
libdevmapper should be thread-safe #67

Closed
DemiMarie opened this issue Feb 6, 2022 · 11 comments

Comments

@DemiMarie

libdevmapper should be safe to use in multiple threads concurrently.

@zkabelac
Contributor

zkabelac commented Feb 6, 2022

Unfortunately this is very complicated, and there is currently no manpower to handle this task. We would prefer that users stay at a higher level, e.g. using the 'lvm' or 'multipath' tools. Stepping down to the libdm level requires deep knowledge of the internal management of DM devices.

A portion of libdm is thread-safe and is used as such by e.g. dmeventd, but without knowing the consequences of threaded usage, it would be quite hard to expose and support this as a public API.

Reason #1 is the handling of 'suspends'. This may one day work via some sort of user-space DM management daemon, but threaded support cannot be expected from an ioctl()-based library like today's libdm.

@zkabelac zkabelac closed this as completed Feb 6, 2022
@DemiMarie
Author

> Unfortunately this is very complicated, and there is currently no manpower to handle this task. We would prefer that users stay at a higher level, e.g. using the 'lvm' or 'multipath' tools. Stepping down to the libdm level requires deep knowledge of the internal management of DM devices.

The problem with this is that it causes problems for those (such as Stratis) writing their own volume managers. Stratis had to resort to wrapping calls to libcryptsetup (which uses libdevmapper) in a global lock. If this is sufficient, it should be fairly easy to implement in libdevmapper; I might even be able to make a PR for it. On the other hand, if it is not sufficient, then Stratis is broken.

> A portion of libdm is thread-safe and is used as such by e.g. dmeventd, but without knowing the consequences of threaded usage, it would be quite hard to expose and support this as a public API.

> Reason #1 is the handling of 'suspends'. This may one day work via some sort of user-space DM management daemon, but threaded support cannot be expected from an ioctl()-based library like today's libdm.

Can you explain? I know that libdevmapper needs to call mlockall() or similar to avoid deadlocks. But I do not see how that prevents it from being thread-safe. What if there was an API that said, “I have already called mlockall() and will never unlock my RAM; do not worry about memory locking.”?

@zkabelac
Contributor

zkabelac commented Feb 6, 2022

> Unfortunately this is very complicated, and there is currently no manpower to handle this task. We would prefer that users stay at a higher level, e.g. using the 'lvm' or 'multipath' tools. Stepping down to the libdm level requires deep knowledge of the internal management of DM devices.

> The problem with this is that it causes problems for those (such as Stratis) writing their own volume managers. Stratis had to resort to wrapping calls to libcryptsetup (which uses libdevmapper) in a global lock. If this is sufficient, it should be fairly easy to implement in libdevmapper; I might even be able to make a PR for it. On the other hand, if it is not sufficient, then Stratis is broken.

There is more than one way to skin the cat. AFAIK Stratis is using Rust, so most likely they have their own API.
Also, Stratis targets quite a limited subset of lvm2 functionality, so for that usage threads might be less complicated to handle.
For lvm2, we prefer users to go through the lvm command-line interface.

> A portion of libdm is thread-safe and is used as such by e.g. dmeventd, but without knowing the consequences of threaded usage, it would be quite hard to expose and support this as a public API.
>
> Reason #1 is the handling of 'suspends'. This may one day work via some sort of user-space DM management daemon, but threaded support cannot be expected from an ioctl()-based library like today's libdm.

> Can you explain? I know that libdevmapper needs to call mlockall() or similar to avoid deadlocks. But I do not see how that prevents it from being thread-safe. What if there was an API that said, “I have already called mlockall() and will never unlock my RAM; do not worry about memory locking.”?

mlockall() is unfortunately not a good way to do it: it can lock e.g. whole locale files, which can run to hundreds of MiB.
There are lots of other problems where even global locks do not help, e.g. how to handle dependencies between devices and threads so that thread A is not manipulating devices that are also used by thread B.
All 'simple' solutions tend to work only in constrained environments.
So while we would like to have a more thread-friendly version, it's currently well beyond our capacity...

@DemiMarie
Author

> Unfortunately this is very complicated, and there is currently no manpower to handle this task. We would prefer that users stay at a higher level, e.g. using the 'lvm' or 'multipath' tools. Stepping down to the libdm level requires deep knowledge of the internal management of DM devices.

> The problem with this is that it causes problems for those (such as Stratis) writing their own volume managers. Stratis had to resort to wrapping calls to libcryptsetup (which uses libdevmapper) in a global lock. If this is sufficient, it should be fairly easy to implement in libdevmapper; I might even be able to make a PR for it. On the other hand, if it is not sufficient, then Stratis is broken.

> There is more than one way to skin the cat. AFAIK Stratis is using Rust, so most likely they have their own API. Also, Stratis targets quite a limited subset of lvm2 functionality, so for that usage threads might be less complicated to handle. For lvm2, we prefer users to go through the lvm command-line interface.

Makes sense.

> A portion of libdm is thread-safe and is used as such by e.g. dmeventd, but without knowing the consequences of threaded usage, it would be quite hard to expose and support this as a public API.
>
> Reason #1 is the handling of 'suspends'. This may one day work via some sort of user-space DM management daemon, but threaded support cannot be expected from an ioctl()-based library like today's libdm.

> Can you explain? I know that libdevmapper needs to call mlockall() or similar to avoid deadlocks. But I do not see how that prevents it from being thread-safe. What if there was an API that said, “I have already called mlockall() and will never unlock my RAM; do not worry about memory locking.”?

> mlockall() is unfortunately not a good way to do it: it can lock e.g. whole locale files, which can run to hundreds of MiB.

Been there, done that. Qubes OS calls cryptsetup with LC_ALL=C to avoid this.

> There are lots of other problems where even global locks do not help, e.g. how to handle dependencies between devices and threads so that thread A is not manipulating devices that are also used by thread B. All 'simple' solutions tend to work only in constrained environments.

Okay, that makes sense.

The reason I asked for libdevmapper to be thread-safe (even if not thread-efficient) is that its lack of thread-safety makes using it from many modern programming environments extremely clumsy, unsafe, or both. For instance, all programs using Go, glib, or Java must assume that multiple threads are running, and the Rust ecosystem expects that an API not marked as unsafe cannot cause undefined behavior, which includes data races.

> So while we would like to have a more thread-friendly version, it's currently well beyond our capacity...

And that is 100% valid.

@zkabelac
Contributor

zkabelac commented Feb 6, 2022

> Can you explain? I know that libdevmapper needs to call mlockall() or similar to avoid deadlocks. But I do not see how that prevents it from being thread-safe. What if there was an API that said, “I have already called mlockall() and will never unlock my RAM; do not worry about memory locking.”?

> mlockall() is unfortunately not a good way to do it: it can lock e.g. whole locale files, which can run to hundreds of MiB.

> Been there, done that. Qubes OS calls cryptsetup with LC_ALL=C to avoid this.

And we've been used from e.g. Python apps, so there were even gigabytes of RAM locked into user space. To solve it, we would really need some form of 'daemon' to handle this DM table manipulation job.
Maybe one day there will be some progress in this area...

@DemiMarie
Author

> And we've been used from e.g. Python apps, so there were even gigabytes of RAM locked into user space. To solve it, we would really need some form of 'daemon' to handle this DM table manipulation job.

Or a kernel change so that a sequence of DM ioctls can be done atomically.

> Maybe one day there will be some progress in this area...

I certainly hope so.

@zkabelac
Contributor

zkabelac commented Feb 6, 2022

> And we've been used from e.g. Python apps, so there were even gigabytes of RAM locked into user space. To solve it, we would really need some form of 'daemon' to handle this DM table manipulation job.

> Or a kernel change so that a sequence of DM ioctls can be done atomically.

This would likely not help much.
We are suspending 'generic' device trees, so there are possibly A LOT of error paths to solve.

In general, the 'DM world' prefers to keep the kernel 'simple' and leave all the complexity in user space.
Other projects/developers tend to put most of the complicated code into the kernel.

@DemiMarie
Author

@zkabelac It’s worth noting that there are some cases where suspends are simply not an issue. If I write a volume manager for Qubes OS, it simply will not support managing the volume on which the root filesystem is located. So there is no need for memory locking.

> In general, the 'DM world' prefers to keep the kernel 'simple' and leave all the complexity in user space. Other projects/developers tend to put most of the complicated code into the kernel.

The problem is that if your process dies (say due to the OOM killer) you have a completely hosed system and no way to recover except by reboot. Part of that can be solved by having Linux automatically resume a device if the process that suspended it dies, but still.

@zkabelac
Contributor

> @zkabelac It’s worth noting that there are some cases where suspends are simply not an issue. If I write a volume manager for Qubes OS, it simply will not support managing the volume on which the root filesystem is located. So there is no need for memory locking.

> In general, the 'DM world' prefers to keep the kernel 'simple' and leave all the complexity in user space. Other projects/developers tend to put most of the complicated code into the kernel.

> The problem is that if your process dies (say due to the OOM killer) you have a completely hosed system and no way to recover except by reboot. Part of that can be solved by having Linux automatically resume a device if the process that suspended it dies, but still.

Within lvm2 we handle many system-failure situations. Against OOM we protect ourselves by locking our code into RAM before we execute the critical section of lvm2 code, and we preallocate all needed RAM ahead of time.

However, I'm very far from saying we have solved every possible case, so surely there are highly unusual scenarios that may end in a system deadlock.

But from many years of support for the lvm2 product, we are not aware of users facing such situations (at least they do not report them to us).

E.g. we know that some resume-failure error paths are simply too hard to make correct, and a reboot might actually be the best way to deal with the problem...

@DemiMarie
Author

> E.g. we know that some resume-failure error paths are simply too hard to make correct, and a reboot might actually be the best way to deal with the problem...

The obvious solution I can come up with is for the kernel to keep track of which process suspended a device, and to automatically resume the device when that process dies. That would require some sort of locking, though.

@zkabelac
Contributor

> E.g. we know that some resume-failure error paths are simply too hard to make correct, and a reboot might actually be the best way to deal with the problem...

> The obvious solution I can come up with is for the kernel to keep track of which process suspended a device, and to automatically resume the device when that process dies. That would require some sort of locking, though.

That will not work, since we are suspending/resuming 'trees' of devices. Each path has a different error-recovery sequence and further requires adapting the lvm metadata, and for that you need further locking.
