Fix race in libzfs_run_process_impl #16801

shodanshok · 2024-11-22T18:41:28Z

When replacing a disk, a child process is forked to run a script called zfs_prepare_disk (which can be useful for disk firmware update or health check). By default this script does nothing - it simply returns 0.

When testing on a virtual machine, the script returns so quickly that the parent misses it: by the time the parent checks, the child has already exited. Because waitpid returns -1, the parent incorrectly assumes that the child process hit an error or was killed. This, in turn, leaves the newly added disk in REMOVED or UNAVAIL status rather than completing the replace process.

Since the child should be inspected via the waitpid status value and its related macros, this patch removes the check on the waitpid return code.
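
For readers unfamiliar with the status macros mentioned above, here is a minimal standalone sketch of classifying a child's outcome from the waitpid status word. It is not the patch itself: `spawn_child_exit`, `classify_child` and the `child_result` enum are hypothetical names invented for illustration and are not part of libzfs. The sketch also assumes the default SIGCHLD disposition and that nothing else in the process waits for the same child.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of libzfs. */
enum child_result {
    CHILD_EXITED,   /* normal exit; *exit_code holds the exit status */
    CHILD_SIGNALED, /* terminated by a signal */
    CHILD_UNKNOWN   /* waitpid() failed, so no status is available */
};

/* Hypothetical helper: fork a child that exits immediately with `code`. */
static pid_t
spawn_child_exit(int code)
{
    pid_t pid = fork();

    assert(pid >= 0);   /* demo only: abort if fork() fails */
    if (pid == 0)
        _exit(code);
    return (pid);
}

/* Wait for `pid` and classify the outcome from the status word. */
static enum child_result
classify_child(pid_t pid, int *exit_code)
{
    int status = 0;
    pid_t ret;

    /* Retry when a signal interrupts the wait. */
    while ((ret = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
        ;
    if (ret != pid)
        return (CHILD_UNKNOWN); /* status was not filled in */
    if (WIFEXITED(status)) {
        *exit_code = WEXITSTATUS(status);
        return (CHILD_EXITED);
    }
    if (WIFSIGNALED(status))
        return (CHILD_SIGNALED);
    return (CHILD_UNKNOWN);
}
```

A caller would pass the pid returned by fork() and treat CHILD_EXITED with an exit code of 0 as success. How CHILD_UNKNOWN should be handled is the question this PR is about, so the sketch deliberately leaves that to the caller.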

NOTE: the issue mostly affects zed autoreplacement, while a plain zpool replace from the command line seems fine.

Signed-off-by: Gionatan Danti <[email protected]>

tonyhutter · 2024-11-26T19:13:01Z

I see the waitpid() man page example code (https://linux.die.net/man/2/waitpid) is a little different from the way we do things in libzfs_run_process_impl(). If we just adapt that code, does it fix the issue you're seeing?

diff --git a/lib/libzfs/libzfs_util.c b/lib/libzfs/libzfs_util.c
index 1f7e7b0e6..951feb1a0 100644
--- a/lib/libzfs/libzfs_util.c
+++ b/lib/libzfs/libzfs_util.c
@@ -963,12 +963,14 @@ libzfs_run_process_impl(const char *path, char *argv[], char *env[], int flags,
        } else if (pid > 0) {
                /* Parent process */
                int status;
-
-               while ((error = waitpid(pid, &status, 0)) == -1 &&
-                   errno == EINTR)
-                       ;
-               if (error < 0 || !WIFEXITED(status))
-                       return (-1);
+               do {
+                       error = waitpid(pid, &status, WUNTRACED | WCONTINUED);
+                       if (error == -1)
+                               return (-1);
+                       if (WIFEXITED(status) || WIFSIGNALED(status) ||
+                           WIFSTOPPED(status) || WIFCONTINUED(status))
+                               return (-1);
+               } while (!WIFEXITED(status) && !WIFSIGNALED(status));
 
                if (lines != NULL) {
                        close(link[1]);

shodanshok · 2024-11-27T07:34:51Z

@tonyhutter I don't think it would fix the issue at hand.

error = waitpid(pid, &status, WUNTRACED | WCONTINUED);
if (error == -1)
        return (-1);

This code would return an error if the child exited before the parent had a chance to check it, the same as the current code. While this kind of check is correct in many cases (i.e., when a child exiting so quickly is not expected), it is not correct for this specific operation (replacing a disk with an empty prepare script).

This is how I understand it, at least.
Thanks.
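
One well-defined way for waitpid() to return -1 for a child that has already exited is for that child to have been reaped by an earlier wait call, in which case errno is ECHILD. The sketch below reproduces that condition in a single process. Whether something else reaps the zfs_prepare_disk child first in the zed case is an assumption and is not confirmed anywhere in this thread. `errno_of_second_wait` is a hypothetical name, and the default SIGCHLD disposition is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once, reap it, then wait for it again.
 * Returns the errno of the second waitpid(), or 0 if that call
 * unexpectedly succeeds.
 */
static int
errno_of_second_wait(void)
{
    int status;
    pid_t pid = fork();
    pid_t first;

    assert(pid >= 0);   /* demo only: abort if fork() fails */
    if (pid == 0)
        _exit(0);

    /*
     * First wait: with the default SIGCHLD disposition the exited
     * child stays around as a zombie until it is collected here.
     */
    first = waitpid(pid, &status, 0);
    assert(first == pid && WIFEXITED(status));

    /* Second wait: the child is gone, so there is nothing to report. */
    errno = 0;
    if (waitpid(pid, &status, 0) == -1)
        return (errno);
    return (0);
}
```

In this situation the status word from the failed call carries no information, so any decision has to rest on errno rather than on the WIF* macros.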

behlendorf requested a review from tonyhutter November 23, 2024 22:33

behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label Nov 23, 2024