Fix race in libzfs_run_process_impl #16801

Open: wants to merge 1 commit into base: master

@shodanshok shodanshok commented Nov 22, 2024

When replacing a disk, a child process is forked to run a script called zfs_prepare_disk (which can be useful for a disk firmware update or health check). By default this script does nothing: it simply returns 0.

When testing on a virtual machine, the script returns so fast that the parent misses it: by the time the parent checks, the child has already exited. Because waitpid returns -1, the parent incorrectly assumes that the child process had an error or was killed. This, in turn, leaves the newly added disk in REMOVED or UNAVAIL status rather than completing the replace process.

Since the child should be inspected via the waitpid status flag and the related macros, this patch removes the check on the waitpid return code.

NOTE: the issue mostly affects zed autoreplacement, while a plain zpool replace from the command line seems fine.
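
For reference, a small standalone sketch of the waitpid() behavior discussed here. It is not libzfs code, and the helper names are hypothetical. With the default SIGCHLD disposition, a child that has already exited stays reapable, so waitpid() returns its pid even if the parent arrives late. One standard case where waitpid() returns -1 (with errno set to ECHILD) is when SIGCHLD is ignored in the calling process; another is when something else has already reaped the child. Whether either case applies to zed is an assumption and is not established in this thread.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libzfs): fork a child that exits
 * immediately with exit_code, then reap it. Returns the child's exit
 * code, or -1 if waitpid() failed or the child did not exit normally.
 */
int
run_trivial_child(int exit_code)
{
	int status;
	pid_t pid, ret;

	pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0)
		_exit(exit_code);

	/* Retry only when the call was interrupted by a signal. */
	while ((ret = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
		;
	if (ret == -1)
		return (-1);	/* e.g. ECHILD: no child left to reap */
	if (!WIFEXITED(status))
		return (-1);	/* killed by a signal */
	return (WEXITSTATUS(status));
}

/*
 * Same as above, but with SIGCHLD ignored while the child runs.
 * The errno left by run_trivial_child() is stored in *saved_errno.
 */
int
run_trivial_child_sigchld_ignored(int exit_code, int *saved_errno)
{
	void (*old)(int) = signal(SIGCHLD, SIG_IGN);
	int rc = run_trivial_child(exit_code);

	*saved_errno = errno;
	(void) signal(SIGCHLD, old);
	return (rc);
}
```

With the default disposition, run_trivial_child(0) is expected to return 0 no matter how quickly the child exits. With SIGCHLD ignored, the second helper is expected to return -1 and report ECHILD, since no exit status is kept for the parent to collect.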

Signed-off-by: Gionatan Danti <[email protected]>
@behlendorf added the "Status: Code Review Needed" label Nov 23, 2024
@tonyhutter
Contributor

I see the waitpid() man page example code (https://linux.die.net/man/2/waitpid) is a little different from the way we do things in libzfs_run_process_impl(). If we just adapt that code, does it fix the issue you're seeing?

diff --git a/lib/libzfs/libzfs_util.c b/lib/libzfs/libzfs_util.c
index 1f7e7b0e6..951feb1a0 100644
--- a/lib/libzfs/libzfs_util.c
+++ b/lib/libzfs/libzfs_util.c
@@ -963,12 +963,14 @@ libzfs_run_process_impl(const char *path, char *argv[], char *env[], int flags,
        } else if (pid > 0) {
                /* Parent process */
                int status;
-
-               while ((error = waitpid(pid, &status, 0)) == -1 &&
-                   errno == EINTR)
-                       ;
-               if (error < 0 || !WIFEXITED(status))
-                       return (-1);
+               do {
+                       error = waitpid(pid, &status, WUNTRACED | WCONTINUED);
+                       if (error == -1)
+                               return (-1);
+                       if (WIFEXITED(status) || WIFSIGNALED(status) ||
+                           WIFSTOPPED(status) || WIFCONTINUED(status))
+                               return (-1);
+               } while (!WIFEXITED(status) && !WIFSIGNALED(status));
 
                if (lines != NULL) {
                        close(link[1]);

@shodanshok
Contributor Author

@tonyhutter I don't think it would improve the issue at hand.

error = waitpid(pid, &status, WUNTRACED | WCONTINUED);
if (error == -1)
        return (-1);

This code would return an error if the child exited before the parent had a chance to check it, the same as the current code. While this kind of check is correct in many cases (i.e., when a child exiting this fast is not expected), for this specific operation (replacing a disk with an empty prepare script) it is not.

This is how I understand it, at least.
Thanks.
