win-sshproxy.tid created before thread id is available #433

Merged: 1 commit into containers:main from i432 on Nov 29, 2024

lstocchi

This commit fixes a potential race condition that prevented the tests from succeeding when running in a GitHub workflow.
The thread id was not always available at the time it was written to the file, so a thread id of 0 ended up in it. When the tests then read the thread id back in order to send the WM_QUIT message, they failed.

This patch adds a check on the thread id before writing it to the file. If the thread id is 0, it keeps calling winquit to retrieve it. If that has not succeeded after 10 seconds, it returns an error.
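The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `waitForThreadID`, `simulate`, the 5 ms polling interval, and the id `4242` are all made up for the example, and the `getID` parameter stands in for `winquit.GetCurrentMessageLoopThreadId` so that the sketch runs on any platform.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// waitForThreadID polls getID until it returns a non-zero value or ctx expires.
// getID is a stand-in for winquit.GetCurrentMessageLoopThreadId (assumption).
func waitForThreadID(ctx context.Context, getID func() uint32, interval time.Duration) (uint32, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if tid := getID(); tid != 0 {
			return tid, nil
		}
		select {
		case <-ctx.Done():
			return 0, fmt.Errorf("thread id still 0: %w", ctx.Err())
		case <-ticker.C:
			// poll again
		}
	}
}

// simulate starts a fake message loop that publishes its (made-up) thread id
// after delayMs milliseconds, then waits for it for at most timeoutMs.
func simulate(delayMs, timeoutMs int) (uint32, error) {
	var tid uint32 // accessed atomically; 0 means "not set yet"
	go func() {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		atomic.StoreUint32(&tid, 4242)
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	getID := func() uint32 { return atomic.LoadUint32(&tid) }
	return waitForThreadID(ctx, getID, 5*time.Millisecond)
}

func main() {
	tid, err := simulate(20, 1000)
	fmt.Println(tid, err)
}
```

In the PR itself the timeout is 10 seconds; the shorter values here only keep the example quick to run.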

Resolves #432.

@cfergeau cfergeau left a comment

Looking at the winquit code, calling NotifyOnQuit is supposed to guarantee that GetCurrentMessageLoopThreadId returns a non-zero value. However, the thread id is set when NotifyOnQuit calls messageLoop(), and this call is made in a goroutine, so NotifyOnQuit can return before the goroutine runs and initializes the thread id.
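The shape of this race can be reproduced in a few lines. The sketch below is not winquit's code: the `messageLoop` type, its methods, the 10 ms delay, and the id `4242` are hypothetical stand-ins. `startAsync` returns as soon as the goroutine is spawned, while `startAndWait` blocks on a channel until the goroutine has stored the id, which is one possible way to close the gap.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// messageLoop is a hypothetical stand-in for a message loop that records
// its thread id once its goroutine actually starts running.
type messageLoop struct {
	tid uint32 // accessed atomically; 0 means "not initialized yet"
}

func (m *messageLoop) threadID() uint32 { return atomic.LoadUint32(&m.tid) }

// startAsync returns right after spawning the goroutine, so a caller that
// reads threadID immediately afterwards can still observe 0.
func (m *messageLoop) startAsync() {
	go func() {
		time.Sleep(10 * time.Millisecond) // simulated scheduling delay
		atomic.StoreUint32(&m.tid, 4242)
	}()
}

// startAndWait returns only after the goroutine has stored the id.
func (m *messageLoop) startAndWait() {
	ready := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond) // same simulated delay
		atomic.StoreUint32(&m.tid, 4242)
		close(ready) // signal that the id is now valid
	}()
	<-ready
}

func main() {
	var racy, safe messageLoop
	racy.startAsync()
	fmt.Println("async start, id read immediately:", racy.threadID()) // usually still 0
	safe.startAndWait()
	fmt.Println("waited start, id read immediately:", safe.threadID())
}
```

The PR takes a different route and polls from the caller side instead, since the goroutine lives inside the winquit library.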

Some comments/suggestions, but I'm fine with the PR as is if you prefer to keep it this way.

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for {

Maybe you could reuse this helper:

func retry[T comparable](ctx context.Context, retryFunc func() (T, error), retryMsg string) (T, error) {
	var (
		returnVal T
		err       error
	)
	backoff := initialBackoff
loop:
	for i := 0; i < maxRetries; i++ {
		select {
		case <-ctx.Done():
			break loop
		default:
			// proceed
		}
		returnVal, err = retryFunc()
		if err == nil {
			return returnVal, nil
		}
		logrus.Debugf("%s (%s)", retryMsg, backoff)
		sleep(ctx, backoff)
		backoff = backOff(backoff)
	}
	return returnVal, fmt.Errorf("timeout: %w", err)
}
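To illustrate how a helper like this could be applied to the thread id lookup, here is a self-contained sketch. It uses a trimmed variant of the helper (no log message, doubling backoff) because `initialBackoff`, `maxRetries`, `sleep`, and `backOff` are not shown in the review; the values chosen for them below are assumptions. `getThreadID`, `errNoThreadID`, `demo`, and the id `4242` are likewise made up, and `lookup` stands in for `winquit.GetCurrentMessageLoopThreadId`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Assumed values; the real constants are not shown in the review.
const (
	initialBackoff = 5 * time.Millisecond
	maxRetries     = 5
)

// retry is a trimmed variant of the quoted helper: it calls retryFunc until
// it succeeds, the context is done, or maxRetries attempts have been made.
func retry[T any](ctx context.Context, retryFunc func() (T, error)) (T, error) {
	var (
		val T
		err error
	)
	backoff := initialBackoff
	for i := 0; i < maxRetries; i++ {
		if ctxErr := ctx.Err(); ctxErr != nil {
			err = ctxErr
			break
		}
		if val, err = retryFunc(); err == nil {
			return val, nil
		}
		select {
		case <-ctx.Done():
		case <-time.After(backoff):
		}
		backoff *= 2 // assumed backoff policy
	}
	return val, fmt.Errorf("timeout: %w", err)
}

var errNoThreadID = errors.New("thread id not available yet")

// getThreadID treats a zero id as a retryable error.
// lookup is a stand-in for winquit.GetCurrentMessageLoopThreadId (assumption).
func getThreadID(ctx context.Context, lookup func() uint32) (uint32, error) {
	return retry(ctx, func() (uint32, error) {
		if tid := lookup(); tid != 0 {
			return tid, nil
		}
		return 0, errNoThreadID
	})
}

// demo uses a fake lookup that returns 0 for the first zeroCalls calls
// and the made-up id 4242 afterwards.
func demo(zeroCalls int) (uint32, error) {
	calls := 0
	lookup := func() uint32 {
		calls++
		if calls <= zeroCalls {
			return 0
		}
		return 4242
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return getThreadID(ctx, lookup)
}

func main() {
	fmt.Println(demo(3))
}
```

Mapping "id is 0" to an error is what lets a generic retry helper drive the lookup, since the helper only retries when `retryFunc` returns a non-nil error.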

Updated. I moved the Retry func into a utils package so it can be reused.

@@ -173,11 +174,34 @@ func saveThreadId() (uint32, error) {
 		return 0, err
 	}
 	defer file.Close()
-	tid := winquit.GetCurrentMessageLoopThreadId()
+
+	tid, err := getThreadId()

This will add a slight delay during win-sshproxy startup. Do you expect this delay to be problematic in typical use?


IMO no, but I have limited knowledge of how it is used. Locally, and for the things I do, I didn't even notice.
It might be noticeable on low-resource machines, but it is better to slow startup down a bit and be sure everything works, no?


I expect it won't be noticeable. However, if it were noticeable, it would affect the startup time of podman machine start, which could be problematic.
Since the thread id is only needed when one wants to stop the podman machine VM, an alternative would be to wait for and write the thread id in a goroutine to avoid blocking.
However, podman would need to be ready for that and retry reading the file if it is missing, which is not the case at the moment.

With all that said, the current approach should be good enough for now.
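For reference, the goroutine-based alternative mentioned in this comment might look roughly like the sketch below. Nothing here comes from gvproxy or podman: `writeThreadIDAsync`, `readThreadID`, `parseThreadID`, `demo`, the polling interval, and the id `4242` are hypothetical, and `lookup` stands in for `winquit.GetCurrentMessageLoopThreadId`. The writer returns immediately and fills in the file later, and the reader retries while the file is missing or not yet valid.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

const pollInterval = 5 * time.Millisecond // assumed value

// writeThreadIDAsync returns immediately; a goroutine waits for a non-zero
// id (up to timeout) and then writes it to path.
func writeThreadIDAsync(path string, lookup func() uint32, timeout time.Duration) {
	go func() {
		deadline := time.Now().Add(timeout)
		for time.Now().Before(deadline) {
			if tid := lookup(); tid != 0 {
				data := []byte(strconv.FormatUint(uint64(tid), 10))
				_ = os.WriteFile(path, data, 0o644) // error handling omitted in this sketch
				return
			}
			time.Sleep(pollInterval)
		}
	}()
}

// parseThreadID rejects empty, malformed, and zero ids.
func parseThreadID(data []byte) (uint32, error) {
	n, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 32)
	if err != nil {
		return 0, err
	}
	if n == 0 {
		return 0, errors.New("thread id is 0")
	}
	return uint32(n), nil
}

// readThreadID is the consumer side: it retries while the file is missing
// or does not yet contain a valid id, and gives up after timeout.
func readThreadID(path string, timeout time.Duration) (uint32, error) {
	deadline := time.Now().Add(timeout)
	for {
		data, err := os.ReadFile(path)
		if err == nil {
			var tid uint32
			if tid, err = parseThreadID(data); err == nil {
				return tid, nil
			}
		}
		if time.Now().After(deadline) {
			return 0, fmt.Errorf("thread id file not ready: %w", err)
		}
		time.Sleep(pollInterval)
	}
}

// demo simulates a thread id that becomes available after readyAfterMs.
func demo(readyAfterMs int) (uint32, error) {
	dir, err := os.MkdirTemp("", "tid-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "win-sshproxy.tid")
	start := time.Now()
	lookup := func() uint32 {
		if time.Since(start) < time.Duration(readyAfterMs)*time.Millisecond {
			return 0
		}
		return 4242 // made-up id
	}
	writeThreadIDAsync(path, lookup, time.Second)
	return readThreadID(path, time.Second)
}

func main() {
	fmt.Println(demo(30))
}
```

The trade-off is the one raised in the comment: startup no longer blocks, but every reader has to tolerate a file that does not exist yet.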

@cfergeau

@n1hility fwiw, there is a small race in win-sshproxy/winquit.

This commit fixes a potential race condition that prevented the tests from
succeeding when running in a GitHub workflow.
The thread id was not always available at the time it was written to the
file, so a thread id of 0 ended up in it. When the tests then read the
thread id back in order to send the WM_QUIT message, they failed.

This patch adds a check on the thread id before writing it to the file.
If the thread id is 0, it keeps calling winquit to retrieve it. If that
has not succeeded after 10 seconds, it returns an error.

Signed-off-by: lstocchi <[email protected]>
@cfergeau

I've created containers/winquit#2 for the underlying winquit issue.

@cfergeau cfergeau left a comment

/lgtm
/approve

Thank you so much for making CI green!

openshift-ci bot commented Nov 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfergeau, lstocchi

openshift-merge-bot merged commit fe2d5d2 into containers:main on Nov 29, 2024
20 checks passed
lstocchi deleted the i432 branch on November 29, 2024 at 11:41

Successfully merging this pull request may close these issues.

Fix failing win-sshproxy tests