fix flake test of TestUpdateClusterEventHandler #5856

XiShanYongYe-Chang
Member

@XiShanYongYe-Chang XiShanYongYe-Chang commented Nov 21, 2024

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

In the test, fake.NewFakeClient() and fakekarmadaclient.NewSimpleClientset() are two different fake clients. In the other test cases, which pass, controlPlaneClient is not actually used because it is replaced by fakedynamic.NewSimpleDynamicClient. In the failing test case, however, it is used to create the cluster object, which is probably the cause of the occasional error.

Root cause: we added the eventHandlers twice, which put the test sequence out of order and left the test waiting for events.
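
To illustrate how a double registration can change what a test observes, here is a minimal, self-contained sketch. It is not the Karmada or client-go code: `toyInformer`, `handler`, and `enqueuedAfter` are hypothetical names made up for this example, and the sketch only assumes that a handler list is appended to without de-duplication. The actual failure mode in the test may differ in detail.

```go
package main

import "fmt"

// handler is a hypothetical stand-in for an informer event handler.
type handler func(obj string)

// toyInformer is a hypothetical, heavily simplified informer: it keeps a
// list of handlers and calls every one of them for each event.
type toyInformer struct {
	handlers []handler
}

// AddEventHandler appends unconditionally, so registering the same handler
// twice means it runs twice per event.
func (i *toyInformer) AddEventHandler(h handler) {
	i.handlers = append(i.handlers, h)
}

// Notify delivers one event to every registered handler.
func (i *toyInformer) Notify(obj string) {
	for _, h := range i.handlers {
		h(obj)
	}
}

// enqueuedAfter registers the same enqueue handler the given number of
// times, fires a single event, and returns what ended up in the queue.
func enqueuedAfter(registrations int) []string {
	var queue []string
	inf := &toyInformer{}
	enqueue := func(obj string) { queue = append(queue, obj) }
	for n := 0; n < registrations; n++ {
		inf.AddEventHandler(enqueue)
	}
	inf.Notify("member-cluster")
	return queue
}

func main() {
	fmt.Println(len(enqueuedAfter(1))) // single registration
	fmt.Println(len(enqueuedAfter(2))) // double registration
}
```

With a single registration one event yields one queue item; with two registrations the same event yields two, so a test that expects a fixed sequence of items can fall out of step.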

Reason for adding the second commit: #5856 (comment)

Which issue(s) this PR fixes:
Fixes #5855

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@karmada-bot karmada-bot added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Nov 21, 2024
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 21, 2024
@codecov-commenter

codecov-commenter commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.37%. Comparing base (079d0ab) to head (7da6423).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5856      +/-   ##
==========================================
+ Coverage   46.31%   46.37%   +0.06%     
==========================================
  Files         661      661              
  Lines       54364    54362       -2     
==========================================
+ Hits        25177    25209      +32     
+ Misses      27562    27533      -29     
+ Partials     1625     1620       -5     
Flag Coverage Δ
unittests 46.37% <100.00%> (+0.06%) ⬆️


@XiShanYongYe-Chang
Member Author

/cc @mohamedawnallah @RainbowMango

@karmada-bot
Collaborator

@XiShanYongYe-Chang: GitHub didn't allow me to request PR reviews from the following users: mohamedawnallah.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mohamedawnallah @RainbowMango

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mohamedawnallah
Contributor

mohamedawnallah commented Nov 21, 2024

Thanks so much, @XiShanYongYe-Chang, for noticing that the controlPlaneClient builder wasn’t actually being used — it looks like I missed that! 🙏 I believe this is part of the reason why there are flaky tests. The other issue seems to be that we need to increase the timeout for the background threads to finish execution.

I've submitted a follow-up PR in your repo to address the timeout issue. To validate the changes, I ran all test cases in the controller 100 times, clearing the cache between runs, and all tests passed.

Note:
This PR XiShanYongYe-Chang#6 needs to be merged first for the additional commit to be integrated here.

@XiShanYongYe-Chang
Member Author

XiShanYongYe-Chang commented Nov 22, 2024

The other issue seems to be that we need to increase the timeout for the background threads to finish execution.

Hi @mohamedawnallah, we may not need to wait: the queue's Get call blocks until it fetches an element:

// Wait until there is a new item in the working queue
key, shutdown := c.queue.Get()

I'm a little confused here: why do we need to wait? Let's look at it again.

My guess is that this wait time is the main reason, but how can we make sure it is long enough?
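
For reference, the blocking behaviour described above can be sketched with the standard library alone. This is not the client-go workqueue implementation: `blockingQueue`, `newBlockingQueue`, and `getAfterDelayedAdd` are hypothetical names for this example, and the only assumption is that `Get` parks the caller until an item is added or the queue is shut down. Under that assumption a consumer needs no extra sleep before calling `Get`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// blockingQueue is a hypothetical, minimal queue whose Get blocks until an
// item is available or the queue has been shut down.
type blockingQueue struct {
	mu       sync.Mutex
	cond     *sync.Cond
	items    []string
	shutdown bool
}

func newBlockingQueue() *blockingQueue {
	q := &blockingQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add appends an item and wakes one waiting consumer.
func (q *blockingQueue) Add(item string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, item)
	q.cond.Signal()
}

// ShutDown marks the queue as closed and wakes every waiting consumer.
func (q *blockingQueue) ShutDown() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.shutdown = true
	q.cond.Broadcast()
}

// Get blocks until there is an item to return, or reports shutdown=true
// once the queue is shut down and empty.
func (q *blockingQueue) Get() (item string, shutdown bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.shutdown {
		q.cond.Wait()
	}
	if len(q.items) == 0 {
		return "", true
	}
	item = q.items[0]
	q.items = q.items[1:]
	return item, false
}

// getAfterDelayedAdd starts a consumer first, adds an item only after the
// given delay, and returns what the consumer eventually received.
func getAfterDelayedAdd(delay time.Duration) string {
	q := newBlockingQueue()
	got := make(chan string, 1)
	go func() {
		key, _ := q.Get() // blocks here until Add is called
		got <- key
	}()
	time.Sleep(delay)
	q.Add("cluster/member1")
	return <-got
}

func main() {
	fmt.Println(getAfterDelayedAdd(50 * time.Millisecond))
}
```

The consumer receives the item however long the producer takes, which is why a fixed wait on the consumer side should not be needed for correctness.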

@XiShanYongYe-Chang
Member Author

/hold

@karmada-bot karmada-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
@RainbowMango
Member

/hold

Wait for XiShanYongYe-Chang#6 ?

@XiShanYongYe-Chang
Member Author

I think I found the reason: we added the eventHandlers twice, which put the test sequence out of order and left the test waiting for events.

@XiShanYongYe-Chang
Member Author

/hold cancel
cc @mohamedawnallah

@karmada-bot karmada-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
@mohamedawnallah
Contributor

mohamedawnallah commented Nov 22, 2024

I think I found the reason: we added the eventHandlers twice, which put the test sequence out of order and left the test waiting for events.

@XiShanYongYe-Chang Thank you! 🙏 That’s definitely the reason behind the timeout issue. In the referenced PR XiShanYongYe-Chang#6, I updated the following:

  1. Test Case Fix:
    Included the missing mock for clusterDynamicClientBuilder in the TestUpdateResourceRegistryEventHandler test case.

  2. Group/Version Updates:
    Added the correct group/version mappings:

    • apps/v1 for Deployments
    • v1 for Pods

These changes address error logs like:

... Failed to get gvr: no matches for kind "Deployment" in version "apps/v1"
... controller.go:563] Failed to get gvr: no matches for kind "Pod" in version "v1"
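
As a rough illustration of why the missing group/version mappings produce these logs, here is a standard-library sketch of a kind-to-resource lookup. It is not the apimachinery RESTMapper: `toyMapper`, `gvk`, `gvr`, and `newMapperWithWorkloads` are hypothetical names for this example, and the error text is only formatted to resemble the log lines above.

```go
package main

import "fmt"

// gvk and gvr are hypothetical, simplified group/version/kind and
// group/version/resource identifiers.
type gvk struct{ Group, Version, Kind string }
type gvr struct{ Group, Version, Resource string }

// toyMapper is a hypothetical lookup table from kinds to resources.
type toyMapper struct {
	known map[gvk]gvr
}

// Add registers a kind together with its plural resource name.
func (m *toyMapper) Add(k gvk, resource string) {
	if m.known == nil {
		m.known = map[gvk]gvr{}
	}
	m.known[k] = gvr{Group: k.Group, Version: k.Version, Resource: resource}
}

// apiVersion renders "group/version", or just "version" for the core group.
func apiVersion(k gvk) string {
	if k.Group == "" {
		return k.Version
	}
	return k.Group + "/" + k.Version
}

// ResourceFor fails for any kind that was never registered.
func (m *toyMapper) ResourceFor(k gvk) (gvr, error) {
	r, ok := m.known[k]
	if !ok {
		return gvr{}, fmt.Errorf("no matches for kind %q in version %q", k.Kind, apiVersion(k))
	}
	return r, nil
}

// newMapperWithWorkloads registers the two mappings mentioned above:
// apps/v1 for Deployments and v1 for Pods.
func newMapperWithWorkloads() *toyMapper {
	m := &toyMapper{}
	m.Add(gvk{Group: "apps", Version: "v1", Kind: "Deployment"}, "deployments")
	m.Add(gvk{Group: "", Version: "v1", Kind: "Pod"}, "pods")
	return m
}

func main() {
	deployment := gvk{Group: "apps", Version: "v1", Kind: "Deployment"}

	// Without any registration the lookup fails.
	_, err := (&toyMapper{}).ResourceFor(deployment)
	fmt.Println("Failed to get gvr:", err)

	// With the mappings registered the lookup succeeds.
	r, _ := newMapperWithWorkloads().ResourceFor(deployment)
	fmt.Println(r.Resource)
}
```

Registering the mappings in the test setup is what makes the lookup succeed, so the "Failed to get gvr" lines no longer appear.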

@XiShanYongYe-Chang
Member Author

Thanks @mohamedawnallah, let me add your commit.

@karmada-bot karmada-bot added the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Nov 22, 2024
@karmada-bot karmada-bot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Nov 22, 2024
Signed-off-by: Mohamed Awnallah <[email protected]>
Signed-off-by: changzhen <[email protected]>
@XiShanYongYe-Chang
Member Author

/cc @RainbowMango

@RainbowMango
Member

/retest
The failing test is unrelated:

STEP: Deleting ResourceInterpreterCustomization(interpreter-customizationc7tph) @ 11/22/24 07:09:20.788
  << Timeline

  [FAILED] Unexpected error:
      <*fmt.wrapError | 0xc000694780>: 
      client rate limiter Wait returned an error: context deadline exceeded
      {
          msg: "client rate limiter Wait returned an error: context deadline exceeded",
          err: <context.deadlineExceededError>{},
      }
  occurred
  In [It] at: /home/runner/work/karmada/karmada/test/e2e/resourceinterpreter_test.go:538 @ 11/22/24 07:09:20.524

  Full Stack Trace
    github.com/karmada-io/karmada/test/e2e.init.func48.10.3.6()
    	/home/runner/work/karmada/karmada/test/e2e/resourceinterpreter_test.go:538 +0xb6
    github.com/karmada-io/karmada/test/e2e.init.func48.10.3()
    	/home/runner/work/karmada/karmada/test/e2e/resourceinterpreter_test.go:516 +0xc56

  There were additional failures detected.  To view them in detail run ginkgo -vv

Member

@RainbowMango RainbowMango left a comment

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 22, 2024
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2024
@karmada-bot karmada-bot merged commit 8691287 into karmada-io:master Nov 22, 2024
18 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

TestUpdateClusterEventHandler run panic: timeout
5 participants