placeholder issue for quay.io flakes #16973
I've got my bug report written but not yet sent to quay. Before I send... I reviewed flake logs, and see this one:
...which is close enough to the …
Could this be something where we need better retries? Although this looks like name resolution is failing.
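(For illustration only: a minimal sketch of the kind of retry wrapper that could paper over transient registry failures. The image name, attempt count, and backoff values here are hypothetical, not what our CI actually runs.)

```
#!/bin/bash
# Hypothetical retry loop around a flaky registry pull.
IMAGE="quay.io/libpod/testimage:20221018"   # placeholder image
for attempt in 1 2 3; do
    podman pull "$IMAGE" && exit 0
    echo "pull attempt $attempt failed; backing off" >&2
    sleep $((attempt * 5))
done
echo "giving up after 3 attempts" >&2
exit 1
```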
I noticed this morning, and reported on my (stalled) quay ticket, that …
Last five days. This is no longer our top flake (the "happened during" one, #17193, is now in the top position), but it's still a problem. I'm wondering if they're actually the same issue, presenting in different ways.
Here's the response from quay support:
Obviously, the firewall is fine, because this happens only with …
Found instances in Fedora CI. January 5:
Also this one and this one, all of them January 5, all of them Fedora gating tests. All errors look similar, so I won't paste the full headers; if they're important to know, someone please grab them quickly, because Fedora CI logs don't last long. That tentatively rules out a problem local to our CI setup.
I do not see this flake in RHEL gating tests. I've scoured logs of the last 50 runs, both automatically via script and manually (to double-check my script), and see only one place where … Given the frequency with which this issue triggers on Fedora, I find it very curious not to find it in RHEL. So curious that I spun up a VM to look at something:

```
# grep ^hosts /etc/nsswitch.conf
hosts: files dns myhostname
```
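For comparison, here is a sketch (not output from our CI VMs) of how one might confirm whether a host's lookups go through systemd-resolved; on Fedora with resolved enabled, the `hosts` line typically lists `resolve` ahead of `dns`:

```
# Check which backends the hosts database uses; with systemd-resolved this
# usually looks like: hosts: files myhostname resolve [!UNAVAIL=return] dns
grep '^hosts' /etc/nsswitch.conf

# Prints "active" if resolved is running and handling lookups
systemctl is-active systemd-resolved

# Often a symlink into /run/systemd/resolve/ when resolved owns resolv.conf
readlink -f /etc/resolv.conf
```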
The problem persists (see below)... but only in old branches and in … PR traffic has been very low the last few days, so this is not a significant result... but it's an interesting one.
Followup to my postscrum this morning: the non-test flake above happened in … As a reminder, the point here is not to disable …
No new …
A friendly reminder that this issue had no activity for 30 days. |
@edsantiago any update?
Disabling systemd-resolve (#17505) REALLY helped. We still see the flakes in CI steps that run before my disable, so I have containers/automation_images#269 in progress to update CI VMs. That won't help old branches, nor will it help the other quay flakes. But still, this is the flake list from the last 25 days:
We were getting multiple flakes per day. This is (not counting the v4.x ones) five in almost a month.
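(For anyone wanting to replicate the workaround: the actual change landed in #17505 and containers/automation_images#269, but on a generic systemd host the disable step looks roughly like the sketch below. The nameserver is illustrative; substitute whatever the host should actually use.)

```
# Stop resolved now and keep it from starting at boot
sudo systemctl disable --now systemd-resolved

# /etc/resolv.conf is usually a symlink to resolved's stub; replace it
# with a plain file pointing at a regular resolver
sudo rm -f /etc/resolv.conf
echo 'nameserver 1.1.1.1' | sudo tee /etc/resolv.conf
```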
@edsantiago we've been seeing some quay flakes in FCOS (which also uses …). One thing that is interesting to me is that we only see the flakes on our aarch64 machine, which happens to sit in AWS (the other machines don't). By chance, does your CI run in AWS?
@edsantiago Have you had a chance to reach out to systemd maintainers about systemd-resolved being the source of flakes in your CI?
@jlebon I haven't tried, and am unlikely to.
Just for the record, we don't install resolved by default on RHEL. We don't even officially support it; it is marked as a technical preview.
FWIW, I just did a google search for this problem and came up with an interesting hit. Not sure what to make of that, other than it's maybe a problem somebody has experienced in OpenShift land.
I'm attempting to remove @edsantiago's workaround in #19541, where we can run CI a few times to see if it's still flaking.
Closing. The …
This is a placeholder only. I'm seeing a lot of quay.io flakes. We (containers team) can't do anything about those, but the way my flake logger works, the individual failures are in skillions of different tests, which makes it hard for my brain to grok. Creating a single issue, and assigning them all here, makes it much easier to track (and hence to ignore).
The most common symptom is: (EDIT: removed, this turned out to be a different bug)

Symptoms are:
and
and