systests: cp: add wait_for_ready #20912

edsantiago · 2023-12-05T17:50:42Z

Some of the tests were doing "podman run -d" without wait_for_ready.
This may be the cause of some of the CI flakes. Maybe even all?
It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug.

Thanks to Chris for making that astute observation.

Fixes: #20282 (I hope)

Signed-off-by: Ed Santiago [email protected]
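
The race described above can be sketched without podman at all. `podman run -d` returns once the container has started, not once its command has finished its setup work, so a test that immediately looks for a file the container creates can lose the race. The following is a minimal sketch under stated assumptions: the background subshell stands in for a detached container, a plain log file stands in for `podman logs`, and `wait_for_marker` is a hypothetical helper written for this illustration, not the test suite's own `wait_for_ready`.

```shell
#!/usr/bin/env bash
# Illustration only: a background job plays the role of a detached container.
workdir=$(mktemp -d)
log="$workdir/log"
: > "$log"

(
    sleep 0.3                        # simulated slow startup (e.g. slow storage)
    touch "$workdir/created-file"    # the side effect the test depends on
    echo READY >> "$log"             # announce readiness only after the work is done
) &
bgpid=$!

# Hypothetical helper: poll a log file until a marker line appears, or time out.
wait_for_marker() {
    local file="$1"
    local marker="$2"
    local tries=$(( ${3:-5} * 10 ))  # timeout in seconds, polled every 0.1s
    while [ "$tries" -gt 0 ]; do
        if grep -qx -- "$marker" "$file"; then
            return 0
        fi
        sleep 0.1
        tries=$(( tries - 1 ))
    done
    echo "timed out waiting for '$marker'" >&2
    return 1
}

# Without this wait, the check below would race against the background job.
wait_for_marker "$log" READY 5
wait_status=$?

if [ -e "$workdir/created-file" ]; then
    file_present=yes
else
    file_present=no
fi

wait "$bgpid"
rm -rf "$workdir"
echo "wait status: $wait_status, file present: $file_present"
```

The key design point is ordering inside the "container": the marker is printed after the `touch`, so seeing the marker implies the file exists. Polling has a cost on every call, which is why it is worth applying only where the test depends on such a side effect.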

None

Luap99

sounds like a logical explanation to me but I think you have overdone it a bit.

test/system/065-cp.bats

edsantiago · 2023-12-05T18:25:00Z

but I think you have overdone it a bit.

I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

Luap99 · 2023-12-05T18:33:47Z

but I think you have overdone it a bit.

I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

Harmful no, but it makes the diff here bigger than it needs to be and makes the tests slower as they now always call podman logs even when it is not needed.

edsantiago · 2023-12-05T18:46:47Z

OK. I'll repush once CI finishes.

edsantiago · 2023-12-05T19:02:46Z

Done. Now wait_for_ready is added only to those containers that touch, echo, or mkdir.

Luap99

LGTM

openshift-ci · 2023-12-06T10:53:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99,edsantiago]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan · 2023-12-06T15:35:11Z

/lgtm

cevich · 2023-12-06T15:56:53Z

Thanks for fixing this Ed, hopefully it was the cause.

It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug

If it helps, and this is a total guess: my feeling is that the failure unpredictability is coming from the storage subsystem in the cloud context. All the CI VMs are running with (presumably multi-path) fiber-channel/network based storage. That in and of itself adds a HUGE amount of complexity within the kernel and hardware-wise. Worse, both bandwidth and IOPS are "provisioned" (i.e. limited) based on what you pay for. Either or both of those aspects could easily result in randomly appearing "hiccups" in user-space. In other words, we should expect both the cloud "throttling" reads and/or writes, and occasional (transparent) hiccups within the hardware or network "fabric" itself.

openshift-ci bot added the release-note-none and approved labels Dec 5, 2023

Luap99 requested changes Dec 5, 2023

edsantiago force-pushed the fix_some_cp_flakes branch from 56bde48 to 18a268f Compare December 5, 2023 18:57

edsantiago force-pushed the fix_some_cp_flakes branch from 18a268f to 4d2125b Compare December 5, 2023 18:58

Luap99 approved these changes Dec 6, 2023

openshift-ci bot assigned rhatdan Dec 6, 2023

openshift-ci bot added the lgtm label Dec 6, 2023

openshift-merge-bot bot merged commit a64cc98 into containers:main Dec 6, 2023
91 of 93 checks passed

edsantiago deleted the fix_some_cp_flakes branch December 6, 2023 16:24

edsantiago mentioned this pull request Jan 10, 2024

[v4.8] systests: cp: add wait_for_ready #21220

Merged

github-actions bot added the locked - please file new issue/PR label Mar 6, 2024

github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2024
