
systests: cp: add wait_for_ready #20912

Merged: 1 commit into containers:main from fix_some_cp_flakes, Dec 6, 2023


edsantiago (Member)

Some of the tests were doing "podman run -d" without wait_for_ready.
This may be the cause of some of the CI flakes. Maybe even all?
It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug.
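Why the missing wait matters: `podman run -d` returns as soon as the container has been started, not when its command has finished. If a test then runs `podman cp` against a path that the container's command (for example a `touch` or `mkdir`) has not created yet, the copy can fail with ENOENT, which matches the title of the linked issue. Below is a minimal Python sketch of the "poll until ready" idea. It assumes, as a hypothetical, that the container prints a marker such as `READY` once its setup is done, and that logs can be fetched repeatedly (in the real tests, via something like `podman logs`). The names `wait_for_marker` and `FakeContainer` are illustrative only; this is not the actual `wait_for_ready` helper from the podman system tests.

```python
import time
from typing import Callable


def wait_for_marker(fetch_logs: Callable[[], str], marker: str = "READY",
                    timeout: float = 10.0, interval: float = 0.1) -> None:
    """Poll fetch_logs() until `marker` appears in its output.

    fetch_logs is a hypothetical stand-in for running `podman logs <ctr>`;
    it must return everything the container has printed so far.
    Raises TimeoutError if the marker does not appear within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker in fetch_logs():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {marker!r}")
        time.sleep(interval)


class FakeContainer:
    """Hypothetical detached container that prints READY only after its
    setup step (e.g. a touch or mkdir) has run; each log fetch is one poll."""

    def __init__(self, ready_after_calls: int):
        self.calls = 0
        self.ready_after_calls = ready_after_calls

    def fetch_logs(self) -> str:
        self.calls += 1
        return "READY\n" if self.calls >= self.ready_after_calls else ""


if __name__ == "__main__":
    ctr = FakeContainer(ready_after_calls=3)
    wait_for_marker(ctr.fetch_logs, interval=0.01)
    # Only at this point would it be safe to run the equivalent of `podman cp`.
    print(f"ready after {ctr.calls} polls")
```

Polling against a deadline, rather than sleeping for a fixed time, keeps the test fast when the container is quick and tolerant when I/O is slow. The trade-off is that every poll is another log fetch, so it is only worth doing for containers whose command actually creates something the test depends on.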

Thanks to Chris for making that astute observation.

Fixes: #20282 (I hope)

Signed-off-by: Ed Santiago [email protected]

Release note: None

openshift-ci bot added the release-note-none and approved labels, Dec 5, 2023
Luap99 (Member) left a comment:

sounds like a logical explanation to me but I think you have overdone it a bit.

(10 inline review threads on test/system/065-cp.bats, since marked outdated and resolved.)
edsantiago (Member, Author):

> but I think you have overdone it a bit.

I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

Luap99 (Member) commented Dec 5, 2023:

>> but I think you have overdone it a bit.
>
> I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

Harmful, no, but it makes the diff here bigger than it needs to be, and it makes the tests slower, since they now always call podman logs even when that is not needed.

edsantiago (Member, Author):

OK. I'll repush once CI finishes.

edsantiago (Member, Author):

Done. Now wait_for_ready is added only to those containers that touch, echo, or mkdir.

Luap99 (Member) left a comment:

LGTM

openshift-ci bot (Contributor) commented Dec 6, 2023:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99


rhatdan (Member) commented Dec 6, 2023:

/lgtm

openshift-ci bot added the lgtm label, Dec 6, 2023
openshift-merge-bot bot merged commit a64cc98 into containers:main, Dec 6, 2023
91 of 93 checks passed
cevich (Member) commented Dec 6, 2023:

Thanks for fixing this Ed, hopefully it was the cause.

> It's not clear why the tests have been working reliably for years
> under overlay, and only started failing under vfs, but shrug

If it helps (and this is a total guess): my feeling is that the unpredictable failures come from the storage subsystem in the cloud context. All the CI VMs run on (presumably multi-path) Fibre Channel or network-based storage. That in and of itself adds a HUGE amount of complexity, both within the kernel and in the hardware. Worse, both bandwidth and IOPS are "provisioned" (i.e. limited) based on what you pay for. Either or both of those aspects could easily cause randomly appearing "hiccups" in user space. In other words, we should expect the cloud to throttle reads and/or writes, and we should expect occasional (transparent) hiccups within the hardware or the network "fabric" itself.

edsantiago deleted the fix_some_cp_flakes branch, December 6, 2023 16:24
github-actions bot added the "locked - please file new issue/PR" label, Mar 6, 2024
github-actions bot locked as resolved and limited conversation to collaborators, Mar 6, 2024
Labels: approved, lgtm, locked - please file new issue/PR, release-note-none
Development: successfully merging this pull request may close the issue "podman cp under vfs: ENOENT".
4 participants