
CI: misc parallel flakes #23479

Open
edsantiago opened this issue Aug 1, 2024 · 4 comments
Labels
flakes (Flakes from Continuous Integration), stale-issue

Comments

@edsantiago
Member

Hodgepodge of parallel-system-test flakes that don't seem to fit anywhere else. I think most of these just need something like:

    # Expectation (in seconds) of when we should time out. When running
    # parallel, allow 2 more seconds due to system load
    local expect=4
    if [[ -n "$PARALLEL_JOBSLOT" ]]; then
        expect=$((expect + 2))
    fi
    assert $delta_t -le $expect \
           "podman kube play did not get killed within $expect seconds"
Seen in: sys(6), podman(6), rawhide(2), debian-13(2), fedora-39(2), root(3), rootless(3), host(6), sqlite(4), boltdb(2)
edsantiago added the flakes (Flakes from Continuous Integration) label Aug 1, 2024
Honny1 assigned Honny1 and unassigned Honny1 Aug 8, 2024
@Honny1
Member

Honny1 commented Aug 12, 2024

Hi @edsantiago,
I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. Since the time can vary with the actual load on the machine, or due to scheduling when lots of parallel runs are active, the test should check whether the command gets 3s of logs, not how long it takes.

For the [035] podman logs - multi k8s-file test, I would say that the first container did not finish its job and was put to sleep due to CI machine load. The test should probably wait for both containers to finish their work before reading the logs (see the sketch below).
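
A minimal sketch of what that could look like, assuming the suite's existing run_podman helper; $cname1/$cname2 are hypothetical placeholders for the two containers started by the test:

    # Hypothetical: block until both containers have exited, so that slow
    # scheduling under parallel load cannot truncate the expected output,
    # then read the combined logs.
    run_podman wait $cname1 $cname2
    run_podman logs $cname1 $cname2
    # ...assert on the now-complete log content...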

In the [035] podman logs - --since --follow journald test, I would say that when running in parallel, journald is used by multiple containers, so we will need to increase the timeout to give the container more time to write to journald, and also add a check for the end of the journald content.

@Luap99
Member

Luap99 commented Aug 12, 2024

I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. Since the time can vary with the actual load on the machine, or due to scheduling when lots of parallel runs are active, the test should check whether the command gets 3s of logs, not how long it takes.

Keep in mind the same race exists for the ctr process, so there is no way of knowing what 3s of logs are: depending on scheduling, the ctr process might have written only a few lines rather than 30 with the sleep 0.1 interval, so it is impossible to know whether the writer side didn't write fast enough or the reader loses messages. As such, "the process should exit after 3s" is simple and easy to check in theory, but of course it also has the timing problem. And we also want to check that the logs process actually exits in time.

I am, however, not sure how the rounding works with the built-in $SECONDS in bash; maybe it would be safer to take the time before and after in ms and compare that?
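
A minimal sketch of the millisecond approach, assuming GNU date (for the %N nanoseconds field) and the suite's run_podman/assert helpers; $until, $cname, and the 5000 ms bound are placeholders rather than the real test's values:

    # Hypothetical: measure elapsed wall-clock time in milliseconds instead
    # of relying on bash's whole-second $SECONDS counter.
    t0=$(date +%s%3N)                        # ms since the epoch (GNU date)
    run_podman logs --follow --until "$until" $cname   # command under test
    t1=$(date +%s%3N)
    delta_ms=$((t1 - t0))
    assert $delta_ms -le 5000 \
           "podman logs --follow did not exit within 5000 ms"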

@Honny1
Member

Honny1 commented Aug 13, 2024

@Luap99 I tested $SECONDS and time in ms. I found that $SECONDS is not accurate because it is rounded to whole seconds: if t0 is at 1856 ms, $SECONDS is still 1. This inaccuracy causes the command to appear at most only about 150 ms late, which is less variation than I observed between test runs (the time was around 3150-3650 ms). At higher workloads, this delay can be larger.


A friendly reminder that this issue had no activity for 30 days.
