
CI: misc parallel flakes #23479

Open
edsantiago opened this issue Aug 1, 2024 · 4 comments
Labels
flakes (Flakes from Continuous Integration), stale-issue

Comments

@edsantiago
Member

Hodgepodge of parallel-system-test flakes that don't seem to fit anywhere else. I think most of these just need something like:

    # Expectation (in seconds) of when we should time out. When running
    # parallel, allow 2 more seconds due to system load
    local expect=4
    if [[ -n "$PARALLEL_JOBSLOT" ]]; then
        expect=$((expect + 2))
    fi
    assert $delta_t -le $expect \
           "podman kube play did not get killed within $expect seconds"
Seen in: sys(6), podman(6), rawhide(2), debian-13(2), fedora-39(2), root(3), rootless(3), host(6), sqlite(4), boltdb(2)
edsantiago added the flakes (Flakes from Continuous Integration) label Aug 1, 2024
Honny1 assigned Honny1 and unassigned Honny1 Aug 8, 2024
@Honny1
Member

Honny1 commented Aug 12, 2024

Hi @edsantiago,
I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. Since the time can vary with the actual load on the machine, or due to scheduling when lots of parallel runs are active, the test should check whether the command gets 3s of logs, not how long it takes.

For the [035] podman logs - multi k8s-file test, I would say that the first container did not finish its job and was put to sleep due to CI machine load. The test should probably wait for both containers to finish their work before reading the logs (see the sketch below).
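
A minimal sketch of what that could look like, assuming the suite's existing run_podman helper; $cname1/$cname2 are hypothetical placeholders for the two containers started by the test:

    # Hypothetical: block until both containers have exited, so that slow
    # scheduling under parallel load cannot truncate the expected output,
    # then read the combined logs.
    run_podman wait $cname1 $cname2
    run_podman logs $cname1 $cname2
    # ...assert on the now-complete log content...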

In the [035] podman logs - --since --follow journald test, I would say that when running in parallel, journald is used by multiple containers, so we will need to increase the timeout to give the container more time to write to journald, and also add a check for the end of the journald content.

@Luap99
Member

Luap99 commented Aug 12, 2024

I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. Since the time can vary with the actual load on the machine, or due to scheduling when lots of parallel runs are active, the test should check whether the command gets 3s of logs, not how long it takes.

Keep in mind the same race exists for the ctr process, so there is no way of knowing what 3s of logs are: depending on scheduling, the ctr process might have written only a few lines rather than 30 with the sleep 0.1 interval, so it is impossible to know whether the writer side didn't write fast enough or the reader loses messages. As such, "the process should exit after 3s" is simple and easy to check in theory, but of course it also has the timing problem. And we also want to check that the logs process actually exits in time.

I am, however, not sure how the rounding works with the built-in $SECONDS in bash; maybe it would be safer to take the time before and after in ms and compare that?
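
A minimal sketch of the millisecond approach, assuming GNU date (for the %N nanoseconds field) and the suite's run_podman/assert helpers; $until, $cname, and the 5000 ms bound are placeholders rather than the real test's values:

    # Hypothetical: measure elapsed wall-clock time in milliseconds instead
    # of relying on bash's whole-second $SECONDS counter.
    t0=$(date +%s%3N)                        # ms since the epoch (GNU date)
    run_podman logs --follow --until "$until" $cname   # command under test
    t1=$(date +%s%3N)
    delta_ms=$((t1 - t0))
    assert $delta_ms -le 5000 \
           "podman logs --follow did not exit within 5000 ms"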

@Honny1
Member

Honny1 commented Aug 13, 2024

@Luap99 I tested $SECONDS and time in ms. I found that $SECONDS is not accurate because it is rounded to whole seconds: if t0 is at 1856 ms, $SECONDS is still 1. This inaccuracy causes the command to appear at most only about 150 ms late, which is less variation than I observed between test runs (the time was around 3150-3650 ms). At higher workloads, this delay can be larger.


A friendly reminder that this issue had no activity for 30 days.
