
CI: systemd int tests: systemctl stop, times out #23480

Closed
edsantiago opened this issue Aug 1, 2024 · 4 comments · Fixed by #23650
Comments

@edsantiago (Member)

One-off (so far) in debian root:

Podman systemd
           [It] podman start container by systemd
...
           Execing /usr/bin/journalctl --system -q -n 20 -u redis-mvMbzQQFRN
          [...ok...]
           Execing /usr/bin/systemctl --system status redis-mvMbzQQFRN
         [...ok...]
           Execing /usr/bin/systemctl --system stop redis-mvMbzQQFRN
         
           [FAILED] Timed out after 90.001s.
@edsantiago added the flakes (Flakes from Continuous Integration) label Aug 1, 2024
@Luap99 (Member) commented Aug 16, 2024

We have a ton of systemd/quadlet-based tests in the system tests, so would anyone object to me nuking this test from e2e?
I don't see it testing anything useful, IMO.

@rhatdan (Member) commented Aug 16, 2024

I would be fine with that.

@edsantiago (Member, Author)

Before we sweep this under the rug... this has triggered two more times (see below). Looking at the journal on the latest one, I see


Aug 11 15:59:18 cirrus-task-5040669121576960 systemd[1]: Started libpod-c08b96745cf60e9ccd9c92ab2206e0f926e67cde9d9b91c82d4aa4ed35df7c4a.scope - libcrun container.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.843 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.843 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 * Running mode=standalone, port=6379.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # Server initialized
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 * Ready to accept connections
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.843 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.843 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 * Running mode=standalone, port=6379.
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # Server initialized
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').

Could that be related? And since this is a debian-only bug, WDYT of just skipping the test on debian?

  • debian-13 : int podman debian-13 root host sqlite
    • 08-11 12:07 in Podman systemd podman start container by systemd
    • 08-06 11:34 in Podman systemd podman start container by systemd
    • 08-01 12:48 in Podman systemd podman start container by systemd
int(3) podman(3) debian-13(3) root(3) host(3) sqlite(3)

@Luap99 (Member) commented Aug 19, 2024

I don't think it is related to the errors there. Looking at the test output, I see:

           ● redis-cddvELgoAr.service - redis container
                Loaded: loaded (/run/systemd/system/redis-cddvELgoAr.service; disabled; preset: enabled)
                Active: active (running) since Sun 2024-08-11 15:59:18 UTC; 70ms ago
            Invocation: 0ca2e9dc8c5243aaa0076a3cc5efbbec
              Main PID: 99696 (16)
                 Tasks: 1 (limit: 4678)
                Memory: 236K (peak: 368K)
                   CPU: 782us
                CGroup: /system.slice/redis-cddvELgoAr.service
                        └─99696 /usr/lib/systemd/systemd-executor --deserialize 55 --log-level info --log-target journal-or-kmsg

systemd-executor is the main PID of the unit??? If I read the systemd docs correctly this might be normal, as it means it has not yet executed our podman process.

Looking at a passing test, this makes more sense:

           ● redis-kJwRLOTCKZ.service - redis container
                Loaded: loaded (/run/systemd/system/redis-kJwRLOTCKZ.service; disabled; preset: enabled)
                Active: active (running) since Mon 2024-08-19 10:08:31 UTC; 80ms ago
            Invocation: b5ec0276f34d444da435a4314e0ee599
              Main PID: 169449 (podman)
                 Tasks: 8 (limit: 4678)
                Memory: 10.6M (peak: 10.7M)
                   CPU: 46ms
                CGroup: /system.slice/redis-kJwRLOTCKZ.service
                        ├─169449 /var/tmp/go/src/github.com/containers/podman/bin/podman --storage-opt overlay.imagestore=/tmp/CI_lQRY/podman-e2e-926152866/imagecachedir --root /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897/root --runroot /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897 --events-backend file --db-backend sqlite --storage-driver overlay start -a redis
                        ├─169484 /usr/lib/podman/netavark --config /run/containers/networks --rootless=false --aardvark-binary=/usr/lib/podman/aardvark-dns setup /run/netns/netns-ea33973e-410c-78cf-607b-4cd46c58f1c4
                        └─169486 iptables --version

I think it is possible that stop is called too early, when it has no effect (or the SIGTERM was just ignored), so the unit keeps running until systemd hits its 90s default stop timeout. As we also wait 90s, we time out before systemd had time to kill all processes and return. And given the unit uses KillMode=process, it is even possible that we leak processes there.
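
For illustration, a rough sketch of this kind of unit, pieced together from the status output above (it is not the exact file the test writes, and the ExecStart line is a placeholder):

    # Sketch only: reconstructed from the status output, not the real test unit.
    # With KillMode=process, "systemctl stop" sends SIGTERM to the main process
    # only. If that signal has no effect, systemd waits for its stop timeout
    # (DefaultTimeoutStopSec=90s) before sending SIGKILL, which is the same 90s
    # the test itself waits, so the test gives up first.
    [Unit]
    Description=redis container

    [Service]
    ExecStart=/usr/bin/podman start -a redis
    KillMode=process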

Regardless, these kinds of systemd units are wrong (unsupported by us); they do not use our wanted cgroup settings such as cgroups=split, systemd notify, etc. As such I see no point in testing this here.
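
For contrast, a minimal sketch of the kind of setup that is supported: a Quadlet .container file (the file name and image here are placeholders, not what CI uses), from which quadlet generates a service with those settings for us:

    # redis.container: hypothetical Quadlet file. quadlet generates the real
    # service from this, including the wanted cgroup/notify settings mentioned
    # above (cgroups=split, systemd notify).
    [Unit]
    Description=redis container

    [Container]
    Image=docker.io/library/redis:latest
    ContainerName=redis

    [Install]
    WantedBy=default.target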

@stale-locking-app bot added the locked - please file new issue/PR label Nov 18, 2024
@stale-locking-app bot locked as resolved and limited conversation to collaborators Nov 18, 2024