
CI: systemd int tests: systemctl stop, times out #23480

Closed
edsantiago opened this issue Aug 1, 2024 · 4 comments · Fixed by #23650
Comments

@edsantiago (Member)

One-off (so far) in debian root:

Podman systemd
           [It] podman start container by systemd
...
           Execing /usr/bin/journalctl --system -q -n 20 -u redis-mvMbzQQFRN
          [...ok...]
           Execing /usr/bin/systemctl --system status redis-mvMbzQQFRN
         [...ok...]
           Execing /usr/bin/systemctl --system stop redis-mvMbzQQFRN
         
           [FAILED] Timed out after 90.001s.
@edsantiago added the flakes (Flakes from Continuous Integration) label Aug 1, 2024
@Luap99 (Member) commented Aug 16, 2024

We have a ton of systemd/quadlet-based tests in the system tests, so would anyone object to me nuking this test from e2e?
I don't see it testing anything useful, IMO.

@rhatdan (Member) commented Aug 16, 2024

I would be fine with that.

@edsantiago (Member, Author)

Before we sweep this under the rug... this has triggered two more times (see below). Looking at the journal on the latest one, I see


Aug 11 15:59:18 cirrus-task-5040669121576960 systemd[1]: Started libpod-c08b96745cf60e9ccd9c92ab2206e0f926e67cde9d9b91c82d4aa4ed35df7c4a.scope - libcrun container.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.843 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:C 11 Aug 2024 15:59:18.843 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 * Running mode=standalone, port=6379.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # Server initialized
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
Aug 11 15:59:18 cirrus-task-5040669121576960 redis[99868]: 1:M 11 Aug 2024 15:59:18.845 * Ready to accept connections
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.843 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:C 11 Aug 2024 15:59:18.843 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 * Running mode=standalone, port=6379.
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # Server initialized
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Aug 11 15:59:18 cirrus-task-5040669121576960 podman[99696]: 1:M 11 Aug 2024 15:59:18.845 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').

Could that be related? And since this is a debian-only bug, WDYT of just skipping the test on debian?

  • debian-13 : int podman debian-13 root host sqlite
    • 08-11 12:07 in Podman systemd podman start container by systemd
    • 08-06 11:34 in Podman systemd podman start container by systemd
    • 08-01 12:48 in Podman systemd podman start container by systemd
int(3) podman(3) debian-13(3) root(3) host(3) sqlite(3)

@Luap99 (Member) commented Aug 19, 2024

I don't think it is related to the errors there. Looking at the test output, I see:

           ● redis-cddvELgoAr.service - redis container
                Loaded: loaded (/run/systemd/system/redis-cddvELgoAr.service; disabled; preset: enabled)
                Active: active (running) since Sun 2024-08-11 15:59:18 UTC; 70ms ago
            Invocation: 0ca2e9dc8c5243aaa0076a3cc5efbbec
              Main PID: 99696 (16)
                 Tasks: 1 (limit: 4678)
                Memory: 236K (peak: 368K)
                   CPU: 782us
                CGroup: /system.slice/redis-cddvELgoAr.service
                        └─99696 /usr/lib/systemd/systemd-executor --deserialize 55 --log-level info --log-target journal-or-kmsg

systemd-executor is the main PID of the unit??? If I read the systemd docs correctly this might be normal, as it means it has not yet executed our podman process.

Looking at a passing test, this makes more sense:

           ● redis-kJwRLOTCKZ.service - redis container
                Loaded: loaded (/run/systemd/system/redis-kJwRLOTCKZ.service; disabled; preset: enabled)
                Active: active (running) since Mon 2024-08-19 10:08:31 UTC; 80ms ago
            Invocation: b5ec0276f34d444da435a4314e0ee599
              Main PID: 169449 (podman)
                 Tasks: 8 (limit: 4678)
                Memory: 10.6M (peak: 10.7M)
                   CPU: 46ms
                CGroup: /system.slice/redis-kJwRLOTCKZ.service
                        ├─169449 /var/tmp/go/src/github.com/containers/podman/bin/podman --storage-opt overlay.imagestore=/tmp/CI_lQRY/podman-e2e-926152866/imagecachedir --root /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897/root --runroot /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/CI_lQRY/podman-e2e-926152866/subtest-2091879897 --events-backend file --db-backend sqlite --storage-driver overlay start -a redis
                        ├─169484 /usr/lib/podman/netavark --config /run/containers/networks --rootless=false --aardvark-binary=/usr/lib/podman/aardvark-dns setup /run/netns/netns-ea33973e-410c-78cf-607b-4cd46c58f1c4
                        └─169486 iptables --version

I think it is possible that stop is called too early, when it has no effect (or the SIGTERM was just ignored), so the unit keeps running until systemd hits its 90s default stop timeout. As we also wait 90s, we time out before systemd had time to kill all processes and return. And given the unit uses KillMode=process, it is even possible that we leak processes there.
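
For illustration, a rough sketch of this kind of unit, pieced together from the status output above (it is not the exact file the test writes, and the ExecStart line is a placeholder):

    # Sketch only: reconstructed from the status output, not the real test unit.
    # With KillMode=process, "systemctl stop" sends SIGTERM to the main process
    # only. If that signal has no effect, systemd waits for its stop timeout
    # (DefaultTimeoutStopSec=90s) before sending SIGKILL, which is the same 90s
    # the test itself waits, so the test gives up first.
    [Unit]
    Description=redis container

    [Service]
    ExecStart=/usr/bin/podman start -a redis
    KillMode=process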

Regardless, these kinds of systemd units are wrong (unsupported by us); they do not use our wanted cgroup settings such as cgroups=split, systemd notify, etc. As such I see no point in testing this here.
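
For contrast, a minimal sketch of the kind of setup that is supported: a Quadlet .container file (the file name and image here are placeholders, not what CI uses), from which quadlet generates a service with those settings for us:

    # redis.container: hypothetical Quadlet file. quadlet generates the real
    # service from this, including the wanted cgroup/notify settings mentioned
    # above (cgroups=split, systemd notify).
    [Unit]
    Description=redis container

    [Container]
    Image=docker.io/library/redis:latest
    ContainerName=redis

    [Install]
    WantedBy=default.target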

@stale-locking-app bot added the locked - please file new issue/PR label Nov 18, 2024
@stale-locking-app bot locked as resolved and limited conversation to collaborators Nov 18, 2024