
Backport series for 7.6.1 #4222

Merged: 14 commits into varnishcache:7.6 on Nov 7, 2024
Conversation

walid-git (Member)

This is a set of backports in preparation for the 7.6.1 release.
Refs: #4195

Review threads (resolved):
doc/changes.rst (outdated)
doc/sphinx/reference/vsm.rst
bin/varnishd/common/common_vsmw.c
@walid-git (Member Author)

Seems like our fedora job is broken:

Unknown argument "groupinstall" for command "dnf5". Add "--help" for more information about the arguments.
It could be a command provided by a plugin, try: dnf5 install 'dnf5-command(groupinstall)'

nigoroll added two commits that referenced this pull request Nov 4, 2024
@nigoroll (Member) commented Nov 4, 2024

Just so my apology has a place: I just pushed two commits to master which were meant to be just one. I just wanted to make it clear that we should not add anything else to VRT 20.1.

dridi and others added 14 commits November 4, 2024 18:06
There is a race between the time we connect to a backend and when we increase
the connection count. This race is not normally observed, but when the
.max_connections property is used we could end up creating more connections to
the backend than this limit permits. This is problematic as we end up consuming
more system resources, e.g. file descriptors, than reserved for such backends.

The problem is that we release the backend lock and wait for the connect
to finish. This can take up to connect_timeout, which defaults to 3.5 seconds,
and only then do we count the connection as consumed towards the
.max_connections property.

This creates a recipe for bursty behavior where an already struggling backend
could be overwhelmed by a significant number of connections. Then, as the
connection count rises above the ceiling, we either fail fetches or queue them
up. However, as soon as we drop below the limit again, we over-commit once more
and the cycle repeats.

The test case uses a blackhole backend to simulate the connect_timeout; this
exercises the race, but under normal circumstances the likelihood of the race
occurring depends on the traffic (fetches), networking conditions, and the
performance of the backend.

The solution is to increase the connection count before connecting to the
backend, and to revert the count before failing the fetch if we fail to get a
connection. This makes any fetches past the .max_connections limit either fail
outright or observe the presence of a queue to the backend.

Signed-off-by: Asad Sajjad Ahmed <[email protected]>
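
For illustration, here is a simplified sketch of the count-before-connect
pattern described above (hypothetical names and locking, not the actual
Varnish source):

/*
 * Count the connection against .max_connections *before* the potentially
 * slow connect(), and revert the count if the connect fails.
 * All names here are illustrative.
 */
#include <pthread.h>

struct backend {
	pthread_mutex_t	mtx;
	unsigned	n_conn;			/* connections counted so far */
	unsigned	max_connections;	/* 0 means unlimited */
};

/* hypothetical helper doing the (slow) TCP connect, up to connect_timeout */
extern int do_connect(struct backend *);

static int
backend_connect_counted(struct backend *be)
{
	int fd;

	pthread_mutex_lock(&be->mtx);
	if (be->max_connections > 0 && be->n_conn >= be->max_connections) {
		pthread_mutex_unlock(&be->mtx);
		return (-1);	/* over the limit: fail or queue the fetch */
	}
	be->n_conn++;		/* reserve the slot before connecting */
	pthread_mutex_unlock(&be->mtx);

	fd = do_connect(be);	/* may block for up to connect_timeout */
	if (fd < 0) {
		pthread_mutex_lock(&be->mtx);
		be->n_conn--;	/* revert the count before failing the fetch */
		pthread_mutex_unlock(&be->mtx);
	}
	return (fd);
}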
Pondering solutions for varnishcache#4183, it became apparent that we need a way
to notify directors of changes to the admin_health without crossing the VDI/VBE
layering.

Besides adding another director callback, one option is to add an event to the
existing event facility, which so far had only been used for VCL-related events.
We now add VDI_EVENT_SICK as a director-specific event, which is cheap and also
helps us maintain clean layering. Using the new event, we can now selectively
notify interested backend types (so far: VBE only) when the admin health changes.

This fixes the layer violation introduced with
e46f972, where a director private pointer was
used with a VBE-specific function.

Fixes varnishcache#4183
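
To illustrate the layering idea, a minimal sketch with hypothetical types (not
the real VDI/VBE API): the health-administration code only broadcasts an event,
and a backend implementation opts in by registering an event callback, so
neither layer has to reach into the other's private data.

#include <stddef.h>

enum dir_event { DIR_EVENT_SICK };	/* illustrative stand-in for VDI_EVENT_SICK */

struct director {
	const char	*name;
	void		(*event)(struct director *, enum dir_event);
	void		*priv;		/* owned by the implementation only */
};

/* called when the admin health of a director changes */
static void
dir_notify_sick(struct director *d)
{
	if (d->event != NULL)	/* only interested backend types register one */
		d->event(d, DIR_EVENT_SICK);
}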
Try to raise the hard limit to infinity, then set the soft (current) limit to
whatever the hard (max) limit is.

Motivated by varnishcache#4193
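
A sketch of that dance with the standard getrlimit()/setrlimit() calls;
RLIMIT_MEMLOCK is only an assumption here, based on the mlock() context of
varnishcache#4193:

#include <sys/resource.h>

static void
raise_memlock_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return;
	rl.rlim_max = RLIM_INFINITY;
	if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
		/* raising the hard limit failed (not privileged); re-read it */
		if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
			return;
	}
	rl.rlim_cur = rl.rlim_max;	/* soft limit = whatever the hard limit is */
	(void)setrlimit(RLIMIT_MEMLOCK, &rl);
}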
Report the current resource limits to help with diagnosis.

Example output:

Child launched OK
Info: Child (792279) said Child starts
Info: Child (792279) said Warning: mlock() of VSM failed: Cannot allocate memory (12)
Info: Child (792279) said Info: max locked memory (soft): 1048576 bytes
Info: Child (792279) said Info: max locked memory (hard): 1048576 bytes

Motivated by varnishcache#4193
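
A sketch of how such info lines could be produced (hypothetical helper, not the
actual varnishd code), again assuming the limit in question is RLIMIT_MEMLOCK:

#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

static void
report_limit(const char *what, rlim_t lim)
{
	if (lim == RLIM_INFINITY)
		printf("Info: max locked memory (%s): unlimited\n", what);
	else
		printf("Info: max locked memory (%s): %ju bytes\n",
		    what, (uintmax_t)lim);
}

static void
report_memlock_limits(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
		report_limit("soft", rl.rlim_cur);
		report_limit("hard", rl.rlim_max);
	}
}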
Solaris does not have it and I overlooked the CONFORMING TO section in the Linux
manpage.

Ref varnishcache#4193
Not my day for details today, it seems. :(
It turns out varnishd can be the main process of a Docker container when
another process like varnishlog is 'docker exec'uted in the same PID
namespace.

Spotted by @gquintard.
@walid-git (Member Author)

Pushed 61f386d (thanks @dridi) to the 7.6 branch and rebased this PR on top of it.

@walid-git walid-git merged commit 4ef61c0 into varnishcache:7.6 Nov 7, 2024
11 checks passed