ingress: Add dynamic-config-manager enhancement #1687
base: master
Conversation
* enhancements/ingress/dynamic-config-manager.md: New file.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@Miciah: The following test failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
override:

```shell
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"unsupportedConfigOverrides":{"dynamicConfigManager":"false"}}}'
```
For upgrades, I assume you need to remove the unsupported override and re-enable it once the upgrade is complete. However, this raises the question: if DCM is re-enabled, does that imply the cluster may be broken from an ingress perspective until the update is complete and DCM can be disabled again? Is that going to be an acceptable stance for production clusters?
> For upgrades, I assume you need to remove the unsupported override and re-enable it once the upgrade is complete.

We should, but currently do not, block upgrades when using `unsupportedConfigOverrides` (reported as https://issues.redhat.com/browse/OCPBUGS-41366).
> However, this raises the question: if DCM is re-enabled, does that imply the cluster may be broken from an ingress perspective until the update is complete and DCM can be disabled again? Is that going to be an acceptable stance for production clusters?

I guess you're saying that `unsupportedConfigOverrides` isn't a good escape hatch after all, right? You can probably force an upgrade using `oc adm --force`, but I understand that `--force` is a last resort.
I suppose we could tell people to turn off the featuregate. Is that a reasonable escape hatch?
> I suppose we could tell people to turn off the featuregate. Is that a reasonable escape hatch?
Yes, I would say so.
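For the record, here is a rough sketch of what that escape hatch might look like. The gate name `DynamicConfigurationManager` is an assumption (the real name would come from openshift/api), and note that `CustomNoUpgrade`, like `TechPreviewNoUpgrade`, makes the cluster un-upgradable, so this is illustrative only:

```shell
# Hypothetical sketch: disable an individual feature gate via the CustomNoUpgrade
# feature set. The gate name below is assumed, not defined by this enhancement.
oc patch featuregate/cluster --type=merge \
  --patch='{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"disabled":["DynamicConfigurationManager"]}}}'
```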
/assign @alebedev87
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now please do so with `/close`. /lifecycle stale
/remove-lifecycle stale
OpenShift 4.18 enables Dynamic Config Manager with 1 pre-allocated server per
backend, without blueprint routes, and without any configuration options. The
goal is to deliver a minimum viable product on which to iterate. This MVP
provides marginal value by avoiding reloading HAProxy for a single scale-out
event or subsequent scale-in event for a route, at minimal development and
operational cost. More importantly, the MVP gives us CI signal, enables us to
work out defects in DCM, and gives us a starting point from which to enhance DCM
in subsequent OpenShift releases. In the future, we intend to extend DCM with
Says the same thing, just reworded for clarity.
Suggested change:

OpenShift 4.18 delivers a Tech Preview version of Dynamic Config Manager as a minimum viable product that enables dynamic configuration without repeated HAProxy reloads. It will be offered with 1 pre-allocated server per backend, without blueprint routes, and without any configuration options. This MVP
provides marginal value by avoiding reloading HAProxy for a single scale-out
event or subsequent scale-in event for a route, at minimal development and
operational cost. More importantly, the MVP gives us CI signal, enables us to
work out defects in DCM, and gives us a starting point from which to enhance DCM
in subsequent OpenShift releases. In the future, we intend to extend DCM with
To be clear, this isn't a special version of DCM; it's a specific configuration of DCM. However, I can add mention that we are shipping this as Tech Preview in 4.18.
HAProxy process through a Unix domain socket. This means that no forking is
necessary to update configuration and effectuate the changes.

However, DCM requires some work before it can be enabled. DCM was first
Suggested change:

DCM requires substantial work to develop and verify it in order to make it viable. DCM was first
We had to fix several defects in DCM itself before we were comfortable shipping it even in this MVP configuration, and even as a Tech Preview feature. I don't know whether it is considered "viable" at this point, or whether we would consider it "viable" if further testing reveals no new defects. How about this?
Suggested change:

However, DCM requires development work to fix several known issues as well as extensive testing in order to ensure that enabling it does not introduce regressions. DCM was first
does not take full advantage of the capabilities of newer HAProxy versions. In
sum, DCM requires substantial work to develop and verify it in order to make it
viable.
You can remove the last sentence if you change the first sentence.
Suggested change:

does not take full advantage of the capabilities of newer HAProxy versions.
I was trying to follow the usual topic sentence, body, conclusion. Maybe it's a little too College English 101? I can tighten the paragraph up if it's too wordy.
The degree to which DCM mitigates this issue is dependent on the nature of the
configuration changes: Some changes still require a configuration reload, but
Suggested change:

The degree to which DCM mitigates this issue is dependent on the nature of the configuration changes. Some changes will still require a configuration reload, but
_As a project administrator, I want HAProxy to balance traffic evenly over old
and new pods when I scale my application up._
I ❤️ the explanations in this section.
Is this also true?
Suggested change:

_As a project administrator, I want HAProxy to balance traffic correctly over old and new pods when I scale my application up or down._
Yeah, I would say it's still true. I prefer "evenly" because it is more evocative of the problem we are trying to avoid: an excessive share of the traffic going to the first server in the backend. However, I can write "correctly" if that is clearer. Adding "or down" does make sense. (I should probably make it "out or in" since we are talking about adding or removing pod replicas, not increasing or decreasing the resources available to each pod.)
Steps 5 and 8 are stretch goals and may be done in later OpenShift releases.
The other steps are hard requirements, and we intend to complete them in
OpenShift 4.18.
Maybe add the GA step after this paragraph?
enhance DCM by using newer capabilities in HAProxy 2.y to add arbitrarily many
servers dynamically, add backends dynamically, and update certificates
dynamically.
Isn't there anything DCM-related that we could add via the haproxy config template that might require an openshift/API change? Getting an API framework in place might be useful.
Are all the Unix socket commands listed in https://docs.haproxy.org/2.8/management.html#9.3 considered to be dynamic config options? So, not only can we add/del servers, but we could also add/del acls, maps, etc., and disable frontends, servers, and healthchecks, set maxconns/rate-limit, etc.
> Isn't there anything DCM-related that we could add via the haproxy config template that might require an openshift/API change? Getting an API framework in place might be useful.

Ideally not. We want DCM to be as transparent and dynamic as possible, requiring no ahead-of-time configuration.

> Are all the Unix socket commands listed in https://docs.haproxy.org/2.8/management.html#9.3 considered to be dynamic config options? So, not only can we add/del servers, but we could also add/del acls, maps, etc., and disable frontends, servers, and healthchecks, set maxconns/rate-limit, etc.

Right, any configuration that the router performs using the Unix domain socket is dynamic configuration. We might indeed be able to use additional capabilities to avoid reloads in more cases.
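To make that concrete, a sketch of the newer runtime commands alluded to above (HAProxy 2.6+). The backend name is borrowed from the example later in this review; the server name and socket path are assumptions:

```shell
SOCK=/var/lib/haproxy/run/haproxy.sock   # assumed socket path inside the router pod

# Dynamically add a server to an existing backend (new servers start in maintenance mode),
# then bring it into rotation
echo "add server be_http:test:httpd-24/new-pod 10.131.0.20:8080" | socat stdio "$SOCK"
echo "set server be_http:test:httpd-24/new-pod state ready" | socat stdio "$SOCK"

# Removing it later requires putting it back into maintenance first
echo "set server be_http:test:httpd-24/new-pod state maint" | socat stdio "$SOCK"
echo "del server be_http:test:httpd-24/new-pod" | socat stdio "$SOCK"
```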
Of course, we do need to do some cost-benefit analysis. It may be worth the engineering effort to implement dynamic configuration of, say, certificates but not worth the effort to implement dynamic configuration of rate limiting, depending on how much effort is required and how often people actually update that particular kind of configuration.
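As a sketch of what a dynamic certificate update would look like, HAProxy 2.2+ supports `set ssl cert`/`commit ssl cert` over the same socket. The socket and certificate paths below are placeholders, not what the router uses today:

```shell
SOCK=/var/lib/haproxy/run/haproxy.sock   # assumed socket path

# Stage a new payload for an already-loaded certificate, then atomically commit it
echo -e "set ssl cert /path/to/route-cert.pem <<\n$(cat new-route-cert.pem)\n" | socat stdio "$SOCK"
echo "commit ssl cert /path/to/route-cert.pem" | socat stdio "$SOCK"
```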
https://gist.github.com/frobware/2b527ce3f040797909eff482a4776e0b for an
analysis of the potential memory impact for different choices of balancing
algorithm, number of threads, number of backends, and number of pre-allocated
server slots per backend.
It would be nice to see a brief summary that shows how fast memory grows for a typical use case.
Do you mean pulling in Andy's table, or do you mean gathering more data so we can plot some curves?
Neither - I meant summarizing the data in the table. You could keep the reference to the data set.
### Graduation Criteria

We will introduce the feature as TechPreviewNoUpgrade in OpenShift 4.18, with
the goal of graduating it to GA in the same release. Further improvements to
Is the "goal of graduating it to GA in the same release" still applicable given our 2025 priorities?
Depends on how things go with the Tech Preview, and whether we discover any new major issues.
- OpenShift Enterprise 3.11 added DCM and the `ROUTER_HAPROXY_CONFIG_MANAGER` option ([release notes](https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html#ocp-311-haproxy-enhancements), [documentation](https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#using-the-dynamic-configuration-manager)).
- OpenShift Container Platform 4.9 added the `dynamicConfigManager` config override, default off ([openshift/cluster-ingress-operator@6a8516a](https://github.com/openshift/cluster-ingress-operator/pull/628/commits/6a8516ab247b00b87a5d7b32e20d4cffcefe1c0f)).
- OpenShift Container Platform 4.18 enables DCM by default.
Not exactly, if we use a feature gate.
LGTM, just some nit picks.
superseded-by:
---

# OpenShift Router Dynamic Config Manager
I believe the full name of the feature is "Dynamic Configuration Manager".
Fair enough. We use "dynamic config manager" in the source code and colloquially, but https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#using-the-dynamic-configuration-manager has "Dynamic Configuration Manager".
gigabytes per process, ultimately causing out-of-memory errors on the node host.

DCM addresses this issue by reducing the need to reload the configuration.
Instead, DCM configures HAProxy using its Unix domain control socket, which does
The "control socket" term doesn't seem to relate to any HAProxy or router technology.
Suggested change:

Instead, DCM configures HAProxy using its Runtime API, which does
True, "control socket" is a term we use colloquially within the team, and it doesn't appear in the HAProxy documentation, which uses the term "stats socket". However, "stats socket" is a bit of a misnomer, I don't see "Runtime API" in https://docs.haproxy.org/2.8/management.html, and "Unix domain control socket" is descriptive of what the socket is, the capabilities of the socket, and our usage of it. Would it be all right by you if I continued using the term "control socket" but defined it up front in the EP?
The Runtime API can be accessed via a UNIX socket or via IP:port. Ref: HAProxy Runtime API installation guide.
It seems that "Runtime API" really means commands on the stats socket. However, "Runtime API" seems to be part of the nomenclature from the commercial HAProxy product documentation whereas the project does not use this nomenclature. Personally, I find the phrase "Runtime API" confusing and unnecessary.
Would you be all right with using the phrase "Unix Socket commands" from the HAProxy documentation, with a reference to the same?
Suggested change:

Instead, DCM configures HAProxy using [Unix Socket commands](https://docs.haproxy.org/2.8/management.html#9.3), which does
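For concreteness, a couple of the Unix Socket commands in question, as they might be issued by hand against the router's socket. The socket path and the availability of `socat` in the router image are assumptions:

```shell
SOCK=/var/lib/haproxy/run/haproxy.sock   # assumed socket path inside the router pod

echo "show servers state" | socat stdio "$SOCK"   # per-server state, as in the output quoted later in this review
echo "show backend"       | socat stdio "$SOCK"   # backends known to the running process
```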
1. Manually verify that the router functions and passes E2E tests with DCM enabled.
2. Add a tech-preview featuregate in openshift/api for DCM.
3. Update cluster-ingress-operator to enable DCM if the featuregate is enabled.
4. Re-enable old E2E tests in openshift/origin for DCM.
Can this be done if DCM is in the TechPreview featureset?
Yes. The openshift/origin tests create router pods directly, so the tests can enable DCM irrespective of the featuregate: https://github.com/openshift/origin/blob/203d2b124037a543d8719ba7e2a369f52f01686d/test/extended/router/config_manager.go#L328-L489
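For illustration, the knob those tests flip is the template router's `ROUTER_HAPROXY_CONFIG_MANAGER` environment variable (cited in the history section of this EP). A hand-run equivalent on a test-owned router deployment might look like the following; the namespace and deployment name are placeholders:

```shell
# Hypothetical: enable DCM directly on a test-created router deployment.
# Do not do this on the operator-managed router-default deployment; the
# ingress operator would reconcile the change away.
oc -n e2e-test-router set env deployment/test-router ROUTER_HAPROXY_CONFIG_MANAGER=true
```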
Dynamic Config Manager has some known defects:

- [OCPBUGS-7466 No prometheus haproxy metrics present for route created by dynamicConfigManager](https://issues.redhat.com/browse/OCPBUGS-7466)
Is this still a risk considering the fact that we won't enable blueprint routes?
Ah, right, OCPBUGS-7466 pertains to blueprint routes. We can note this as irrelevant for our supported configuration.
Dynamic Config Manager has some known defects:

- [OCPBUGS-7466 No prometheus haproxy metrics present for route created by dynamicConfigManager](https://issues.redhat.com/browse/OCPBUGS-7466)
- [OCPBUGSM-20868 Sticky cookies take long to start working if config manager is enabled](https://issues.redhat.com/browse/OCPBUGSM-20868)
Interesting, essentially this is what I discovered too (fix PR).
#### Can we use HAProxy's control socket to add backends and certificates dynamically?

Answer: **?**.
Backends - no. Certificates - seems like yes.
Yeah, it does seem that dynamic backends are a long-discussed but never-implemented possibility. Sadface.
or for scale-out following scale-in. However, we found that DCM still requires
a reload for scale-in of a server that doesn't use a pre-allocated server slot.
(TODO: This needs to be double-checked.)
I did see stopped servers that were not pre-allocated, so scale-in should still be dynamic.
I tried to scale a deployment from 4 replicas (3 static slots UP + 1 dynamic slot UP) to 1. Here is the output of `show servers state`:
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl srv_check_port srv_check_addr srv_agent_addr srv_agent_port
17 be_http:test:httpd-24 1 pod:httpd-24-8554dcd85d-rrsrl:httpd-24:8080-tcp:10.128.2.15:8080 10.128.2.15 0 1 1 1 156 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 2 pod:httpd-24-8554dcd85d-nvnl5:httpd-24:8080-tcp:10.129.2.22:8080 10.129.2.22 0 1 1 1 9 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 3 pod:httpd-24-8554dcd85d-m2hbj:httpd-24:8080-tcp:10.131.0.16:8080 10.131.0.16 2 0 1 1 490 6 3 4 6 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 4 _dynamic-pod-1 10.129.2.23 0 5 1 1 234 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0
1 pod:httpd-24-8554dcd85d-rrsrl:httpd-24:8080-tcp:10.128.2.15:8080 10.128.2.15 STOP MAINT 1 CHECK CONFENABLEPAUSED
2 pod:httpd-24-8554dcd85d-nvnl5:httpd-24:8080-tcp:10.129.2.22:8080 10.129.2.22 STOP MAINT 1 CHECK CONFENABLEPAUSED
3 pod:httpd-24-8554dcd85d-m2hbj:httpd-24:8080-tcp:10.131.0.16:8080 10.131.0.16 UP UP 1 CHECK CONFENABLED
4 _dynamic-pod-1 10.129.2.23 STOP MAINT 1 CHECK CONFENABLEPAUSED
At the top is the raw output; at the bottom is a simplified, human-readable view of the same data.
Same behavior if 4 replicas are enabled via 4 static slots (the dynamic slot is not used).
Btw even the route deletion doesn't lead to a reload if DCM is enabled. All of the servers of the deleted backend will be disabled but the backend will stay in `show backend` output.
This raises the question of whether we need to log additional operational debug information somewhere. With DCM enabled, the haproxy.config becomes stale, so relying solely on its contents during debugging could lead to misleading conclusions.
Right. I had a user story for this but didn't have enough time to implement something for this iteration.
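In the meantime, a sketch of the kind of live-state query that debugging would rely on instead of the stale haproxy.config; the deployment name, socket path, and availability of `socat` in the router image are assumptions:

```shell
# Query the running HAProxy process rather than the on-disk haproxy.config
oc -n openshift-ingress exec deploy/router-default -- \
  bash -c 'echo "show servers state" | socat stdio /var/lib/haproxy/run/haproxy.sock'
```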
> I tried to scale a deployment from 4 replicas (3 pre-allocated slots UP + 1 dynamic slot UP) to 1. Here is the output of `show servers state`:

I'm confused. I've been saying "pre-allocated" to describe the slots that we pre-allocate in `haproxy.config` for DCM to use to add new server endpoints to a backend. What do you mean when you say "3 pre-allocated slots"? Are these 3 pre-allocated slots for DCM, or are these 3 `server` lines for endpoints that are defined in `haproxy.config`?

Are you saying that I can create a route with 3 endpoints (which get written to `haproxy.config`), and then scale in to 1 endpoint, and no reload is required?

If I have a route with 1 pod endpoint (written to `haproxy.config`), delete the pod, then recreate the pod, is a reload required?

> Btw even the route deletion doesn't lead to a reload if DCM is enabled. All of the servers of the deleted backend will be disabled but the backend will stay in `show backend` output.

This is excellent news!

Can you reproduce these results with non-TLS, TLS passthrough, edge-terminated, and reencrypt routes? Since there are some nuances (`server` versus `server-template`), I'm wary of subtle differences in behavior for different route types.
> I'm confused. I've been saying "pre-allocated" to describe the slots that we pre-allocate in haproxy.config for DCM to use to add new server endpoints to a backend. What do you mean when you say "3 pre-allocated slots"? Are these 3 pre-allocated slots for DCM, or are these 3 server lines for endpoints that are defined in haproxy.config?

Sorry, my bad. I should have called them "static" slots. The pre-allocated ones are the dynamic slots.

> Are you saying that I can create a route with 3 endpoints (which get written to haproxy.config), and then scale in to 1 endpoint, and no reload is required?

Right, if DCM is enabled.

> If I have a route with 1 pod endpoint (written to haproxy.config), delete the pod, then recreate the pod, is a reload required?

No. The deletion of the pod will set the corresponding server to maintenance mode. The recreation of a pod will use the dynamic (pre-allocated) server slot to set the new IP.
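For readers following along, a sketch of the kind of socket commands that sequence implies, using the backend and slot names from the `show servers state` output above (socket path assumed):

```shell
SOCK=/var/lib/haproxy/run/haproxy.sock   # assumed socket path

# Pod deleted: drain the slot by putting the server into maintenance
echo "set server be_http:test:httpd-24/_dynamic-pod-1 state maint" | socat stdio "$SOCK"

# Pod recreated: point the pre-allocated slot at the new endpoint and re-enable it
echo "set server be_http:test:httpd-24/_dynamic-pod-1 addr 10.129.2.23 port 8080" | socat stdio "$SOCK"
echo "set server be_http:test:httpd-24/_dynamic-pod-1 state ready" | socat stdio "$SOCK"
```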