Merge pull request #172 from stackhpc/2024.1-cherrypick
Apply 2024.1 backports from 2023.1
mnasiadka authored Jul 5, 2024
2 parents 3553e9f + 15aee8b commit 1dcd259
Showing 17 changed files with 1,921 additions and 147 deletions.
38 changes: 30 additions & 8 deletions doc/source/user/index.rst
@@ -463,6 +463,12 @@ the table are linked to more details elsewhere in the user guide.
+---------------------------------------+--------------------+---------------+
| `octavia_lb_healthcheck`_ | see below | true |
+---------------------------------------+--------------------+---------------+
| `extra_network`_ | see below | "" |
+---------------------------------------+--------------------+---------------+
| `extra_subnet`_ | see below | "" |
+---------------------------------------+--------------------+---------------+
| `extra_security_group`_ | see below | see below |
+---------------------------------------+--------------------+---------------+

.. _cluster:

@@ -1175,13 +1181,14 @@ _`container_infra_prefix`

Images that might be needed if 'monitoring_enabled' is 'true':

* quay.io/prometheus/alertmanager:v0.20.0
* docker.io/squareup/ghostunnel:v1.5.2
* docker.io/jettech/kube-webhook-certgen:v1.0.0
* quay.io/coreos/prometheus-operator:v0.37.0
* quay.io/coreos/configmap-reload:v0.0.1
* quay.io/coreos/prometheus-config-reloader:v0.37.0
* quay.io/prometheus/prometheus:v2.15.2
* quay.io/prometheus/alertmanager:v0.21.0
* docker.io/jettech/kube-webhook-certgen:v1.5.0
* quay.io/prometheus-operator/prometheus-operator:v0.44.0
* docker.io/jimmidyson/configmap-reload:v0.4.0
* quay.io/prometheus-operator/prometheus-config-reloader:v0.44.0
* quay.io/prometheus/prometheus:v2.22.1
* quay.io/prometheus/node-exporter:v1.0.1
* docker.io/directxman12/k8s-prometheus-adapter:v0.8.2

Images that might be needed if 'cinder_csi_enabled' is 'true':

@@ -1548,6 +1555,22 @@ _`octavia_lb_healthcheck`
If true, enable the Octavia load balancer healthcheck.
Default: true

_`extra_network`
Optional additional network name or UUID to add to cluster nodes.
When not specified, no additional network is added. Optionally specify
'extra_subnet' if you wish to use a specific subnet on the network.
Default: ""

_`extra_subnet`
Optional additional subnet name or UUID to add to cluster nodes.
Only used when 'extra_network' is defined.
Default: ""

_`extra_security_group`
Optional additional security group name or UUID to apply to the port
created on 'extra_network'. Only used when 'extra_network' is defined
(see the example below).
Default: the cluster node's default security group.
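
As a sketch, a cluster using all three labels might be created as
follows; the network, subnet and security group names are illustrative,
not defaults::

    openstack coe cluster create my-cluster \
      --cluster-template my-template \
      --labels extra_network=storage-net,extra_subnet=storage-subnet,extra_security_group=storage-sg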

Supported versions
------------------

@@ -2297,7 +2320,6 @@ _`calico_tag`
Victoria default: v3.13.1
Wallaby default: v3.13.1


In addition, the Calico network driver requires kube_tag v1.9.3 or later, because
Calico needs extra mounts for the kubelet container. See `commit
<https://github.com/projectatomic/atomic-system-containers/commit/54ab8abc7fa1bfb6fa674f55cd0c2fa0c812fd36>`_
21 changes: 13 additions & 8 deletions doc/source/user/monitoring.rst
@@ -33,13 +33,15 @@ _`metrics_server_enabled`

_`monitoring_enabled`
Enable installation of cluster monitoring solution provided by the
stable/prometheus-operator helm chart.
prometheus-community/kube-prometheus-stack helm chart.
To use this service, tiller_enabled must be true when using
helm_client_tag < v3.0.0.
Default: false

_`prometheus_adapter_enabled`
Enable installation of cluster custom metrics provided by the
stable/prometheus-adapter helm chart. This service depends on
monitoring_enabled.
prometheus-community/prometheus-adapter helm chart.
This service depends on monitoring_enabled.
Default: true

To control deployed versions, extra labels are available:
@@ -52,14 +54,17 @@ _`metrics_server_chart_tag`

_`prometheus_operator_chart_tag`
Add prometheus_operator_chart_tag to select version of the
stable/prometheus-operator chart to install. When installing the chart,
helm will use the default values of the tag defined and overwrite them based
on the prometheus-operator-config ConfigMap currently defined. You must
certify that the versions are compatible.
prometheus-community/kube-prometheus-stack chart to install.
When installing the chart, Helm applies the default values of the
selected tag and then overrides them with the values from the currently
defined prometheus-operator-config ConfigMap.
You must verify that the versions are compatible.
Wallaby-default: 17.2.0

_`prometheus_adapter_chart_tag`
The stable/prometheus-adapter helm chart version to use.
The prometheus-community/prometheus-adapter helm chart version to use.
Train-default: 1.4.0
Wallaby-default: 2.12.1
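
As a sketch, both chart versions can be pinned when the cluster template
is created; the image and external network names below are illustrative::

    openstack coe cluster template create k8s-monitored \
      --coe kubernetes \
      --image fedora-coreos-latest \
      --external-network public \
      --labels monitoring_enabled=true,prometheus_operator_chart_tag=17.2.0,prometheus_adapter_chart_tag=2.12.1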

Full fledged cluster monitoring
+++++++++++++++++++++++++++++++
@@ -271,7 +271,7 @@ CERT_DIR=/etc/kubernetes/certs

# kube-proxy config
PROXY_KUBECONFIG=/etc/kubernetes/proxy-kubeconfig.yaml
KUBE_PROXY_ARGS="--kubeconfig=${PROXY_KUBECONFIG} --cluster-cidr=${PODS_NETWORK_CIDR} --hostname-override=${INSTANCE_NAME}"
KUBE_PROXY_ARGS="--kubeconfig=${PROXY_KUBECONFIG} --cluster-cidr=${PODS_NETWORK_CIDR} --hostname-override=${INSTANCE_NAME} --metrics-bind-address=0.0.0.0"
cat > /etc/kubernetes/proxy << EOF
KUBE_PROXY_ARGS="${KUBE_PROXY_ARGS} ${KUBEPROXY_OPTIONS}"
EOF
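
# With --metrics-bind-address=0.0.0.0, kube-proxy's Prometheus metrics are
# reachable from off-node as well as via localhost. A sanity check, assuming
# kube-proxy's default metrics port 10249:
#   curl -s http://<node-ip>:10249/metrics | head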
@@ -406,6 +406,8 @@ KUBE_CONTROLLER_MANAGER_ARGS="--leader-elect=true --kubeconfig=/etc/kubernetes/a
KUBE_CONTROLLER_MANAGER_ARGS="$KUBE_CONTROLLER_MANAGER_ARGS --cluster-name=${CLUSTER_UUID}"
KUBE_CONTROLLER_MANAGER_ARGS="${KUBE_CONTROLLER_MANAGER_ARGS} --allocate-node-cidrs=true"
KUBE_CONTROLLER_MANAGER_ARGS="${KUBE_CONTROLLER_MANAGER_ARGS} --cluster-cidr=${PODS_NETWORK_CIDR}"
KUBE_CONTROLLER_MANAGER_ARGS="${KUBE_CONTROLLER_MANAGER_ARGS} --secure-port=10257"
KUBE_CONTROLLER_MANAGER_ARGS="${KUBE_CONTROLLER_MANAGER_ARGS} --authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics"
KUBE_CONTROLLER_MANAGER_ARGS="$KUBE_CONTROLLER_MANAGER_ARGS $KUBECONTROLLER_OPTIONS"
if [ -n "${ADMISSION_CONTROL_LIST}" ] && [ "${TLS_DISABLED}" == "False" ]; then
KUBE_CONTROLLER_MANAGER_ARGS="$KUBE_CONTROLLER_MANAGER_ARGS --service-account-private-key-file=$CERT_DIR/service_account_private.key --root-ca-file=$CERT_DIR/ca.crt"
@@ -428,7 +430,7 @@ sed -i '
/^KUBE_CONTROLLER_MANAGER_ARGS=/ s#\(KUBE_CONTROLLER_MANAGER_ARGS\).*#\1="'"${KUBE_CONTROLLER_MANAGER_ARGS}"'"#
' /etc/kubernetes/controller-manager

sed -i '/^KUBE_SCHEDULER_ARGS=/ s#=.*#="--leader-elect=true --kubeconfig=/etc/kubernetes/admin.conf"#' /etc/kubernetes/scheduler
sed -i '/^KUBE_SCHEDULER_ARGS=/ s#=.*#="--leader-elect=true --kubeconfig=/etc/kubernetes/admin.conf --authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics "#' /etc/kubernetes/scheduler
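
# The always-allow paths let the monitoring stack scrape the controller
# manager and scheduler without an RBAC token. Assuming the default secure
# ports (10257 and 10259) and anonymous authentication left enabled, a
# local probe looks like:
#   curl -ks https://127.0.0.1:10257/healthz
#   curl -ks https://127.0.0.1:10259/healthz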

$ssh_cmd mkdir -p /etc/kubernetes/manifests
KUBELET_ARGS="--register-node=true --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=${INSTANCE_NAME}"
@@ -497,7 +499,14 @@ KUBELET_ARGS="${KUBELET_ARGS} --client-ca-file=${CERT_DIR}/ca.crt --tls-cert-fil

# specified cgroup driver
KUBELET_ARGS="${KUBELET_ARGS} --cgroup-driver=${CGROUP_DRIVER}"

if [ ${CONTAINER_RUNTIME} = "containerd" ] ; then
# check the kubelet version; 1.27.0 dropped the dockershim and the --container-runtime command line option
KUBELET_VERSION=$($ssh_cmd podman run --rm ${CONTAINER_INFRA_PREFIX:-${HYPERKUBE_PREFIX}}hyperkube:${KUBE_TAG} kubelet --version | awk '{print $2}')
CONTAINER_RUNTIME_REMOTE_DROPPED="v1.27.0"
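# sort -V prints the lower version first, so the test below holds only when
# KUBELET_VERSION predates v1.27.0 and the kubelet still accepts
# --container-runtime=remote; "devel" builds are excluded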
if [[ "${CONTAINER_RUNTIME_REMOTE_DROPPED}" != $(echo -e "${CONTAINER_RUNTIME_REMOTE_DROPPED}\n${KUBELET_VERSION}" | sort -V | head -n1) && "${KUBELET_VERSION}" != "devel" ]]; then
KUBELET_ARGS="${KUBELET_ARGS} --container-runtime=remote"
fi
KUBELET_ARGS="${KUBELET_ARGS} --runtime-cgroups=/system.slice/containerd.service"
KUBELET_ARGS="${KUBELET_ARGS} --runtime-request-timeout=15m"
KUBELET_ARGS="${KUBELET_ARGS} --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
@@ -278,6 +278,12 @@ KUBELET_ARGS="${KUBELET_ARGS} --client-ca-file=${CERT_DIR}/ca.crt --tls-cert-fil
# specified cgroup driver
KUBELET_ARGS="${KUBELET_ARGS} --cgroup-driver=${CGROUP_DRIVER}"
if [ ${CONTAINER_RUNTIME} = "containerd" ] ; then
# check the kubelet version; 1.27.0 dropped the dockershim and the --container-runtime command line option
KUBELET_VERSION=$($ssh_cmd podman run --rm ${CONTAINER_INFRA_PREFIX:-${HYPERKUBE_PREFIX}}hyperkube:${KUBE_TAG} kubelet --version | awk '{print $2}')
CONTAINER_RUNTIME_REMOTE_DROPPED="v1.27.0"
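# sort -V prints the lower version first, so the test below holds only when
# KUBELET_VERSION predates v1.27.0 and the kubelet still accepts
# --container-runtime=remote; "devel" builds are excluded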
if [[ "${CONTAINER_RUNTIME_REMOTE_DROPPED}" != $(echo -e "${CONTAINER_RUNTIME_REMOTE_DROPPED}\n${KUBELET_VERSION}" | sort -V | head -n1) && "${KUBELET_VERSION}" != "devel" ]]; then
KUBELET_ARGS="${KUBELET_ARGS} --container-runtime=remote"
fi
KUBELET_ARGS="${KUBELET_ARGS} --runtime-cgroups=/system.slice/containerd.service"
KUBELET_ARGS="${KUBELET_ARGS} --runtime-request-timeout=15m"
KUBELET_ARGS="${KUBELET_ARGS} --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
@@ -74,14 +74,17 @@ data:
Corefile: |
.:53 {
errors
log
health
log stdout
health {
lameduck 5s
}
ready
kubernetes ${DNS_CLUSTER_DOMAIN} ${PORTAL_NETWORK_CIDR} ${PODS_NETWORK_CIDR} {
pods verified
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
forward . /run/systemd/resolve/resolv.conf
cache 30
loop
reload
@@ -141,6 +144,9 @@ spec:
readOnly: true
- name: tmp
mountPath: /tmp
- name: resolvconf
mountPath: /run/systemd/resolve/resolv.conf
readOnly: true
ports:
- containerPort: 53
name: dns
@@ -183,6 +189,10 @@ spec:
items:
- key: Corefile
path: Corefile
- name: resolvconf
hostPath:
path: /run/systemd/resolve/resolv.conf
type: File
---
apiVersion: v1
kind: Service
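
# Forwarding to /run/systemd/resolve/resolv.conf rather than /etc/resolv.conf
# bypasses the systemd-resolved stub listener at 127.0.0.53, which would
# otherwise loop queries back into CoreDNS. A minimal in-cluster check
# (the busybox image/tag is illustrative):
#   kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
#     -- nslookup kubernetes.default.svc.cluster.local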
@@ -30,54 +30,50 @@ rules:
resources: ["leases"]
resourceNames: ["cluster-autoscaler"]
verbs: ["get", "update", "patch", "delete"]
# TODO: remove in 1.18; CA uses lease objects for leader election since 1.17
- apiGroups: [""]
resources: ["endpoints"]
resources: ["events", "endpoints"]
verbs: ["create", "patch"]
- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
- apiGroups: [""]
resources: ["pods/status"]
verbs: ["update"]
- apiGroups: [""]
resources: ["endpoints"]
resourceNames: ["cluster-autoscaler"]
verbs: ["get", "update", "patch", "delete"]
# accessing & modifying cluster state (nodes & pods)
verbs: ["get", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
verbs: ["watch", "list", "get", "update"]
- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
# read-only access to cluster state
- apiGroups: [""]
resources: ["services", "replicationcontrollers", "persistentvolumes", "persistentvolumeclaims"]
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources: ["daemonsets", "replicasets"]
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources: ["statefulsets"]
verbs: ["get", "list", "watch"]
resources:
- "namespaces"
- "pods"
- "services"
- "replicationcontrollers"
- "persistentvolumeclaims"
- "persistentvolumes"
verbs: ["watch", "list", "get"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "list", "watch"]
verbs: ["watch", "list", "get"]
- apiGroups: ["policy"]
resources: ["poddisruptionbudgets"]
verbs: ["get", "list", "watch"]
verbs: ["watch", "list"]
- apiGroups: ["apps"]
resources: ["daemonsets", "replicasets", "statefulsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses", "csinodes"]
verbs: ["get", "list", "watch"]
# misc access
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
verbs: ["watch", "list", "get"]
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["create"]
verbs: ["create","list","watch"]
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["cluster-autoscaler-status"]
verbs: ["get", "update", "patch", "delete"]
resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
verbs: ["delete", "get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
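
# To confirm the tightened role still grants what the autoscaler needs,
# kubectl's impersonation check can be used; the service account name is
# illustrative and should match the subject of the binding above:
#   kubectl auth can-i list pods \
#     --as=system:serviceaccount:kube-system:cluster-autoscaler-account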
@@ -82,7 +82,7 @@ EOF
curl $VERIFY_CA -X GET \
-H "X-Auth-Token: $USER_TOKEN" \
-H "OpenStack-API-Version: container-infra latest" \
$MAGNUM_URL/certificates/$CLUSTER_UUID | python -c 'import sys, json; print(json.load(sys.stdin)["pem"])' >> $CA_CERT
$MAGNUM_URL/certificates/$CLUSTER_UUID | python -c 'import sys, json; print(json.load(sys.stdin)["pem"])' > $CA_CERT

# Generate client's private key and csr
$ssh_cmd openssl genrsa -out "${_KEY}" 4096
@@ -21,10 +21,11 @@ EOF
cat << EOF >> ${HELM_CHART_DIR}/values.yaml
prometheus-adapter:
image:
repository: ${CONTAINER_INFRA_PREFIX:-docker.io/directxman12/}k8s-prometheus-adapter-${ARCH}
repository: ${CONTAINER_INFRA_PREFIX:-k8s.gcr.io/prometheus-adapter/}prometheus-adapter
priorityClassName: "system-cluster-critical"
prometheus:
url: http://web.tcp.prometheus-prometheus.kube-system.svc.cluster.local
url: http://web.tcp.magnum-kube-prometheus-sta-prometheus.kube-system.svc.cluster.local
path: /prometheus
resources:
requests:
cpu: 150m
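
# Once the adapter targets the renamed Prometheus service, the custom
# metrics API it registers can be queried directly:
#   kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1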