Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update otelcontribcol to latest #1391

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@ COSIGN := $(BIN_DIR)/cosign
GIT_SYNC_VERSION := v4.3.0-gke.4__linux_amd64
GIT_SYNC_IMAGE_NAME := gcr.io/config-management-release/git-sync:$(GIT_SYNC_VERSION)

OTELCONTRIBCOL_VERSION := v0.103.0-gke.6
OTELCONTRIBCOL_VERSION := v0.115.0-gke.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Document the reason for updating to v0.115.0-gke.0 (e.g., bug fixes, security updates, feature additions). This helps track changes and understand the context of the update.

OTELCONTRIBCOL_IMAGE_NAME := gcr.io/config-management-release/otelcontribcol:$(OTELCONTRIBCOL_VERSION)

# Directory used for staging Docker contexts.
Expand Down
4 changes: 3 additions & 1 deletion e2e/testdata/otel-collector/otel-cm-full-gcm.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,10 @@ data:
otel-collector-config.yaml: |-
receivers:
opencensus:
endpoint: 0.0.0.0:55678
janetkuo marked this conversation as resolved.
Show resolved Hide resolved
exporters:
prometheus:
endpoint: :8675
endpoint: 0.0.0.0:8675
namespace: config_sync
resource_to_telemetry_conversion:
enabled: true
Expand Down Expand Up @@ -303,6 +304,7 @@ data:
aggregation_type: max
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,10 @@ data:
otel-collector-config.yaml: |-
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
prometheus:
endpoint: :8675
endpoint: 0.0.0.0:8675
namespace: config_sync
resource_to_telemetry_conversion:
enabled: true
Expand Down Expand Up @@ -258,6 +259,7 @@ data:
aggregation_type: max
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,10 @@ data:
otel-collector-config.yaml: |-
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
prometheus:
endpoint: :8675
endpoint: 0.0.0.0:8675
namespace: config_sync
resource_to_telemetry_conversion:
enabled: true
Expand Down Expand Up @@ -167,6 +168,7 @@ data:
new_name: no_ssl_verify_count
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
Expand Down
1 change: 1 addition & 0 deletions manifests/base/kustomization.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ resources:
- ../ns-reconciler-base-cluster-role.yaml
- ../root-reconciler-base-cluster-role.yaml
- ../otel-agent-cm.yaml
- ../otel-agent-reconciler-cm.yaml
- ../reconciler-manager-service-account.yaml
- ../reposync-crd.yaml
- ../rootsync-crd.yaml
Expand Down
22 changes: 3 additions & 19 deletions manifests/otel-agent-cm.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,43 +26,27 @@ data:
otel-agent-config.yaml: |
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
opencensus:
endpoint: otel-collector.config-management-monitoring:55678
tls:
insecure: true
processors:
# Attributes processor adds custom configsync metric labels to applicable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did these attributes get removed from the agent config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be replaced by manifests/otel-agent-reconciler-cm.yaml

# metrics to identify the sync object used to configure this deployment.
#
# Note: configsync.sync.generation is explicitly excluded here, because it
# is high cardinality. So we don't want to send it as a label, only as a
# resource attribute. That way it's only propagated to Prometheus, and not
# Monarch or Cloud Monitoring, which ignore custom resource attributes.
attributes:
actions:
- key: configsync.sync.kind
action: upsert
value: $CONFIGSYNC_SYNC_KIND
- key: configsync.sync.name
action: upsert
value: $CONFIGSYNC_SYNC_NAME
- key: configsync.sync.namespace
action: upsert
value: $CONFIGSYNC_SYNC_NAMESPACE
batch:
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and
# the GCE metadata service, if available.
resourcedetection:
Comment on lines 36 to 39

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Since the attributes processor is removed, delete these comments to maintain clarity and prevent confusion.

Suggested change
batch:
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and
# the GCE metadata service, if available.
resourcedetection:
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and
# the GCE metadata service, if available.
resourcedetection:
detectors: [env, gcp]

detectors: [env, gcp]
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
metrics:
receivers: [opencensus]
processors: [batch, resourcedetection, attributes]
processors: [batch, resourcedetection]
exporters: [opencensus]
telemetry:
logs:
Expand Down
71 changes: 71 additions & 0 deletions manifests/otel-agent-reconciler-cm.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ConfigMap
metadata:
name: otel-agent-reconciler
namespace: config-management-system
labels:
app: opentelemetry
component: otel-agent
configmanagement.gke.io/system: "true"
configmanagement.gke.io/arch: "csmr"
data:
otel-agent-reconciler-config.yaml: |
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
opencensus:
endpoint: otel-collector.config-management-monitoring:55678
tls:
insecure: true
processors:
# Attributes processor adds custom configsync metric labels to applicable
# metrics to identify the sync object used to configure this deployment.
#
# Note: configsync.sync.generation is explicitly excluded here, because it
# is high cardinality. So we don't want to send it as a label, only as a
# resource attribute. That way it's only propagated to Prometheus, and not
# Monarch or Cloud Monitoring, which ignore custom resource attributes.
attributes:
actions:
- key: configsync.sync.kind
action: upsert
value: ${CONFIGSYNC_SYNC_KIND}
- key: configsync.sync.name
action: upsert
value: ${CONFIGSYNC_SYNC_NAME}
- key: configsync.sync.namespace
action: upsert
value: ${CONFIGSYNC_SYNC_NAMESPACE}
batch:
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and
# the GCE metadata service, if available.
resourcedetection:
detectors: [env, gcp]
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
metrics:
receivers: [opencensus]
processors: [batch, resourcedetection, attributes]
exporters: [opencensus]
telemetry:
logs:
level: "INFO"
24 changes: 23 additions & 1 deletion manifests/templates/otel-collector.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,16 +26,18 @@ data:
otel-collector-config.yaml: |
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
prometheus:
endpoint: :8675
endpoint: 0.0.0.0:8675
namespace: config_sync
resource_to_telemetry_conversion:
enabled: true
processors:
batch:
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
Expand Down Expand Up @@ -66,6 +68,25 @@ spec:
port: 8888
- name: metrics # Prometheus exporter metrics.
port: 8675
- name: health-check
port: 13133
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-port-ingress
namespace: config-management-monitoring
spec:
podSelector:
matchLabels:
app: opentelemetry
component: otel-collector
ingress:
- from:
- namespaceSelector: {}
ports:
- protocol: TCP
port: 55678
Comment on lines +74 to +89

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Adding a NetworkPolicy is a good security practice. Document its purpose (allowing ingress traffic on port 55678) for better maintainability.

---
apiVersion: apps/v1
kind: Deployment
Expand Down Expand Up @@ -112,6 +133,7 @@ spec:
- containerPort: 55678 # Default endpoint for OpenCensus receiver.
- containerPort: 8888 # Default endpoint for querying metrics.
- containerPort: 8675 # Prometheus exporter metrics.
- containerPort: 13133 # Health check
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
Expand Down
10 changes: 6 additions & 4 deletions manifests/templates/reconciler-manager-configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ data:
command:
- /otelcontribcol
args:
- "--config=/conf/otel-agent-config.yaml"
- "--config=/conf/otel-agent-reconciler-config.yaml"
# The prometheus transformer appends `_ratio` to gauge metrics: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.86.0/pkg/translator/prometheus/normalize_name.go#L149
# Add the feature gate to enable metric suffix trimming.
- "--feature-gates=-pkg.translator.prometheus.NormalizeName"
Expand All @@ -190,8 +190,10 @@ data:
protocol: TCP
- containerPort: 8888 # Metrics.
protocol: TCP
- containerPort: 13133 # Health check
protocol: TCP
volumeMounts:
- name: otel-agent-config-vol
- name: otel-agent-config-reconciler-vol
mountPath: /conf
readinessProbe:
httpGet:
Expand Down Expand Up @@ -282,9 +284,9 @@ data:
secret:
secretName: git-creds
defaultMode: 288
- name: otel-agent-config-vol
- name: otel-agent-config-reconciler-vol
configMap:
name: otel-agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ConfigMap still used anywhere?

name: otel-agent-reconciler
defaultMode: 420
- name: service-account
emptyDir: {}
Expand Down
1 change: 1 addition & 0 deletions manifests/templates/reconciler-manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ spec:
ports:
- containerPort: 55678 # Default OpenCensus receiver port.
- containerPort: 8888 # Metrics.
- containerPort: 13133 # Health check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the otel-agent readiness probe port?

securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
Expand Down
4 changes: 4 additions & 0 deletions manifests/templates/resourcegroup-manifest.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ data:
otel-agent-config.yaml: |
receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
opencensus:
endpoint: otel-collector.config-management-monitoring:55678
Expand All @@ -168,6 +169,7 @@ data:
detectors: [env, gcp]
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions: [health_check]
pipelines:
Expand Down Expand Up @@ -272,6 +274,7 @@ spec:
ports:
- containerPort: 55678
- containerPort: 8888
- containerPort: 13133
readinessProbe:
httpGet:
path: /
Expand Down Expand Up @@ -301,3 +304,4 @@ spec:
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault

24 changes: 18 additions & 6 deletions pkg/metrics/otel.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,9 +34,10 @@ const (
// the googlecloud exporter.
CollectorConfigGooglecloud = `receivers:
opencensus:
endpoint: 0.0.0.0:55678
exporters:
prometheus:
endpoint: :8675
endpoint: 0.0.0.0:8675
namespace: config_sync
resource_to_telemetry_conversion:
enabled: true
Expand Down Expand Up @@ -184,7 +185,9 @@ processors:
new_name: current_declared_resources
operations:
- action: aggregate_labels
label_set: []
# Using a no_op_label to get around issue in the upstream
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430
Comment on lines +188 to +189

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Include a direct link to the upstream issue: open-telemetry/opentelemetry-collector-contrib#34430 for easier reference.

label_set: [no_op_label]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does setting label_set to nil or removing it work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it. Maybe clarify the comment?

aggregation_type: max
- include: kcc_resource_count
action: update
Expand Down Expand Up @@ -255,14 +258,18 @@ processors:
new_name: resource_conflicts_count
operations:
- action: aggregate_labels
label_set: []
# Using a no_op_label to get around issue in the upstream
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430
label_set: [no_op_label]
aggregation_type: max
- include: internal_errors_total
action: update
new_name: internal_errors_count
operations:
- action: aggregate_labels
label_set: []
# Using a no_op_label to get around issue in the upstream
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430
label_set: [no_op_label]
aggregation_type: max
- include: remediate_duration_seconds
action: update
Expand Down Expand Up @@ -322,16 +329,21 @@ processors:
action: update
operations:
- action: aggregate_labels
label_set: []
# Using a no_op_label to get around issue in the upstream
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430
label_set: [no_op_label]
aggregation_type: max
- include: kustomize_build_latency
action: update
operations:
- action: aggregate_labels
label_set: []
# Using a no_op_label to get around issue in the upstream
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430
label_set: [no_op_label]
aggregation_type: max
extensions:
health_check:
endpoint: "0.0.0.0:13133"
service:
extensions: [health_check]
pipelines:
Expand Down
2 changes: 1 addition & 1 deletion pkg/reconcilermanager/controllers/otel_controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ const (
// otel-collector ConfigMap.
// See `CollectorConfigGooglecloud` in `pkg/metrics/otel.go`
// Used by TestOtelReconcilerGooglecloud.
depAnnotationGooglecloud = "c2f6078a9afe1f32721173e9e15bbab5"
depAnnotationGooglecloud = "5e22170ef10a382f587d3d7595fdaebc"
// depAnnotationGooglecloud is the expected hash of the custom
// otel-collector ConfigMap test artifact.
// Used by TestOtelReconcilerCustom.
Expand Down
Loading