[extension/cgroupruntime] Be aware of ECS task and CPU limits #36920

Open
wants to merge 12 commits into base: main
27 changes: 27 additions & 0 deletions .chloggen/check-ecs-metadata-cgroupruntimeextension.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: cgroupruntimeextension

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Implement ECS metadata retrieval for cgroupruntime extension.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [36814]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
26 changes: 26 additions & 0 deletions extension/cgroupruntimeextension/CONTRIBUTING.md
@@ -0,0 +1,26 @@
# Contributing to the Cgroup Go runtime extension

In order to contribute to this extension, it might be useful to have a working local setup.

## Testing

To run the integration tests locally for this extension, you can follow these steps in a Linux environment.
Contributor

Note that some Linux distributions already run systemd under cgroupv2; in that case, the integration tests can be run without the Docker workaround.

Author

Notice added 12acc97


Inside the extension folder, start a privileged docker container and share the code with the container

```bash
cd extension/cgroupruntimeextension
docker run -ti --privileged --cgroupns=host -v $(pwd):/workspace -w /workspace debian:bookworm-slim
```

Install Go and gcc to run the integration test

```bash
apt update && apt install -y wget sudo gcc && wget https://go.dev/dl/go1.23.4.linux-amd64.tar.gz && tar -C /usr/local -xzf go1.23.4.linux-amd64.tar.gz && export PATH=$PATH:/usr/local/go/bin && go version && rm go1.23.4.linux-amd64.tar.gz
```

Run the integration test

```bash
CGO_ENABLED=1 go test -v -exec sudo -race -timeout 360s -parallel 4 -tags=integration,""
```
6 changes: 5 additions & 1 deletion extension/cgroupruntimeextension/README.md
@@ -15,7 +15,7 @@

## Overview

The OpenTelemetry Cgroup Auto-Config Extension is designed to optimize Go runtime performance in containerized environments by automatically configuring GOMAXPROCS and GOMEMLIMIT based on the Linux cgroup filesystem. This extension leverages [automaxprocs](https://github.com/uber-go/automaxprocs) and [automemlimit](https://github.com/KimMachineGun/automemlimit) packages to dynamically adjust Go runtime variables, ensuring efficient resource usage aligned with container limits.
The OpenTelemetry Cgroup Auto-Config Extension is designed to optimize Go runtime performance in containerized environments by automatically configuring GOMAXPROCS and GOMEMLIMIT based on the Linux cgroup filesystem. This extension leverages [automaxprocs](https://github.com/uber-go/automaxprocs) or [gomaxecs](https://github.com/rdforte/gomaxecs) for AWS ECS Tasks and [automemlimit](https://github.com/KimMachineGun/automemlimit) packages to dynamically adjust Go runtime variables, ensuring efficient resource usage aligned with container limits.

## Configuration

@@ -40,3 +40,7 @@ extension:
enabled: true
ratio: 0.8
```

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for information on how to contribute to this extension.
9 changes: 7 additions & 2 deletions extension/cgroupruntimeextension/factory.go
@@ -9,6 +9,7 @@ import (
"runtime/debug"

"github.com/KimMachineGun/automemlimit/memlimit"
gomaxecs "github.com/rdforte/gomaxecs/maxprocs"
"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension"
"go.uber.org/automaxprocs/maxprocs"
@@ -42,10 +43,14 @@ func createExtension(_ context.Context, set extension.Settings, cfg component.Co
cgroupConfig := cfg.(*Config)
return newCgroupRuntime(cgroupConfig, set.Logger,
func() (undoFunc, error) {
undo, err := maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
if gomaxecs.IsECS() {
return gomaxecs.Set(gomaxecs.WithLogger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
}
return maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
return undoFunc(undo), err
},
func(ratio float64) (undoFunc, error) {
initial, err := memlimit.SetGoMemLimitWithOpts(memlimit.WithRatio(ratio))
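
One detail in the logger callbacks of this hunk may be worth double-checking: `params` is passed to `fmt.Sprintf` without a trailing `...`. In Go that hands the whole slice to `Sprintf` as a single argument instead of expanding it. A small standalone illustration follows; `collapsed` and `expanded` are hypothetical helpers, not code from the PR, and they take an explicit slice only to keep the example simple.

```go
package main

import "fmt"

// collapsed passes the slice itself as the only format argument,
// which is what happens when the trailing "..." is left out.
func collapsed(format string, params []any) string {
	return fmt.Sprintf(format, params)
}

// expanded spreads the slice so each element matches one verb.
func expanded(format string, params []any) string {
	return fmt.Sprintf(format, params...)
}

func main() {
	args := []any{12}
	fmt.Println(collapsed("GOMAXPROCS=%v", args)) // the slice is printed as a whole
	fmt.Println(expanded("GOMAXPROCS=%v", args))  // the element is printed
}
```

With a single `%v` verb the collapsed form still prints something, just wrapped in brackets, so the difference is easy to miss in debug logs.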
3 changes: 2 additions & 1 deletion extension/cgroupruntimeextension/go.mod
@@ -1,10 +1,11 @@
module github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension

go 1.22.0
go 1.22.4

require (
github.com/KimMachineGun/automemlimit v0.6.1
github.com/containerd/cgroups/v3 v3.0.5
github.com/rdforte/gomaxecs v1.1.0
github.com/stretchr/testify v1.10.0
go.opentelemetry.io/collector/component v0.116.1-0.20241220212031-7c2639723f67
go.opentelemetry.io/collector/component/componenttest v0.116.1-0.20241220212031-7c2639723f67
2 changes: 2 additions & 0 deletions extension/cgroupruntimeextension/go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

102 changes: 102 additions & 0 deletions extension/cgroupruntimeextension/integration_test.go
@@ -11,6 +11,8 @@ import (
"context"
"fmt"
"math"
"net/http"
"net/http/httptest"
"os"
"path"
"path/filepath"
@@ -30,6 +32,7 @@ import (

const (
defaultCgroup2Path = "/sys/fs/cgroup"
ecsMetadataUri = "ECS_CONTAINER_METADATA_URI_V4"
)

// checkCgroupSystem skips the test if it is not run on a cgroupv2 system
@@ -63,6 +66,26 @@ func cgroupMaxCpu(filename string) (quota int64, period uint64, err error) {
return quota, period, err
}

func testServerECSMetadata(t *testing.T, containerCPU, taskCPU int) *httptest.Server {
	t.Helper()

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
		_, err := w.Write([]byte(fmt.Sprintf(`{"Limits":{"CPU":%d},"DockerId":"container-id"}`, containerCPU)))
		assert.NoError(t, err)
	})
	mux.HandleFunc("/task", func(w http.ResponseWriter, _ *http.Request) {
		_, err := w.Write([]byte(fmt.Sprintf(
			`{"Containers":[{"DockerId":"container-id","Limits":{"CPU":%d}}],"Limits":{"CPU":%d}}`,
			containerCPU,
			taskCPU,
		)))
		assert.NoError(t, err)
	})

	return httptest.NewServer(mux)
}

func TestCgroupV2SudoIntegration(t *testing.T) {
checkCgroupSystem(t)
pointerInt64 := func(val int64) *int64 {
@@ -81,6 +104,7 @@ func TestCgroupV2SudoIntegration(t *testing.T) {
config *Config
expectedGoMaxProcs int
expectedGoMemLimit int64
setECSMetadataURI bool
}{
{
name: "90% the max cgroup memory and 12 GOMAXPROCS",
@@ -144,6 +168,71 @@ func TestCgroupV2SudoIntegration(t *testing.T) {
// 134217728 * 0.1
expectedGoMemLimit: 13421772,
},
{
name: "AWS ECS 90% the max cgroup memory and 12 GOMAXPROCS",
cgroupCpuQuota: pointerInt64(100000),
Contributor @rogercoll, Dec 30, 2024

What do you think of using the cgroupCpuQuota variable to set the task/container CPU limits (instead of expectedGoMaxProcs*1024)? That way it is more aligned with the other test cases.

Author

Discussion continuing here: #36920 (comment)

cgroupCpuPeriod: 8000,
// 128 Mb
cgroupMaxMemory: 134217728,
config: &Config{
GoMaxProcs: GoMaxProcsConfig{
Enabled: true,
},
GoMemLimit: GoMemLimitConfig{
Enabled: true,
Ratio: 0.9,
},
},
// 100000 / 8000
expectedGoMaxProcs: 12,
// 134217728 * 0.9
expectedGoMemLimit: 120795955,
setECSMetadataURI: true,
},
{
name: "AWS ECS 50% of the max cgroup memory and 1 GOMAXPROCS",
cgroupCpuQuota: pointerInt64(100000),
cgroupCpuPeriod: 100000,
// 128 Mb
cgroupMaxMemory: 134217728,
config: &Config{
GoMaxProcs: GoMaxProcsConfig{
Enabled: true,
},
GoMemLimit: GoMemLimitConfig{
Enabled: true,
Ratio: 0.5,
},
},
// 100000 / 100000
expectedGoMaxProcs: 1,
// 134217728 * 0.5
expectedGoMemLimit: 67108864,
setECSMetadataURI: true,
},
{
name: "AWS ECS 10% of the max cgroup memory, max cpu, default GOMAXPROCS",
cgroupCpuQuota: nil,
cgroupCpuPeriod: 100000,
// 128 Mb
cgroupMaxMemory: 134217728,
config: &Config{
GoMaxProcs: GoMaxProcsConfig{
Enabled: true,
},
GoMemLimit: GoMemLimitConfig{
Enabled: true,
Ratio: 0.1,
},
},
// GOMAXPROCS is set to the value of `cpu.max / cpu.period`
// If cpu.max is set to max, GOMAXPROCS should not be
// modified
expectedGoMaxProcs: runtime.GOMAXPROCS(-1),
// 134217728 * 0.1
expectedGoMemLimit: 13421772,
setECSMetadataURI: true,
},
}

cgroupPath, err := cgroup2.PidGroupPath(os.Getpid())
@@ -198,12 +287,25 @@ func TestCgroupV2SudoIntegration(t *testing.T) {

for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
// if running in ECS environment, set the ECS metadata URI environment variable
// to get the Cgroup CPU quota from the httptest server
cleanECS := func() {}
if test.setECSMetadataURI {
server := testServerECSMetadata(t, test.expectedGoMaxProcs*1024, test.expectedGoMaxProcs*1024)
Contributor

Related to the comment above, maybe using cgroupCpuQuota might be more appropriate here.

Author

The cgroup2 manager.Update() call is still applying the settings. In the ECS tests I would even prefer to put values in cgroupCpuQuota and cgroupCpuPeriod that have nothing to do with the expectedGoMaxProcs value, to prove that the code fetched the value from the HTTP endpoint.

That is because, when IsECS() == true, the lib makes an HTTP call to the ECS metadata URI and sets GOMAXPROCS from it, whatever you set in the cgroup.

So I see 3 viable options:

  • Keep as is
  • Only change cgroupCpuQuota so that it is not aligned with expectedGoMaxProcs, to prove the HTTP call is happening and working
  • Move the ECS tests into a new func next to func TestCgroupV2SudoIntegration(t *testing.T) {}

Contributor

Taking into account that cgroupCpuQuota and cgroupCpuPeriod won't be taken into consideration if IsECS() == true, I would go with option 3. It seems to me that having a new TestECSCgroupV2SudoIntegration will provide a cleaner testing interface.

Author

Proposition: 12db2c1

I moved some redundant code into functions in order to make it cleaner and that's debatable.

os.Setenv(ecsMetadataUri, server.URL)
cleanECS = func() {
server.Close()
os.Unsetenv(ecsMetadataUri)
}
}

// restore startup cgroup initial resource values
t.Cleanup(func() {
debug.SetMemoryLimit(initialGoMem)
runtime.GOMAXPROCS(initialGoProcs)
memoryCgroupCleanUp()
cpuCgroupCleanUp()
cleanECS()
})

err = manager.Update(&cgroup2.Resources{