Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race in example supervisor #256

Open
echlebek opened this issue Feb 28, 2024 · 0 comments
Open

Data race in example supervisor #256

echlebek opened this issue Feb 28, 2024 · 0 comments

Comments

@echlebek
Copy link
Contributor

There is a data race in the example supervisor implementation. It can be observed sporadically while the tests are running.

==================
WARNING: DATA RACE
Write at 0x00c0002b71b0 by goroutine 136:
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/commander.(*Commander).Start()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/commander/commander.go:52 +0x226
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.(*Supervisor).startAgent()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor.go:412 +0x64
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.(*Supervisor).runAgentProcess()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor.go:509 +0x79a
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.NewSupervisor.gowrap1()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor.go:112 +0x33

Previous read at 0x00c0002b71b0 by goroutine 133:
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/commander.(*Commander).Stop()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/commander/commander.go:119 +0x5c
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.(*Supervisor).Shutdown()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor.go:537 +0xad
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.TestNewSupervisor()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor_test.go:63 +0x175
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 136 (running) created at:
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.NewSupervisor()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor.go:112 +0x92a
  github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor.TestNewSupervisor()
      /home/eric/go/src/github.com/open-telemetry/opamp-go/internal/examples/supervisor/supervisor/supervisor_test.go:60 +0x147
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 133 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:47 +0x2bd
==================
--- FAIL: TestNewSupervisor (0.00s)
    testing.go:1398: race detected during execution of test

The basic issue is that the supervisor tries to access an exec.Command object (commander.go) from multiple goroutines. One goroutine is blocking on Wait() while another is reading and writing the object via the commander's methods.

Since Wait() blocks until the command is complete, I don't think it's a simple matter of adding mutex protection for this property. It's likely that the example needs to be reworked.

I think it should be possible to work around this problem by not calling Wait(), and rather checking for the command to be done on an interval instead. That way, concurrent access to the exec.Command object can be serialized with a mutex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant