Use child context in watchAgents to avoid goroutine leak #5888
Tracking issue
#3936
Why are the changes needed?
Rebuilding gRPC connections in Agents leaks goroutines
While investigating this issue, we found that Agents periodically rebuilds its gRPC connections but never closes the old ones, because the context they are created with is that of the overall flyte binary. The result is very long-lived, unused gRPC connections and a goroutine leak that steadily increases memory utilization until the process crashes.
What changes were proposed in this pull request?
Use a child context in the watcher and cancel it every time the watcher runs. Canceling the child context also closes the gRPC connection that was created with it.
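The pattern can be sketched in isolation as follows. This is a minimal illustration, not the code from this pull request: `dial`, `watchOnce`, and `openAfter` are hypothetical names, and a counter stands in for real gRPC connections. It assumes, as the description above suggests, that each connection is cleaned up by a goroutine waiting on the context it was created with.

```go
package main

import (
	"context"
	"sync/atomic"
	"time"
)

// dial is a hypothetical stand-in for a helper that opens a gRPC connection
// and spawns a goroutine that closes it once ctx is done.
func dial(ctx context.Context, open *int64) {
	atomic.AddInt64(open, 1) // connection opened
	go func() {
		<-ctx.Done()              // blocks forever if ctx is never canceled
		atomic.AddInt64(open, -1) // connection closed
	}()
}

// watchOnce runs a single refresh pass. With useChild set, the connection is
// tied to a child context that is canceled when the pass returns.
func watchOnce(parent context.Context, open *int64, useChild bool) {
	ctx := parent
	if useChild {
		var cancel context.CancelFunc
		ctx, cancel = context.WithCancel(parent)
		defer cancel()
	}
	dial(ctx, open)
	// ... use the connection to refresh agent metadata ...
}

// openAfter runs n passes against a long-lived parent context and reports how
// many connections are still open, waiting up to one second for cleanup.
func openAfter(n int, useChild bool) int64 {
	var open int64
	parent := context.Background() // stands in for the binary-wide context
	for i := 0; i < n; i++ {
		watchOnce(parent, &open, useChild)
	}
	deadline := time.Now().Add(time.Second)
	for atomic.LoadInt64(&open) > 0 && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	return atomic.LoadInt64(&open)
}
```

In this sketch, `openAfter(5, true)` drains back to zero open connections, while `openAfter(5, false)` leaves all five open along with their waiting goroutines, which mirrors the leak described above.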
How was this patch tested?
Validated by starting the single binary with agents enabled and querying Go pprof with wget -O goroutine.out http://localhost:10254/debug/pprof/goroutine?debug=1 to view the number of goroutines blocked on the gRPC CallbackSerializer. The total number of goroutines should not increase over time.
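To compare samples taken a few minutes apart, the total can be read from the first line of each dump, which in the debug=1 text format has the form "goroutine profile: total N". A minimal sketch of that check follows. The helper name `goroutineTotal` is hypothetical and not part of this change.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goroutineTotal extracts N from the "goroutine profile: total N" header that
// starts a goroutine profile fetched with ?debug=1.
func goroutineTotal(profile string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(profile))
	if !sc.Scan() {
		return 0, fmt.Errorf("empty profile")
	}
	var total int
	if _, err := fmt.Sscanf(sc.Text(), "goroutine profile: total %d", &total); err != nil {
		return 0, fmt.Errorf("unexpected header %q: %w", sc.Text(), err)
	}
	return total, nil
}
```

Passing the contents of two goroutine.out files to this function gives two totals. A later total that is not higher than the earlier one is consistent with the leak being fixed.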
Setup process
Screenshots
Check all the applicable boxes
Related PRs
NA
Docs link
NA