Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(otelgin): add support for recording panics #5090

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

asecondo
Copy link

Implements as described in #5088

@asecondo asecondo requested a review from a team February 14, 2024 03:44
@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc {
opts = append(opts, oteltrace.WithAttributes(rAttr))
}
ctx, span := tracer.Start(ctx, spanName, opts...)
defer span.End()
defer func() {
if cfg.RecordPanicConfig.enabled {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: added this as an opt-in behavior to preserve existing behavior

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this option. This is new behavior, not a breaking change.

CHANGELOG.md Outdated
@@ -15,6 +15,7 @@ The next release will require at least [Go 1.21].

- Add the new `go.opentelemetry.io/contrib/instrgen` package to provide auto-generated source code instrumentation. (#3068, #3108)
- Support [Go 1.22]. (#5082)
- Add support to record panics in `go.opentelemetry.io.contrib/instrumentation/github.com/gin-gonic/gin/otelgin` (#5090)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Add support to record panics in `go.opentelemetry.io.contrib/instrumentation/github.com/gin-gonic/gin/otelgin` (#5090)
- Add support to record panics in `go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin`. (#5090)

@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc {
opts = append(opts, oteltrace.WithAttributes(rAttr))
}
ctx, span := tracer.Start(ctx, spanName, opts...)
defer span.End()
defer func() {
if cfg.RecordPanicConfig.enabled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this option. This is new behavior, not a breaking change.

if cfg.RecordPanicConfig.enabled {
if r := recover(); r != nil {
err := fmt.Errorf("error handling request: %s", r)
span.RecordError(err, oteltrace.WithStackTrace(cfg.RecordPanicConfig.stackTrace))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, the config is probably superfluous.

defer func() {
if cfg.RecordPanicConfig.enabled {
if r := recover(); r != nil {
err := fmt.Errorf("error handling request: %s", r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the added value of creating a new error here over recording the original one?

If we do want to create a new error, we should wrap the original one:

Suggested change
err := fmt.Errorf("error handling request: %s", r)
err := fmt.Errorf("error handling request: %w", r)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll update to record the original error

if r := recover(); r != nil {
err := fmt.Errorf("error handling request: %s", r)
span.RecordError(err, oteltrace.WithStackTrace(cfg.RecordPanicConfig.stackTrace))
span.SetStatus(codes.Error, "panic recovered")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
span.SetStatus(codes.Error, "panic recovered")
span.SetStatus(codes.Error, err.Error())

@@ -88,3 +129,10 @@ func WithSpanNameFormatter(f func(r *http.Request) string) Option {
c.SpanNameFormatter = f
})
}

// WithRecordPanicConfig specifies a configuration for panic recording.
func WithRecordPanicConfig(cfg RecordPanicConfig) Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do want to keep this config, how about:

Suggested change
func WithRecordPanicConfig(cfg RecordPanicConfig) Option {
func WithRecordPanicConfig(enabled, stacktrace bool) Option {

Then, there's no need to have a config within a config.

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.1%. Comparing base (a111e3e) to head (bb56d55).

❗ Current head bb56d55 differs from pull request most recent head 0b7fca7. Consider uploading reports for the commit 0b7fca7 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #5090     +/-   ##
=======================================
- Coverage   61.2%   61.1%   -0.2%     
=======================================
  Files        185     185             
  Lines      11207   13786   +2579     
=======================================
+ Hits        6865    8428   +1563     
- Misses      4142    5162   +1020     
+ Partials     200     196      -4     
Files Coverage Δ
...ation/github.com/gin-gonic/gin/otelgin/gintrace.go 84.2% <100.0%> (+3.9%) ⬆️
...ntation/github.com/gin-gonic/gin/otelgin/option.go 100.0% <100.0%> (ø)

... and 183 files with indirect coverage changes

@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc {
opts = append(opts, oteltrace.WithAttributes(rAttr))
}
ctx, span := tracer.Start(ctx, spanName, opts...)
defer span.End()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is already in the default SDK's End method:

https://github.com/open-telemetry/opentelemetry-go/blob/e8973b75b230246545cdae072a548c83877cba09/sdk/trace/span.go#L403-L420

Why is this being added here? It seems like an SDK concern not an instrumentation one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a really good question. the general pattern/setup i typically deal with have a middleware pattern that look something like this:

  1. panic recovery -pretty simple and attempts to recover any panics, log them, and sets the response code to 500. this is always at the top to catch any and everything that it can.
  2. otelgin (this middleware)
  3. auth and other middlewares

panics show up in logs via the panic recovery middleware described above and also show up in the spans via the mechanism you linked. thinking about this more, the gap i currently have is less that panics get added to this specific span somehow, and more that the spans generated by the otelgin middleware don't have a span status as error and don't set the http.status_code to anything. when an alert fires for an increase in 5XX error codes, i usually first go to Jaeger and type something like http.status_code=500 or error=true. in the cases of panics, those types of queries don't work.

given that, i think this would leave a few options:

  1. don't update this middleware. update the middleware ordering i currently use to be something like panic recover -> otelgin -> panic recover. this would result in the net result i want, but using the same middleware twice doesn't feel great.
  2. update this middleware roughly as it's currently written (still need to apply feedback with not having configs within configs, etc). this would result in the http.status_code field not being guaranteed, but that's not necessarily the responsibility of this middleware to set that. the span status will still be set to error, which is a strong signal.
  3. update this middleware roughly as written, but also give the user the ability to set a status code if a panic is caught. this feels like adding too much responsibility to this middleware.

i'm leaning towards 2, but i'd love to get other opinions as well :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the latest semantic conventions, the error.type attribute should be used here to capture the panic information.

It is a known thing that our HTTP instrumentation is quite behind on semantic conventions, but I think this should be our target. Instead of having configuration to enable this, we should add the error.type attribute if a panic occurred always.

This will require this instrumentation to go through the process of upgrading semconv which will require a migration path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late reply - getting back to this now. i think that makes sense. i'm not familiar with these migrations, but i'll happily give it a shot. @MrAlias i may poke you for help with the migrations if i run into issues or have questions. i'll apply these changes, as well as the suggestions from @dmathieu.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it looks like #5092 will take care of the migration plan. i might wait until that PR is merged and then make the changes to this package on top of that. how does that sound?

@asecondo asecondo force-pushed the 5088_otelgin-catch-record-panics branch from 1213805 to 0b7fca7 Compare March 9, 2024 01:29
@mxpetit
Copy link

mxpetit commented Apr 15, 2024

Hello,
I'm also using otelgin and would very much like this issue to be merged. Can we have a little update on this?
Thanks a lot!

@asecondo
Copy link
Author

Hello, I'm also using otelgin and would very much like this issue to be merged. Can we have a little update on this? Thanks a lot!

hey @mxpetit - from the discussion in this thread, my current plan is to wait for #5092 to complete, likely make a few changes on top of that, then look to get this merged.

@flc1125
Copy link
Contributor

flc1125 commented Nov 26, 2024

I think the handling of this PR should refer to the method of this PR: #6366

@pellared
Copy link
Member

pellared commented Nov 26, 2024

I added a comment here: #5088 (comment)

@asecondo, I am changing this PR to draft as it is not in a mergeable state there are no approvals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants