-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracing support improvements #1819
Tracing support improvements #1819
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and seems to mostly match our other SDKs. Will let @Quinn-With-Two-Ns and otehrs review.
String runId = context.getRunId(); | ||
Preconditions.checkNotNull( | ||
runId, "runId is expected to be not null for span operation type %s", operationType); | ||
return ImmutableMap.of( | ||
StandardTagNames.WORKFLOW_ID, context.getWorkflowId(), | ||
StandardTagNames.RUN_ID, context.getRunId()); | ||
case QUERY_WORKFLOW: | ||
return ImmutableMap.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our other SDKs, query spans also have workflow ID and run ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that handleQuery
is not executed within a workflow thread, leading to non-deterministic errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ImmutableMap.of(); | |
return ImmutableMap.of( | |
StandardTagNames.WORKFLOW_ID, context.getWorkflowId(), | |
StandardTagNames.RUN_ID, context.getRunId()); |
To confirm, are you saying change this to the above errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, you are not setting the workflow ID and run ID on HANDLE_QUERY
. You should set those the same way you do for HANDLE_SIGNAL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that a span without these tags is ok, but I'll let y'all decide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the options are either no span or a span without those tags. I have no preference.
Regardless of which we go with we will need to open an issue to address this when we fix getInfo
in query handlers , but I don't think that should block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concur, will let you make the call on tagless span vs no span. I don't have a super-strong opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be better to have a tagless span since the span itself is useful for tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me
tracer, | ||
input.getSignalName(), | ||
input.getWorkflowExecution().getWorkflowId(), | ||
References.FOLLOWS_FROM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the workflow inbound should make this a follows-from, but client outbound should be a child-of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
spanFactory | ||
.createWorkflowSignalSpan( | ||
tracer, | ||
input.getSignalName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend, to match other SDKs, that workflow inbound signal, query, and update span names be HandleSignal:<name>
, HandleQuery:<name>
, and HandleQuery:<name>
instead of reusing the same outbound names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
contextAccessor.readSpanContextFromHeader(input.getHeader(), tracer); | ||
Span workflowSignalSpan = | ||
spanFactory | ||
.createWorkflowSignalSpan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read this closely, but what many of our other SDKs do is make this span a child-of the original inbound workflow-level span and a follows-from the inbound signal-level span. Is that what's happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate more on this (a link, doc or something related to other SDKs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you view this HandleSignal
span in many SDKs, its parent is the same as the RunWorkflow
's parent (i.e. the span that came in the original start workflow header). And this HandleSignal
span is just linked via follows-from to the span in this call's header.
return createSpan(context, tracer, null, References.FOLLOWS_FROM); | ||
} | ||
|
||
public Tracer.SpanBuilder createWorkflowSignalSpan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public Tracer.SpanBuilder createWorkflowSignalSpan( | |
public Tracer.SpanBuilder createWorkflowHandleSignalSpan( |
Can you change the names for handle-based spans instead of overloads where the caller can't know they are creating different operation types just by calling different overloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
hm looks like |
@siavashkavousi looks like |
@Quinn-With-Two-Ns I'm sorry; I didn't have the time to investigate. I've taken a look at the tests now, and it seems that with our support for signal spans, we should update the assertions there. Specifically, we now expect two spans as children of the parent span. It is weird that unit tests without Docker are now failing. I'll investigate this issue further in the coming days to understand it in more detail. |
@Quinn-With-Two-Ns I have reviewed the situation, and it appears that the issue arises from the disparity between in-memory tests and Docker integration tests. To address this, I used |
@siavashkavousi That is fine by me. The in memory test server probably does not pass headers through in some situations . For CI to pass, unfortunately some of our CI requires that when you pull the repo you need to use |
@siavashkavousi some of our CI requires that when you pull the repo you need to use |
@Quinn-With-Two-Ns I apologize for the delay. I've pulled and pushed tags. Could you please verify if I've done the right thing or not, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siavashkavousi Thanks for the contribution, I'll leave open for a couple days if @cretz has anything to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (though I didn't pour over the details of the latest commits)
Improve support for tracing signals, queries, and updates.
Our team encountered issues with tracing not being supported across workflow signals, queries, and updates when using temporal. I am proposing a proof of concept to address this, and if it is accepted, I will handle the necessary improvements and tests.
Note:
I have manually tested this in our test environment, and it appears to be working. However, as mentioned earlier, I will add tests after the initial review.