Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing support improvements #1819

Merged

Conversation

siavashkavousi
Copy link
Contributor

Improve support for tracing signals, queries, and updates.

Our team encountered issues with tracing not being supported across workflow signals, queries, and updates when using temporal. I am proposing a proof of concept to address this, and if it is accepted, I will handle the necessary improvements and tests.

Note:
I have manually tested this in our test environment, and it appears to be working. However, as mentioned earlier, I will add tests after the initial review.

@siavashkavousi siavashkavousi requested a review from a team as a code owner July 16, 2023 20:38
@CLAassistant
Copy link

CLAassistant commented Jul 16, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and seems to mostly match our other SDKs. Will let @Quinn-With-Two-Ns and otehrs review.

String runId = context.getRunId();
Preconditions.checkNotNull(
runId, "runId is expected to be not null for span operation type %s", operationType);
return ImmutableMap.of(
StandardTagNames.WORKFLOW_ID, context.getWorkflowId(),
StandardTagNames.RUN_ID, context.getRunId());
case QUERY_WORKFLOW:
return ImmutableMap.of();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our other SDKs, query spans also have workflow ID and run ID.

Copy link
Contributor Author

@siavashkavousi siavashkavousi Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that handleQuery is not executed within a workflow thread, leading to non-deterministic errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ImmutableMap.of();
return ImmutableMap.of(
StandardTagNames.WORKFLOW_ID, context.getWorkflowId(),
StandardTagNames.RUN_ID, context.getRunId());

To confirm, are you saying change this to the above errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you are not setting the workflow ID and run ID on HANDLE_QUERY. You should set those the same way you do for HANDLE_SIGNAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that a span without these tags is ok, but I'll let y'all decide

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the options are either no span or a span without those tags. I have no preference.

Regardless of which we go with we will need to open an issue to address this when we fix getInfo in query handlers , but I don't think that should block this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concur, will let you make the call on tagless span vs no span. I don't have a super-strong opinion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be better to have a tagless span since the span itself is useful for tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

tracer,
input.getSignalName(),
input.getWorkflowExecution().getWorkflowId(),
References.FOLLOWS_FROM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the workflow inbound should make this a follows-from, but client outbound should be a child-of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

spanFactory
.createWorkflowSignalSpan(
tracer,
input.getSignalName(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend, to match other SDKs, that workflow inbound signal, query, and update span names be HandleSignal:<name>, HandleQuery:<name>, and HandleQuery:<name> instead of reusing the same outbound names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

contextAccessor.readSpanContextFromHeader(input.getHeader(), tracer);
Span workflowSignalSpan =
spanFactory
.createWorkflowSignalSpan(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read this closely, but what many of our other SDKs do is make this span a child-of the original inbound workflow-level span and a follows-from the inbound signal-level span. Is that what's happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate more on this (a link, doc or something related to other SDKs)?

Copy link
Member

@cretz cretz Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you view this HandleSignal span in many SDKs, its parent is the same as the RunWorkflow's parent (i.e. the span that came in the original start workflow header). And this HandleSignal span is just linked via follows-from to the span in this call's header.

return createSpan(context, tracer, null, References.FOLLOWS_FROM);
}

public Tracer.SpanBuilder createWorkflowSignalSpan(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Tracer.SpanBuilder createWorkflowSignalSpan(
public Tracer.SpanBuilder createWorkflowHandleSignalSpan(

Can you change the names for handle-based spans instead of overloads where the caller can't know they are creating different operation types just by calling different overloads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Quinn-With-Two-Ns
Copy link
Contributor

hm looks like opentracing.SignalWithStartTest test if failing, does the test need to be updated?

@Quinn-With-Two-Ns
Copy link
Contributor

@siavashkavousi looks like opentracing.SignalWithStartTest::signalWithStart is still failing in CI, do you understand why?

@siavashkavousi
Copy link
Contributor Author

siavashkavousi commented Sep 1, 2023

@siavashkavousi looks like opentracing.SignalWithStartTest::signalWithStart is still failing in CI, do you understand why?

@Quinn-With-Two-Ns I'm sorry; I didn't have the time to investigate. I've taken a look at the tests now, and it seems that with our support for signal spans, we should update the assertions there. Specifically, we now expect two spans as children of the parent span. It is weird that unit tests without Docker are now failing. I'll investigate this issue further in the coming days to understand it in more detail.

@siavashkavousi
Copy link
Contributor Author

@Quinn-With-Two-Ns I have reviewed the situation, and it appears that the issue arises from the disparity between in-memory tests and Docker integration tests. To address this, I used SDKTestWorkflowRule.useExternalService to accommodate distinct assertions for these two environments, thus preventing the need for further modifications. I propose that we address the in-memory test environment bug in a separate pull request. what do you suggest?

@Quinn-With-Two-Ns
Copy link
Contributor

@siavashkavousi That is fine by me. The in memory test server probably does not pass headers through in some situations . For CI to pass, unfortunately some of our CI requires that when you pull the repo you need to use --tags can you update your branch? Sorry about the inconvenience.

@Quinn-With-Two-Ns
Copy link
Contributor

@siavashkavousi some of our CI requires that when you pull the repo you need to use --tags can you update your branch so we CI can pass and we can merge your contribution?

@siavashkavousi
Copy link
Contributor Author

@Quinn-With-Two-Ns I apologize for the delay. I've pulled and pushed tags. Could you please verify if I've done the right thing or not, please?

Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siavashkavousi Thanks for the contribution, I'll leave open for a couple days if @cretz has anything to add.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (though I didn't pour over the details of the latest commits)

@Quinn-With-Two-Ns Quinn-With-Two-Ns merged commit d50d274 into temporalio:master Sep 25, 2023
4 checks passed
@siavashkavousi siavashkavousi deleted the tracing-support-improvements branch September 30, 2023 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants