[Bug] Workflow lifecycle logging is misleading #1537

Open
mjameswh opened this issue Sep 27, 2024 · 2 comments
Labels
bug Something isn't working visibility

Comments

@mjameswh
Contributor

Describe what you are trying to do

(From a user) We are trying to get logs good enough to alert on the following use cases:

  • A Workflow Task fails more than 3 times (possibly because of an implementation issue)
  • A Workflow fails (because of an ApplicationFailure, an ActivityFailure, etc.)

Describe the bug

  • On Workflow Task failure, the lifecycle logger prints out a message indicating that Workflow failed; that's exactly the same error message as on actual Workflow Failure, making it impossible to differentiate these cases.

  • Similarly, one may see Workflow started printed multiple times for the same Workflow Execution, i.e. every time a Worker needs to rebuild (aka “replay”) the runtime state of that Workflow Execution from the very beginning.
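
The second symptom can be reproduced with a small model. This is only a sketch under stated assumptions: `WorkerCacheModel`, `handleWorkflowTask`, and `evict` are hypothetical names invented for illustration, not part of the Temporal SDK. It shows how a message emitted whenever a worker builds a workflow instance ends up logged more than once for a single Workflow Execution.

```typescript
// Hypothetical model of a worker's workflow cache; none of these names are SDK APIs.
class WorkerCacheModel {
  readonly logs: string[] = [];
  private readonly cache = new Set<string>();

  // Called whenever a Workflow Task arrives for the given execution.
  handleWorkflowTask(workflowId: string): void {
    if (!this.cache.has(workflowId)) {
      // The instance is (re)built by replaying history from the first event,
      // so the "started" message fires again for the same execution.
      this.cache.add(workflowId);
      this.logs.push(`Workflow started: ${workflowId}`);
    }
  }

  // Models the instance leaving the cache (assumed cause: eviction or restart).
  evict(workflowId: string): void {
    this.cache.delete(workflowId);
  }
}

const cacheModel = new WorkerCacheModel();
cacheModel.handleWorkflowTask("order-1"); // first task: instance created, logged
cacheModel.handleWorkflowTask("order-1"); // instance is cached: nothing logged
cacheModel.evict("order-1");
cacheModel.handleWorkflowTask("order-1"); // rebuilt by replay: logged again
console.log(cacheModel.logs);
```

In this model, one execution produces two "Workflow started" lines, which is why an alert keyed on that message alone would overcount.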

Additional context

  • The lifecycle handler logs from the perspective of “the Cached Workflow Instance” (i.e. the specific instance of that Workflow Execution in the cache of that specific Workflow Worker), rather than from the perspective of the actual Workflow Execution’s lifecycle.

  • We need to think of a more precise way of formulating those messages. For various reasons, no mention of “Workflow” or “Workflow Task” (starting, failing, completing…) would be 100% reliable at that precise place. For example, Workflow code may attempt to “Complete Workflow”, but the completion command times out or gets rejected by the server because of new incoming events, and so what appears to be “Workflow completed” actually ends up being a Workflow Task Failure or Timeout.

  • Community Slack conversation: https://temporalio.slack.com/archives/C01DKSMU94L/p1727436127246899
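
The “Complete Workflow” example above can be written out as a small model. This is a sketch, not SDK code: the types `AttemptedCommand` and `ServerResponse` and the functions `instanceView` and `resolveOutcome` are hypothetical, and the outcome strings are illustrative. It contrasts what a logger inside the cached instance can know (only the attempt) with what the execution's outcome turns out to be once the server has responded.

```typescript
// Hypothetical model: these types and strings are illustrative, not SDK APIs.
type AttemptedCommand = "complete-workflow" | "fail-workflow";
type ServerResponse = "accepted" | "rejected-new-events" | "timed-out";

// What a logger inside the workflow instance can know: only the attempt.
function instanceView(command: AttemptedCommand): string {
  return command === "complete-workflow" ? "Workflow completed" : "Workflow failed";
}

// What actually happened once the server has responded to the Workflow Task.
function resolveOutcome(command: AttemptedCommand, response: ServerResponse): string {
  if (response === "timed-out") return "Workflow Task timed out";
  if (response === "rejected-new-events") return "Workflow Task failed";
  return command === "complete-workflow"
    ? "Workflow Execution completed"
    : "Workflow Execution failed";
}

const attempted: AttemptedCommand = "complete-workflow";
console.log(instanceView(attempted)); // the message logged at attempt time
console.log(resolveOutcome(attempted, "rejected-new-events")); // the real outcome
```

The two printed lines disagree whenever the server does not accept the command, which is the gap described above: the message is produced before the outcome is known.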

@mjameswh mjameswh added bug Something isn't working visibility labels Sep 27, 2024
@ilijaNL

ilijaNL commented Oct 3, 2024

Since the Temporal SDK has metric counters, why wouldn't it be possible to log in the same place where the counter is incremented?

@mjameswh
Contributor Author

mjameswh commented Nov 6, 2024

Since the Temporal SDK has metric counters, why wouldn't it be possible to log in the same place where the counter is incremented?

Unfortunately, it's not that simple. Metrics bookkeeping is handled by the Core SDK, whereas logging is handled on the TypeScript side.
