Describe the bug
Originally reported in the forum thread. If a workflow task fails, the failure reason shown in the UI stays frozen at the original failure message, even when the task is now failing for a different reason (for example, because of a new deployment). This makes debugging harder and gives the wrong impression that the new deployment didn't go through or that the workflow code wasn't updated.
To Reproduce
In pseudo code, here's the first deployment with which the workflow is run:
This finishes activity-1 and then results in WorkflowTaskFailed, with the reason being that exception foo was raised. The workflow task keeps getting retried. Now change the code to:
and redeploy the worker. From the worker logs I can see that it's now raising the new exception bar, but the UI doesn't update the event history: it remains frozen at WorkflowTaskFailed with the reason that exception foo occurred, which is no longer accurate. This is just an example, but it makes troubleshooting from the UI difficult. It looks as if the worker were still running stale code and hadn't been updated.
Next, introduce non-determinism by changing the code to the following and redeploying the (only) worker:
Again, from the worker logs I can see that it quits execution as soon as it sees the divergence (based on the event history it expects activity-1 to be completed, but the new code has activity-3 in its place). So it immediately detects the non-determinism and errors out (and is then retried as usual), but the UI for the workflow remains frozen on the original error, that the workflow task failed due to exception foo.
If you now revert to the first snippet and remove the exception, the worker successfully completes the workflow on the next retry, finishing activity-2 as well, and the UI updates with all of that and finally shows the workflow as completed.
In the meantime, however, the lack of updates makes troubleshooting difficult.
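The pseudocode snippets are missing from the report as captured here, so the following is only a rough reconstruction from the prose above, written as plain Python rather than against the Temporal SDK. Every name in it (`attempt`, `workflow_v1`, `NonDeterminismError`, and so on) is hypothetical, and the replay check is a toy stand-in for what a real worker does. It only illustrates that each retry fails for a different reason, which is the information the UI does not reflect.

```python
class NonDeterminismError(Exception):
    """Toy stand-in for a worker's non-determinism error (hypothetical name)."""


def workflow_v1(run_activity):
    # First deployment: activity-1 completes, then the workflow code raises foo.
    run_activity("activity-1")
    raise RuntimeError("foo")


def workflow_v2(run_activity):
    # Second deployment: same shape, but the exception is now bar.
    run_activity("activity-1")
    raise RuntimeError("bar")


def workflow_v3(run_activity):
    # Third deployment: activity-3 replaces activity-1, which diverges from history.
    run_activity("activity-3")
    raise RuntimeError("bar")


def workflow_fixed(run_activity):
    # Reverted first snippet with the exception removed.
    run_activity("activity-1")
    run_activity("activity-2")


def attempt(workflow, history):
    """Run one workflow-task attempt against the recorded history.

    Returns the failure reason as a string, or None if the workflow completes.
    """
    position = 0

    def run_activity(name):
        nonlocal position
        if position < len(history):
            # Replaying: the code must schedule what the history recorded.
            if history[position] != name:
                raise NonDeterminismError(
                    f"history has {history[position]}, code scheduled {name}"
                )
        else:
            # New progress: record the completed activity.
            history.append(name)
        position += 1

    try:
        workflow(run_activity)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


history = []
reasons = [
    attempt(workflow, history)
    for workflow in (workflow_v1, workflow_v2, workflow_v3, workflow_fixed)
]
```

In this sketch `reasons` holds a different failure reason for each of the first three attempts and `None` for the last one, whereas the UI described above keeps showing only the first reason until the workflow completes.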
Expected behavior
On retries, if the failure reason for the workflow task has changed, the UI should update to show the new reason (as it already does for activity tasks under pending activities).
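As a minimal sketch of what is being asked for, assuming a per-task record that is overwritten on every failed attempt: the class and field names below are hypothetical and are not taken from Temporal's data model or UI code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingWorkflowTask:
    """Hypothetical record of a workflow task that is still being retried."""

    attempt: int = 0
    last_failure: Optional[str] = None

    def record_failure(self, reason: str) -> None:
        # Overwrite the reason on every failed attempt instead of
        # keeping only the first one.
        self.attempt += 1
        self.last_failure = reason


task = PendingWorkflowTask()
task.record_failure("exception foo")
task.record_failure("exception bar")
```

After the two calls, a view backed by `task` would show the most recent reason and the attempt count rather than the original failure.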
Desktop (please complete the following information):
OS: MacOS
Browser: Brave, Firefox etc
Additional context
Tried this in both the "old" UI and the "new" UI, which is toggled with the "Labs On" button at the bottom left.