workflows list page delay in showing retry status from details page #12868
Comments
That sounds correct. The Workflows List page runs a […]. The Workflows Details page can have similar issues, as it relies on a […]. There is a difference though: due to a copy+paste issue, the page refreshes when a retry is started, same as with resubmit. The code is copied from resubmit, but a retry does not create a new Workflow, so there isn't actually a need to change pages (and a page change should happen via the internal router instead of the browser URL anyway). The retry does not necessarily happen immediately though (especially not after #12538), so it might not be caught during the refresh either. The main way to improve that would be for the UI to keep a cache of Workflows and reconcile them with the Server; that way the Details page and the List page could re-use the data for a faster update. That would require a sizeable refactor, though.
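To make the cache idea above a bit more concrete, here is a minimal sketch of a shared client-side store that both pages could subscribe to. Everything in it is an assumption for illustration: `WorkflowSummary`, `WatchEvent`, and `WorkflowCache` are hypothetical names and simplified shapes, not the types or services the UI actually uses.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface WorkflowSummary {
    namespace: string;
    name: string;
    phase: string; // e.g. 'Failed', 'Running'
}

type WatchEvent = {type: 'ADDED' | 'MODIFIED' | 'DELETED'; object: WorkflowSummary};
type Listener = (workflows: WorkflowSummary[]) => void;

// A single cache that the List page and the Details page could both read from.
class WorkflowCache {
    private items = new Map<string, WorkflowSummary>();
    private listeners = new Set<Listener>();

    private static key(namespace: string, name: string): string {
        return `${namespace}/${name}`;
    }

    // Apply one watch event received from the server, then notify subscribers.
    apply(event: WatchEvent): void {
        const key = WorkflowCache.key(event.object.namespace, event.object.name);
        if (event.type === 'DELETED') {
            this.items.delete(key);
        } else {
            this.items.set(key, event.object);
        }
        this.notify();
    }

    // Look up a single Workflow, e.g. for the Details page.
    get(namespace: string, name: string): WorkflowSummary | undefined {
        return this.items.get(WorkflowCache.key(namespace, name));
    }

    // Snapshot of all cached Workflows, e.g. for the List page.
    list(): WorkflowSummary[] {
        return Array.from(this.items.values());
    }

    // Register a listener; the returned function unsubscribes it.
    subscribe(listener: Listener): () => void {
        this.listeners.add(listener);
        return () => {
            this.listeners.delete(listener);
        };
    }

    private notify(): void {
        const snapshot = this.list();
        this.listeners.forEach(listener => listener(snapshot));
    }
}
```

With something along these lines, an update seen by the Details page after a retry would already be in the store when the user navigates back to the List page, so the list would not have to wait for its own request to catch up.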
It is definitely the case that sometimes the status does not change to running at all - that happened the first time I submitted a retry on version 3.5.5 and I have seen it several times since.
You mentioned that before, thanks for confirming it's happened multiple times. That's odd, an update really shouldn't be missed... there are some race conditions that can happen, but not frequently 🤔 Do you have a rough estimate of how often that happens? Like "1/20 retries" for example. I also checked the UI merge logic, which seems to be correct afaict.
This part is interesting, suggesting a regression. A few things changed in 3.5 here. The options panel was added for retries in #11632, which also added the hard refresh after a retry. And most of the UI was refactored to use React hooks; in particular, I did a massive refactor of the Workflows List page in #11891 which made it substantially more efficient as there were many duplicate network calls. That PR has had a few (less than one-liner) bugs in it though. Technically speaking, the back-end k8s client Informer libraries rebuild their cache on a certain timeframe as well (the "resync period" is every 20min by default), specifically because they might miss an update (cache invalidation is hard)... I wonder if we should perhaps do some similar logic here in the UI 🤔
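As a rough illustration of what "similar logic in the UI" could look like, the sketch below periodically replaces the UI's cached view with a fresh listing and reports which entries had drifted. It is only a sketch under stated assumptions: `WorkflowSummary`, `resync`, `startPeriodicResync`, and the `fetchList` callback are hypothetical names, and comparing only `phase` is a simplification.

```typescript
// Hypothetical, simplified shape for illustration only.
interface WorkflowSummary {
    namespace: string;
    name: string;
    phase: string;
}

// Replace the cached view with a fresh server listing.
// Returns the keys whose cached state was missing, outdated, or no longer on the server.
function resync(cache: Map<string, WorkflowSummary>, listed: WorkflowSummary[]): string[] {
    const stale: string[] = [];
    const fresh = new Map<string, WorkflowSummary>();
    for (const wf of listed) {
        const key = `${wf.namespace}/${wf.name}`;
        fresh.set(key, wf);
        const cached = cache.get(key);
        if (!cached || cached.phase !== wf.phase) {
            stale.push(key); // a watch update was missed, or the Workflow is new
        }
    }
    cache.forEach((_wf, key) => {
        if (!fresh.has(key)) {
            stale.push(key); // deleted on the server but still cached
        }
    });
    cache.clear();
    fresh.forEach((wf, key) => cache.set(key, wf));
    return stale;
}

// Run resync on a fixed period; fetchList is an assumed callback that lists Workflows.
// The returned function stops the timer.
function startPeriodicResync(
    cache: Map<string, WorkflowSummary>,
    fetchList: () => Promise<WorkflowSummary[]>,
    periodMs: number
): () => void {
    const timer = setInterval(() => {
        fetchList()
            .then(listed => resync(cache, listed))
            .catch(() => {
                // Keep the existing cache if the listing fails; the next tick retries.
            });
    }, periodMs);
    return () => clearInterval(timer);
}
```

A missed "Failed" to "Running" transition would then be corrected at the next tick at the latest, at the cost of one extra list call per period.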
I am not sure how often it happens, but it did happen last week and was consistent for at least 10 minutes over several workflows. It then reverted to the behaviour of updating the status after about 5 seconds, just as I was about to troubleshoot it. I do know that some of those workflow retries were stuck in a "Pending" state ([…]).
@amfage I see a related PR was merged; have you tested on the latest version?
Pre-requisites
:latest
What happened/what did you expect to happen?
If a workflow fails and I click "retry" while viewing it, then return to the main workflows listing page, the workflow still shows as failed although it is running. Sometimes the listing page shows the correct status after a second, or maybe 5 seconds, but sometimes it keeps displaying the failed status.
The expected behaviour is that the icon shows the blue circle running icon, and the started, finished, duration and message areas show that the workflow is running, as it would if the workflow was running for the first time. This is how it behaved in the previous version we were running which was 3.4.11.
Version
v3.5.5
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container