3.5.7 No more result from API when workflows.argoproj.io/phase=Running #13143

Closed
3 of 4 tasks
hmoulart opened this issue Jun 5, 2024 · 11 comments
Labels
area/api Argo Server API P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@hmoulart

hmoulart commented Jun 5, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Hello,
We have likely identified a regression in version 3.5.7. This is unexpected, as there are no related issues or changes noted in the changelog. The Argo Workflows API does not return any results when queried with the label selector workflows.argoproj.io/phase=Running or workflows.argoproj.io/phase=Pending. Queries for Failed, Succeeded, and Error work as expected.

GET https://[URL]/api/v1/workflows/[NAMESPACE]?listOptions.labelSelector=workflows.argoproj.io/phase=Running&listOptions.limit=10

Version 3.5.7: [screenshot]

Version 3.5.6: [screenshot]

Thank you for your help with this issue.
Hugo

Version

3.5.7

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

GET https://[URL]/api/v1/workflows/[NAMESPACE]?listOptions.labelSelector=workflows.argoproj.io/phase=Running&listOptions.limit=10
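To compare phases side by side, the same list request can be issued once per phase. This is a minimal sketch, assuming only the Python 3 standard library, a reachable Argo Server, and a token accepted in the Authorization header in the form "Bearer <token>"; PHASES, list_url and count_by_phase are hypothetical helpers, not part of any Argo client.

```python
import json
import urllib.parse
import urllib.request

# Phases to compare; Pending and Running are the ones reported empty on 3.5.7.
PHASES = ["Pending", "Running", "Succeeded", "Failed", "Error"]


def list_url(base_url, namespace, phase, limit=10):
    """Build the workflow list URL with a URL-encoded phase label selector."""
    query = urllib.parse.urlencode({
        "listOptions.labelSelector": f"workflows.argoproj.io/phase={phase}",
        "listOptions.limit": limit,
    })
    return f"{base_url.rstrip('/')}/api/v1/workflows/{urllib.parse.quote(namespace)}?{query}"


def count_by_phase(base_url, namespace, token, limit=10):
    """Return {phase: number of items returned}; each count is capped at `limit`."""
    counts = {}
    for phase in PHASES:
        req = urllib.request.Request(
            list_url(base_url, namespace, phase, limit),
            headers={"Authorization": token},  # assumed form: "Bearer <token>"
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # An empty result may be serialized as "items": null, hence the `or []`.
        counts[phase] = len(body.get("items") or [])
    return counts
```

For example, count_by_phase("https://argo.example.com", "my-namespace", "Bearer <token>") (placeholder host, namespace and token) would return a dict of per-phase counts; on an affected server, Pending and Running would show 0 while the terminal phases do not.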

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
@hmoulart hmoulart added type/bug type/regression Regression from previous behavior (a specific type of bug) labels Jun 5, 2024
@Joibel Joibel added the area/api Argo Server API label Jun 5, 2024
@Joibel
Member

Joibel commented Jun 5, 2024

I imagine this is due to #13021; I can't see another likely source of the problem.

@agilgur5 agilgur5 changed the title No more result from Argo Workflows 3.5.7 API when workflows.argoproj.io/phase=Running 3.5.7 No more result from API when workflows.argoproj.io/phase=Running Jun 5, 2024
@agilgur5 agilgur5 added the P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important label Jun 5, 2024
@agilgur5
Contributor

agilgur5 commented Jun 5, 2024

Yes, it would be from there, cc @jiachengxu. That would suggest the archive-merging logic has a bug in it: archived workflows will never be in the Pending state, so in this case it should only retrieve the live ones 🤔 EDIT: or, perhaps more likely, the reflector isn't catching all live Workflows, i.e. the SQLite DB is erroneously empty? Or the label table specifically is off?
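To make the last hypothesis concrete, here is a toy model, assuming Python's built-in sqlite3 and a made-up two-table schema (workflows plus a labels table). It is not Argo's actual schema or code; open_db, upsert and list_by_label are hypothetical names. It shows how a label table that is not refreshed when a workflow's phase label changes can make a phase=Running selector miss a live workflow that is present in the DB.

```python
import sqlite3

# Toy schema for illustration only; NOT Argo's real tables.
SCHEMA = """
CREATE TABLE workflows (uid TEXT PRIMARY KEY, name TEXT, namespace TEXT);
CREATE TABLE labels (uid TEXT, key TEXT, value TEXT);
"""


def open_db():
    """In-memory DB standing in for a hypothetical live-workflow mirror."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def upsert(db, uid, name, namespace, labels, refresh_labels=True):
    """Mirror one live workflow. refresh_labels=False simulates a label table
    that is not updated when the workflow changes (the suspected bug)."""
    db.execute("INSERT OR REPLACE INTO workflows VALUES (?, ?, ?)", (uid, name, namespace))
    if refresh_labels:
        db.execute("DELETE FROM labels WHERE uid = ?", (uid,))
        db.executemany(
            "INSERT INTO labels VALUES (?, ?, ?)",
            [(uid, k, v) for k, v in labels.items()],
        )


def list_by_label(db, namespace, key, value):
    """Evaluate an equality label selector (key=value) via a join on labels."""
    rows = db.execute(
        """SELECT w.name FROM workflows w
           JOIN labels l ON l.uid = w.uid
           WHERE w.namespace = ? AND l.key = ? AND l.value = ?
           ORDER BY w.name""",
        (namespace, key, value),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    PHASE = "workflows.argoproj.io/phase"
    db = open_db()
    upsert(db, "u1", "wf-a", "argo", {PHASE: "Running"})
    upsert(db, "u2", "wf-b", "argo", {PHASE: "Pending"})
    # wf-b moves to Running, but its labels are not refreshed.
    upsert(db, "u2", "wf-b", "argo", {PHASE: "Running"}, refresh_labels=False)
    print(list_by_label(db, "argo", PHASE, "Running"))  # ['wf-a']: wf-b is missed
    print(list_by_label(db, "argo", PHASE, "Pending"))  # ['wf-b']: stale label
```

In this model the workflow row itself is always current; only the join through the stale labels table goes wrong, which is the distinction between "the DB is empty" and "the label table is off".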

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Jun 5, 2024
@jiachengxu
Member

I tried to reproduce the issue locally with v3.5.7 but couldn't. The following screenshots are from my test: both the UI and Postman returned the running workflows.
[screenshots]

@Joibel IIRC, you are also running v3.5.7 in some of your environments; have you seen this issue before?

@Joibel
Member

Joibel commented Jun 6, 2024

Thanks for trying to reproduce it @jiachengxu. We are running 3.5.7 in all our environments but we haven't seen it. We don't use the API all that much; for most of our automation we Watch/List the actual workflows directly, and the stock open source UI is behaving fine.

@hmoulart
Author

hmoulart commented Jun 6, 2024

Probably important information: we are using persistence with a PostgreSQL database. Could the behaviour be different in that case?

@jiachengxu
Member

jiachengxu commented Jun 6, 2024

Probably important information: we are using persistence with a PostgreSQL database. Could the behaviour be different in that case?

@hmoulart Thanks for providing the info. During my test, I was also using a Postgres database for archiving workflows.

@agilgur5
Contributor

agilgur5 commented Jun 6, 2024

Thanks for trying to repro Alan and Jiacheng!

@hmoulart Since it hasn't been easily reproduced, I imagine there must be some other confounding variable here, perhaps in your Argo configuration.

@agilgur5 agilgur5 added the problem/more information needed Not enough information has been provide to diagnose this issue. label Jun 11, 2024
@tooptoop4
Contributor

@hmoulart have you tried v3.5.8?

@hmoulart
Author

@tooptoop4 I'm going to try tomorrow :) Thanks

@hmoulart
Author

Hello,
I upgraded to version 3.5.8, and I’m pleased to report that the bug has been resolved!
Thank you for suggesting this upgrade.
Hugo

@agilgur5 agilgur5 removed the problem/more information needed Not enough information has been provide to diagnose this issue. label Jun 21, 2024
@agilgur5
Contributor

For reference, that means this was resolved by #13166


5 participants