List workflows sometimes fail with out of sort memory error #13229
Comments
3.5.8 fixed a pretty severe bug in #13166 which affects the new in-memory SQLiteDB in 3.5.7 from #13021 / #12736, so I would really suggest upgrading. Was there a stack trace in the logs? Can you provide the preceding logs and the logs after this error?
Can you provide your ConfigMap? You didn't mention an important detail here: are you using the Workflow Archive or status offloading features that require an external DB? In your case, it sounds like MySQL? If it's a MySQL resource allocation issue, I'm not sure there's anything Argo can do about that; you would have to change your MySQL configuration.
Indeed this does seem like a specific MySQL issue per this SO answer which links to a MySQL bug report which links to another and so forth. Unfortunately, if I followed the thread correctly, it seems like they closed it with a documentation update instead of fixing the regression in MySQL >= 8.0.18 😕 It seems to specifically affect sorts on tables with JSON columns, especially those >1MB, and the Workflow |
Our MySQL DB version is 8.0.35, and we use the workflow archive but not the node status offload feature.
8.0.35 >= 8.0.18, so you would indeed be affected by that MySQL regression. If you're not using status offloading, then you're probably under 1MB (you might still have compressed nodes, which might be uncompressed in the archive, I'm not sure). Unfortunately the MySQL issue is not limited to >1MB, it just seems to happen more often in those cases, per the linked threads. It's out of Argo's hands at this point; you have to configure your DB to have more sort memory or use one of the other workarounds in the MySQL threads and docs.
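For anyone wanting to check their own setup, here is a minimal sketch of the version comparison above, plus a helper that builds the statement for raising MySQL's `sort_buffer_size` system variable. The function names are hypothetical, the 8.0.18 threshold is taken from the linked MySQL threads as summarized in this issue, and any buffer size you pass is your own choice, not a recommendation from the MySQL docs or from Argo.

```python
import re
from typing import Tuple

# First MySQL release this thread identifies as affected (assumption: taken
# from the linked MySQL bug reports as summarized above, not verified here).
FIRST_AFFECTED = (8, 0, 18)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch', ignoring suffixes such as '-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized MySQL version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """Return True if the version is at or above the first affected release."""
    # Tuples compare element by element, so (8, 0, 35) >= (8, 0, 18) is True.
    return parse_version(version) >= FIRST_AFFECTED


def sort_buffer_statement(size_bytes: int) -> str:
    """Build a statement that raises the global sort buffer size.

    sort_buffer_size is a real MySQL system variable; the value itself is
    up to the operator.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return f"SET GLOBAL sort_buffer_size = {size_bytes};"


if __name__ == "__main__":
    print(is_affected("8.0.35"))
    print(sort_buffer_statement(4 * 1024 * 1024))
```

Comparing parsed tuples rather than raw strings avoids the trap where "8.0.9" sorts after "8.0.18" lexicographically. Note that `SET GLOBAL` only applies to new connections and does not survive a server restart unless the setting is also persisted in the server configuration.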
Pre-requisites
:latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened/what did you expect to happen?
On the workflows page, if I select the most recent 500 workflows, the listing sometimes fails with an out of sort memory error.
Searching the web suggests it's a MySQL issue, but I'm not sure whether this is something
Unfortunately it only fails in our prod deployment, and I couldn't upgrade the version there to see whether it still has the same issue in v3.5.8.
Some example error logs I can find in the argo server log:
Version
v3.5.7
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
N/A
Logs from the workflow controller
Logs from your workflow's wait container