Jumpstart failing in CI runs #1119
Comments
Jump start is failing consistently in SDK CI because of a timeout; we have a 6 minute timeout.
We seem to have got it passing now, all of a sudden, with no explanation as to why. It started passing when we added our exit call (our tests weren't exiting when jump start fails), but that's unrelated 🤷
I want to know if it's intermittent. Jump start failing unpredictably will massively hamper our ability to work (and to approve RCs).
@frankiebee @mixmix while I'm not ruling out that this could be a problem with the jump start itself, for this specific issue it's probably more related to the Rust test success/failure criteria.
I would try and shorten the retry time there. You shouldn't need to wait more than maybe a minute for the jumpstart to happen.
Hmm, interesting, will investigate more next week. For what it's worth, on the JS side, to mitigate until the root cause is found, you can retry after 50 blocks if it doesn't work.
We are retrying every 10 blocks. Watching the logs, it seems that when jump start succeeds locally it does so within a 3 block period.
Unless there's a reason we should try for 50? I'm seeing that when it gets retried at 50 blocks it works.
50 blocks is the default time we allow before a retry can happen.
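For anyone adjusting the JS-side retry, here is a rough sketch of gating retries on block count rather than firing every 10 blocks. All names (`JumpStartAttempt`, `canRetry`, `retryBlocks`) are hypothetical and not taken from the SDK or the node; the only value from this thread is the 50 block default.

```typescript
// Hypothetical sketch: none of these names come from the SDK or the node.
// The 50 block window is the default mentioned in this thread.
const RETRY_AFTER_BLOCKS = 50;

interface JumpStartAttempt {
  // Block number at which the last jump start request was sent.
  startedAtBlock: number;
}

// True once enough blocks have passed since the last attempt to allow a retry.
function canRetry(
  attempt: JumpStartAttempt,
  currentBlock: number,
  retryAfterBlocks: number = RETRY_AFTER_BLOCKS
): boolean {
  return currentBlock - attempt.startedAtBlock >= retryAfterBlocks;
}

// Given the block numbers a client observes, return the blocks at which
// it would send a retry. Each retry resets the window.
function retryBlocks(
  firstAttemptBlock: number,
  observedBlocks: number[],
  retryAfterBlocks: number = RETRY_AFTER_BLOCKS
): number[] {
  const retries: number[] = [];
  let attempt: JumpStartAttempt = { startedAtBlock: firstAttemptBlock };
  for (const block of observedBlocks) {
    if (canRetry(attempt, block, retryAfterBlocks)) {
      retries.push(block);
      attempt = { startedAtBlock: block };
    }
  }
  return retries;
}
```

With a first attempt at block 100, this would retry at blocks 150 and 200 and skip anything in between, which matches the observation above that the retry at 50 blocks is the one that works.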
Got it.
This may be true on our local machines; our CI, more often than not, is hitting the 6 minute mark.
Hmm, this is interesting. With all the evidence here, I'm leaning towards the CI nodes not being able to keep up with the blocks.
This is actually a lot of evidence. I'll try to think of ways to test this and talk to Vi about upping the CI machine on Monday. It could possibly be something in our codebase, but I'm leaning towards that being less likely given the overwhelming evidence.
Looks like the jumpstart is failing periodically for some of our tests. For example, see this CI run.
This is probably because these tests rely on timeouts. It would be nice to get this behaviour to be more consistent.
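To reason about how the test timeout interacts with block-based retries, a back-of-the-envelope sketch may help. The 6 minute timeout and the 10 and 50 block retry intervals come from this thread; the 6 second block time is only an assumption (the thread does not state it), and `retryWindowsWithin` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper for budgeting retries inside a test timeout.
// ASSUMPTION: 6 second block time. This thread does not state the block
// time, and slow CI nodes may produce blocks more slowly than this.
const ASSUMED_BLOCK_TIME_MS = 6_000;
const CI_TIMEOUT_MS = 6 * 60 * 1_000; // the 6 minute CI timeout

// Number of complete retry windows that fit inside the timeout.
function retryWindowsWithin(
  timeoutMs: number,
  retryEveryBlocks: number,
  blockTimeMs: number = ASSUMED_BLOCK_TIME_MS
): number {
  const windowMs = retryEveryBlocks * blockTimeMs;
  return Math.floor(timeoutMs / windowMs);
}

// Under the assumed block time:
const windowsAt50 = retryWindowsWithin(CI_TIMEOUT_MS, 50); // 50 blocks = 5 minutes per window
const windowsAt10 = retryWindowsWithin(CI_TIMEOUT_MS, 10); // 10 blocks = 1 minute per window
```

If the assumed block time is roughly right, only a single 50 block window fits inside the 6 minute timeout, so any slowdown in block production on CI would leave little or no room for a retry to land before the test is killed.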