Maybe this has already been extensively discussed, but the GPU test suites on Buildkite fail often, requiring manual intervention to restart them for each PR.
The obvious solution is a bigger machine for testing, but I have two suggestions that are much easier to implement:
1. Update Buildkite. Newer versions may be more stable. The latest version is 3.79, but Sverdrup is on v3.24.0 (almost 4 years old) and Tartarus is on v3.50.4.
2. Reduce the number of Buildkite agents on Sverdrup. If builds are failing because of too much resource competition, fewer agents may help. Right now there are 16; I wonder if GPU builds would be more stable with 8-12. Some builds may be slower, but if no one has to restart a test suite, that would make for a better developer experience.
We think there is a race condition in the CI. This is partly discussed on #3661 and also #3662, although one conclusion is that we should update to use the Buildkite plugin (started on #3042).