failing tests in ESPResSo v4.2.1 due to timeouts #363
Comments
As mentioned in #331 (comment), it seems that hitting hanging tests is more likely on @jngrad Does this happen to ring any bells for you? Are you seeing hanging tests more often on certain platforms?
I've added a hook to #331 to ignore the failing tests in ESPResSo v4.2.1 if they occur. That's the best we can do for now (other than not running the test suite at all, which is not a good idea imho). I've also updated the list of known issues to include this tracker issue, so we can get ESPResSo deployed in EESSI 2023.06...
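For readers unfamiliar with this kind of hook, here is a minimal sketch of what an EasyBuild pre-test hook that tolerates test failures could look like. It is not necessarily the hook added in #331. It assumes the hook is named `pre_test_hook` and receives the easyblock instance as `self`, with `self.name`, `self.version` and `self.cfg['testopts']` available. The `FakeEasyBlock` class and the echo message are hypothetical stand-ins so the sketch runs without EasyBuild installed.

```python
# Sketch only: assumes EasyBuild's hook convention (pre_test_hook) and the
# 'testopts' easyconfig parameter; FakeEasyBlock is a hypothetical stand-in.

IGNORE_SUFFIX = "|| echo 'ignoring failing tests (probably due to timeouts)'"


def pre_test_hook(self, *args, **kwargs):
    """Let the test step succeed even if ESPResSo v4.2.1 tests fail."""
    if self.name == 'ESPResSo' and self.version == '4.2.1':
        existing = self.cfg['testopts'] or ''
        # Appending '|| echo ...' makes the shell command return 0 on failure.
        self.cfg['testopts'] = (existing + ' ' + IGNORE_SUFFIX).strip()


class FakeEasyBlock:
    """Hypothetical stand-in for an easyblock instance, for illustration."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.cfg = {'testopts': ''}


if __name__ == '__main__':
    eb = FakeEasyBlock('ESPResSo', '4.2.1')
    pre_test_hook(eb)
    print(eb.cfg['testopts'])
```

The trade-off is that real regressions in the test suite are hidden as well, which is why the tracker issue is recorded in the list of known issues.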
From my experience on Fedora Koji, our test cases aren't more prone to failure on neoverse than on x86_64. However, on architectures other than ARM and x86_64, we do see a lot of variability. For example, when packaging on openSUSE, we ended up disabling every architecture but x86_64. See openSUSE:Factory/python3-espressomd and click on "Show 17 excluded/disabled results" to see the list.
Here are all statistical tests: They are known to take a large amount of time on our CI pipelines, because we run them concurrently and max out the host machine's CPU usage via MPI oversubscription, so that hyperthreaded cores are fully used. This makes their runtime fluctuate wildly, with a negative feedback loop since they compete against one another for the same resources (e.g. if one test times out, there is a very good chance another unrelated test will time out too). More details can be found in espressomd/espresso#3883. Having said that, you don't seem to run these tests concurrently, so your CI pipelines should not be experiencing the issue I just described. Maybe there is a deeper issue in ESPResSo's MPI code. Unfortunately, timeout information alone is not sufficient for me to investigate an MPI issue.
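To report whether a given test hung or simply failed, a small wrapper around the test command can help. Below is a minimal sketch using only the Python standard library. The function name `run_with_timeout` and the three status labels are hypothetical, and the commands are placeholders rather than actual ESPResSo test invocations.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run a command and classify it as 'passed', 'failed' or 'timeout'."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == '__main__':
    # Placeholder commands standing in for real test invocations.
    quick = [sys.executable, "-c", "pass"]
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(quick, 30))
    print(run_with_timeout(hanging, 0.5))
```

This only separates hangs from ordinary failures. It does not say where the process was stuck, so a backtrace of the hanging ranks would still be needed to investigate the MPI side.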
Interestingly, this problem did not pop up for the installation of ESPResSo v4.2.1 with
Scratch that, that's incorrect. We have a hook in place to ignore failing tests on
Some tests in the ESPResSo v4.2.1 test suite are known to be flaky, and sometimes hang, for example:
We ran into a similar problem when building ESPResSo v4.2.1 for EESSI pilot 2023.06, cf. #331.