Random failures due to jumbled output in TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_1 breaking PR builds starting 2022-07-08 #10898
Comments
FYI: This was the only test failure that took out the last iteration of my PR build #10808 (comment). I have been trying to get that PR build to pass PR testing for going on 3 weeks now, and random Tpetra test failures have taken out several of those iterations.
@bartlettroscoe this is a little different from the last one, since it's not output deliberately printed by Tpetra. I've noticed it before on other projects as well, but I'm not exactly sure what the root cause is. I'm reaching out to the Kokkos team for more information on this.
@bartlettroscoe from my conversation on the Kokkos slack, it sounds like this is actually a Kokkos bug, which was resolved in kokkos/kokkos#5151. This fix will be available in Kokkos 3.7 -- I'll leave it up to you whether it's better to wait for Kokkos 3.7 to make it into Trilinos or pull the fix over now.
@tasmith4, I think it can wait for the Kokkos upgrade. However, it would be good to know how many Tpetra tests are failing due to jumbled output. It occurred to me how to search for that and I think this query does that which shows: So between this issue and #10885, I think that catches them all.
FYI - Trilinos PR for Kokkos/KokkosKernels update is supposed to get put in this week (as per Nathan).
@bartlettroscoe I think for most if not all tests we just write to the Teuchos unit test "out" stream, and a lot of that stuff gets handled however the Teuchos unit testing framework/command line options specify (I've never dug super deep into that).
Right, but that is just a stream. Perhaps we should create a function in TriBITS called
I could go for that. Could be a lot of work to retrofit existing tests though.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
CC: @trilinos/tpetra, @tasmith4
Description
As shown in this query (click "Show Matching Output" in the upper right), the test:
TpetraCore_MatrixMarket_Tpetra_CrsMatrix_Dist_Binary_simple_MPI_1
is randomly failing in the builds:
PR-10706-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-380
PR-10751-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-877
PR-10808-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-929
starting on testing day 2022-07-08.
Just like for the Tpetra tests reported in issue #10885, these failures are caused by jumbled output breaking up the printing of
End Result: TEST PASSED
as shown here.
Current Status on CDash
Run the above query, adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
Steps to Reproduce
It is a randomly failing test, so it will be hard to reproduce.