Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] Enable uart_tx_rx_test for fpga sival environments #20698

Closed
wants to merge 2 commits into from

Conversation

a-will
Copy link
Contributor

@a-will a-will commented Dec 21, 2023

They pass locally. Also, remove the cw310_test_rom environment from the list, as it doesn't support this test.

Note that this PR depends on #20696 to bring the fpga_cw310_sival_rom_ext config over to hyper310.

The cw310_sival_rom_ext configuration was still on the SAM3X bitstream
variant. Move it to hyper310 to make all the SiVal I/O available.

Signed-off-by: Alexander Williams <[email protected]>
They pass locally. Also, remove the cw310_test_rom environment from the
list, as it doesn't support this test.

Signed-off-by: Alexander Williams <[email protected]>
@a-will a-will requested a review from matutem December 21, 2023 05:59
@jwnrt
Copy link
Contributor

jwnrt commented Dec 21, 2023

This test is a little flaky still, but I haven't worked out what the problem is:

[jw opentitan]$ bazel test --runs_per_test=50 --cache_test_results=no //sw/device/tests:uart0_tx_rx_test_fpga_cw310_sival
[...]
//sw/device/tests:uart0_tx_rx_test_fpga_cw310_sival                     TIMEOUT in 1 out of 50 in 60.1s
  Stats over 50 runs: max = 60.1s, min = 1.3s, avg = 2.8s, dev = 8.6s
  /home/jw/.cache/bazel/_bazel_jw/7e172ccc9b1d5908d84fded9a2ecf9f4/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild-ST-2cc462681f62/testlogs/sw/device/tests/uart0_tx_rx_test_fpga_cw310_sival/run_27_of_50/test.log

Executed 1 out of 1 test: 1 fails locally.

The logs for the failed runs are always the same:

56:Starting test uart_tx_rx...
[2023-12-21T08:59:59.520Z  console]I00001 ottf_main.c:154] Running sw/device/tests/uart_tx_rx_test.c
[2023-12-21T08:59:59.526Z  console]I00002 uart_tx_rx_test.c:514] Test UART0 with base_addr: 40000000
[2023-12-21T08:59:59.532Z  console]I00003 clkmgr_testutils.c:88] Variability for Io 96 is 6
[2023-12-21T08:59:59.538Z  console]I00004 clkmgr_testutils.c:90] Variability for Cpu 96 is 6
[2023-12-21T08:59:59.544Z  console]I00005 clkmgr_testutils.c:157] Enabling clock count measurement for io_clk(0) lo 89 hi 101
[2023-12-21T08:59:59.553Z  console]I00006 clkmgr_testutils.c:157] Enabling clock count measurement for io_div2_clk(1) lo 43 hi 51
[2023-12-21T08:59:59.561Z  console]I00007 clkmgr_testutils.c:157] Enabling clock count measurement for io_div4_clk(2) lo 20 hi 26
[2023-12-21T08:59:59.570Z  console]I00008 clkmgr_testutils.c:157] Enabling clock count measurement for main_clk(3) lo 89 hi 101
[2023-12-21T08:59:59.578Z  console]I00009 clkmgr_testutils.c:157] Enabling clock count measurement for usb_clk(4) lo 180 hi 202
[2023-12-21T08:59:59.586Z  console]I00010 uart_tx_rx_test.c:257] Initializing the UART.
[2023-12-21T08:59:59.591Z  console]I00011 uart_tx_rx_test.c:292] Initializing the PLIC. 00000001
[2023-12-21T08:59:59.596Z  console]I00012 uart_tx_rx_test.c:412] Executing the test.
[2023-12-21T08:59:59.600Z INFO  uart_tx_rx] Sending data...
[2023-12-21T08:59:59.600Z INFO  uart_tx_rx] Reading data...
[2023-12-21T08:59:59.600Z INFO  uart_tx_rx] Sending a chunk of data larger than the FIFO...
[2023-12-21T08:59:59.601Z  console]
-- Test timed out at 2023-12-21 09:00:58 UTC --

It seems the RX overflow interrupt isn't firing sometimes. I'm still debugging, but so far adding more synchronisation and sending way more data hasn't triggered the overflow interrupt yet.

@a-will
Copy link
Contributor Author

a-will commented Dec 21, 2023

Ah, this test's code does not follow good practices for avoiding races, and it needs to stop using the SPI console (generic mode is going to be removed entirely). It will need some fixes...

@jwnrt
Copy link
Contributor

jwnrt commented Dec 21, 2023

this test's code does not follow good practices for avoiding races

Do you know what changes are needed for proper synchronisation? I couldn't find the race when I was poking around this morning. All I found was that adding artificial delay to the data-processing loop improved pass rates, and adding delay to the ISR significantly reduced them.

We used the SPI console to avoid having to use the UART for OTTF (since it will be under test), but we may be able to rework the test so that uart1 is used for the console while uart0 is under test.

generic mode is going to be removed entirely

Does generic mode mean using the SPI device raw (as opposed to something like its flash emulation mode)? We were thinking of using the flash emulation mode to implement RX (it currently only supports TX) but this would be synchronous and driven by the host which made everything more complicated and we abandoned that plan.

@a-will
Copy link
Contributor Author

a-will commented Dec 21, 2023

this test's code does not follow good practices for avoiding races

Do you know what changes are needed for proper synchronisation? I couldn't find the race when I was poking around this morning. All I found was that adding artificial delay to the data-processing loop improved pass rates, and adding delay to the ISR significantly reduced them.

For example:

if (!uart_irq_rx_watermark_fired && !uart_irq_tx_watermark_fired &&
!uart_irq_rx_overflow_fired) {
wait_for_interrupt();
}

This sequence is an improper use of wfi. The condition check for entering wfi and the call to wfi itself should not happen while interrupts are unmasked. It's not clear to me if this test could actually get wedged here, though, since the RX overflow interrupt should probably break it out while we're attempting to force it from the host side.

We used the SPI console to avoid having to use the UART for OTTF (since it will be under test), but we may be able to rework the test so that uart1 is used for the console while uart0 is under test.

generic mode is going to be removed entirely

Does generic mode mean using the SPI device raw (as opposed to something like its flash emulation mode)? We were thinking of using the flash emulation mode to implement RX (it currently only supports TX) but this would be synchronous and driven by the host which made everything more complicated and we abandoned that plan.

Yes. This mode is scheduled for deletion once the window for RTL changes opens. Also, note that every SPI mode is synchronous and driven by the host. However, generic mode allowed for simultaneous TX + RX communication ...not that hyperdebug's OCTOSPI peripheral supports it.

If we bring in a USB console (like #19249), there's possibly that option too. However, usbdev handling in CI is a lot more "exciting" with the decision to hide / rename device nodes, and opentitanlib won't necessarily be able to find the correct device on its own.

@jwnrt
Copy link
Contributor

jwnrt commented Dec 21, 2023

Thanks @a-will, that makes sense

@a-will
Copy link
Contributor Author

a-will commented Dec 21, 2023

Actually, it looks like the test completes on the device side, but the message over the SPI console does not get communicated. I tried running in the RMA state, and during an apparent failure, gdb showed the test had returned from test_main() and was reporting the test status.

Then, for the original sival conditions, I added a GPIO indicator (set pin to 0 on entry, set pin to 1 on test pass), and the GPIO also said the device thought the test passed.

Ultimately, it seems like the problem is in the SPI console domain.

@jwnrt
Copy link
Contributor

jwnrt commented Dec 21, 2023

Interesting, that makes sense since all my debugging was done through printf over the console.

The only communication between the device and testrunner are a sync point before the data is sent and the final PASS! signal.

Could we use GPIO for those to run the test? We can leave the console enabled for whatever debugging value it provides, but at least the test will pass.

Alternatively, we move to using another UART or USB as the console as you suggest.

Copy link
Contributor

@matutem matutem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the changes to hw/top_earlgrey/BUILD are already merged, so they will be wiped when you rebase.

LGTM

@a-will
Copy link
Contributor Author

a-will commented Jan 23, 2024

Superseded by #20879

Thanks @jwnrt !

@a-will a-will closed this Jan 23, 2024
@a-will a-will deleted the uart-tx-rx-fpga branch January 23, 2024 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants