[dv] Minor improvements to cip_base_vseq and rv_dm TB #21282

Merged: 5 commits into lowRISC:master from rv-dm-waits, Feb 13, 2024

Conversation

@rswarbrick (Contributor) commented Feb 9, 2024

This is all driven by me trying to get an rv_dm test to (finally!) pass. Frustratingly, I'm not there yet, but I'm reasonably convinced that these changes make things better.

The general commits are all to cip_base_vseq. This vseq has a wait_to_issue_reset task, which picks an arbitrary (bounded) count and then waits that many cycles. The task is used by run_seq_with_rand_reset_vseq to provide the "rand reset" part. Unfortunately, there's a bit of a problem: if the TB happens to be in the middle of a CSR access, we end up failing a check (which expects has_outstanding_access() to return false). That check was added by @cindychip with commit 4c81f23 in summer 2022. I think the underlying problem is that if we reset in the middle of a CSR access then things get out of sync and we end up with an infinite chain of nested accesses. Cindy's change adds an early-failure path when this happens, which is much easier to diagnose.

Anyway, the first two commits in this PR just add a bit of a "fuzz factor": if we were going to try to reset at a particular time when there is an outstanding CSR access, we just wait a few cycles to see whether it clears up.
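A minimal sketch of the "fuzz factor" idea, assuming invented names for the delay bound and retry count (the real cip_base_vseq code will differ):

```systemverilog
// Sketch only: not the actual cip_base_vseq code. The bound and the
// retry count here are invented for illustration.
virtual task wait_to_issue_reset(int unsigned reset_delay_bound = 10_000_000);
  int unsigned delay;

  // Pick an arbitrary (bounded) number of cycles and wait, as before.
  delay = $urandom_range(1, reset_delay_bound);
  cfg.clk_rst_vif.wait_clks(delay);

  // The new "fuzz factor": if we happen to be in the middle of a CSR
  // access, wait a few extra cycles to see whether it clears up before
  // the caller checks has_outstanding_access() and injects the reset.
  repeat (16) begin
    if (!has_outstanding_access()) break;
    cfg.clk_rst_vif.wait_clks(1);
  end
endtask
```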

This wasn't quite enough! It turns out that the rv_dm smoke sequence sometimes ends up with long chains of back-to-back CSR accesses. Looking at the code, I think this is because the initial implementation (0af1201) was missing a wait at the end of one of the repeat loops. The third commit in this PR adds that wait, and then the following two commits try to tidy up the sequence a bit, to avoid some repetition.
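As a hedged sketch of the repeat-loop fix (the CSR name and delay range below are invented, but the shape matches the change described above):

```systemverilog
// Illustrative only: the fix is simply a wait at the end of each loop
// iteration, so CSR accesses are no longer back-to-back.
repeat (num_iterations) begin
  csr_wr(.ptr(jtag_dmi_ral.some_csr), .value($urandom()));
  // The wait that was missing from the original implementation. Without
  // it, consecutive iterations produce back-to-back CSR accesses and
  // there is never a safe cycle in which to inject a reset.
  cfg.clk_rst_vif.wait_clks($urandom_range(1, 10));
end
```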

Annoyingly, this still doesn't get the darn test to work! If you run util/dvsim/dvsim.py hw/ip/rv_dm/dv/rv_dm_sim_cfg.hjson --tool=xcelium -i rv_dm_tap_fsm_rand_reset --fixed-seed 123 with the head of this PR, it will fail! But now the problem is that the injected reset is killing the process behind a sequence that's running. Definitely something to fix, but I think it's orthogonal to what this PR does.

This change won't really make any difference to the behaviour, but it
will hopefully make it a bit clearer what's going on. The test that is
being moved was added by d7ba5ec. I think the idea is that we want to
fail quickly if a reset happens in the middle of a CSR operation,
because it's rather painful to debug what happens when things get out
of sync.

This commit moves the check a little earlier again. Now we fail
just *before* issuing the reset, which should make it easy to
understand what has gone wrong if the check fires.
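Something along these lines (a guess at the shape of the change, not the exact diff; the macro usage and surrounding names are illustrative):

```systemverilog
// Sketch: fail loudly just before injecting the reset if a CSR access
// is still outstanding, rather than letting things get out of sync.
wait_to_issue_reset(reset_delay_bound);
`DV_CHECK_EQ(has_outstanding_access(), 0,
             "Trying to issue a reset in the middle of a CSR access")
apply_reset("HARD");
```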

Note that it's not exactly equivalent. Before this change, the check
would fail if seq_wo_reset completed before we issued a reset at all
and accidentally left a CSR item outstanding. The new code won't
notice that case (but I don't think it's something we're worried
about seeing).

Signed-off-by: Rupert Swarbrick <[email protected]>
This avoids silly failures where wait_clks happens to finish when
we're in the middle of a CSR access.

Signed-off-by: Rupert Swarbrick <[email protected]>
This avoids a problem with the rv_dm_tap_fsm_rand_reset sequence. That
sequence runs a smoke vseq and then tries to reset it in the middle of
its operation. That might not work if the reset time happens to match
a big block of back-to-back CSR accesses.

The change in this commit ensures that there are always occasional
cycles where there isn't a CSR access going, which means the reset
sequence can safely inject a reset.

Signed-off-by: Rupert Swarbrick <[email protected]>
No functional change: I've just pulled the contents of the repeat
blocks into separate tasks.

Signed-off-by: Rupert Swarbrick <[email protected]>
No real change in what gets run, except that we no longer guarantee to
run everything at least once.

The shuffle isn't quite a complete shuffle, because the changes to
dmactive are still done last. This is because check_dmactive()
might disable the debug module, after which the module will ignore the
writes that the other tasks expect to have an effect.

One possible question is whether this shuffle makes it likely that we
don't run a particular check. This shouldn't be an issue: the
probability of never picking a particular check when we go through the
randcase N times is (3/4)^N, which is already only about 0.3% for
N = 20.
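A sketch of the resulting structure (the check names are invented; the real sequence uses its own tasks):

```systemverilog
// Each pass picks one of four checks uniformly at random, so a given
// check is skipped in all N passes with probability (3/4)^N.
repeat (20) begin
  randcase
    1: check_csr_a();
    1: check_csr_b();
    1: check_csr_c();
    1: check_csr_d();
  endcase
end
// The dmactive change still comes last: disabling the debug module
// would make it ignore the writes the other checks rely on.
check_dmactive();
```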

Signed-off-by: Rupert Swarbrick <[email protected]>
@rswarbrick added Component:DV and IP:rv_dm labels on Feb 9, 2024
@rswarbrick requested a review from a team as a code owner on February 9, 2024 17:47
Contributor

The changes to wait_to_issue_reset() seem sensible to me, but I do see there are two derived classes (hmac_common_vseq.sv and otp_ctrl_common_vseq.sv) that override the method and call super.wait_to_issue_reset(). Is it worth running a little regression locally just to check for any breakage there? I may be wrong, but I'm not sure the CI testcases would exercise those.

Contributor Author

I'm not too concerned: this change can only ever cause us to wait a bit of extra time. And any test that causes it to fire would have failed unconditionally without this change. So I think this is guaranteed to be safe :-)

Contributor

Fair enough! :)

@rswarbrick merged commit b2d81a3 into lowRISC:master on Feb 13, 2024
34 checks passed
@rswarbrick deleted the rv-dm-waits branch on February 13, 2024 17:37