[dv] Minor improvements to cip_base_vseq and rv_dm TB #21282
Conversation
This change won't really make any difference to the behaviour, but it will hopefully make it a bit clearer what's going on. The check that is being moved was added by d7ba5ec. I think the idea is that we want to fail quickly if a reset happens in the middle of a CSR operation, because it's rather painful to debug what happens when things get out of sync. This commit moves the check a little earlier again: now we fail just *before* issuing the reset, which should make it easy to understand what has gone wrong if the check fires.

Note that it's not exactly equivalent. Before this change, the check would also fail if seq_wo_reset completed before we issued a reset at all and accidentally left a CSR item outstanding. The new code won't notice that case (but I don't think it's something we're worried about seeing).

Signed-off-by: Rupert Swarbrick <[email protected]>
This avoids spurious failures where wait_clks happens to finish while we're in the middle of a CSR access. Signed-off-by: Rupert Swarbrick <[email protected]>
This avoids a problem with the rv_dm_tap_fsm_rand_reset sequence. That sequence runs a smoke vseq and then tries to reset it in the middle of its operation. That might not work if the reset time happens to match a big block of back-to-back CSR accesses. The change in this commit ensures that there are always occasional cycles where there isn't a CSR access going, which means the reset sequence can safely inject a reset. Signed-off-by: Rupert Swarbrick <[email protected]>
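The fix described in that commit message can be sketched roughly as follows (SystemVerilog; this is an illustration of the idea, not the actual diff, and the names `num_csr_writes`, `jtag_dmi_ral.some_reg` and the delay bounds are hypothetical):

```systemverilog
// Illustrative only: a repeat loop of CSR writes with an occasional
// idle cycle, so that a parallel reset sequence always finds a window
// with no outstanding CSR access in which to inject a reset.
repeat (num_csr_writes) begin
  csr_wr(.ptr(jtag_dmi_ral.some_reg), .value($urandom()));
  // Without a wait like this, long runs of back-to-back accesses can
  // leave no safe point for reset injection.
  cfg.clk_rst_vif.wait_clks($urandom_range(1, 4));
end
```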
No functional change: I've just pulled the contents of the repeat blocks into separate tasks. Signed-off-by: Rupert Swarbrick <[email protected]>
No real change in what gets run, except that we no longer guarantee to run everything at least once. The shuffle isn't quite a complete shuffle, because the changes to dmactive are still done last. This is because check_dmactive() might disable the debug module, after which the module will ignore the writes that the other tasks expect to have an effect.

One possible question is whether this shuffle makes it likely that we fail to run a particular check. This shouldn't be an issue: the probability of never running a particular branch when we go through the randcase N times is (3/4)^N, which is already down to roughly 0.3% by N = 20.

Signed-off-by: Rupert Swarbrick <[email protected]>
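To spell out that arithmetic (assuming the four randcase branches are equally weighted, which is what the 3/4 figure implies): each pass misses any given branch with probability 3/4, so over N independent passes

```latex
\Pr[\text{branch never taken}] \;=\; \left(\frac{3}{4}\right)^{N},
\qquad \left(\frac{3}{4}\right)^{20} \approx 0.0032 \approx 0.3\%.
```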
The changes to `wait_to_issue_reset()` seem sensible to me, but I see there are two derived classes (hmac_common_vseq.sv and otp_ctrl_common_vseq.sv) that override the method and call `super.wait_to_issue_reset()`. Is it worth running a little regression locally just to check for any breakage there? I may be wrong, but I'm not sure the CI testcases would cover those.
I'm not too concerned: this change can only ever cause us to wait a bit of extra time. And any test that causes it to fire would have failed unconditionally without this change. So I think this is guaranteed to be safe :-)
Fair enough! :)
This is all driven by me trying to get an rv_dm test to (finally!) pass. Frustratingly, I'm not there yet, but I'm reasonably convinced that these changes make things better.
The general commits are all to `cip_base_vseq`. This vseq has a `wait_to_issue_reset` task. The idea is that this task picks an arbitrary count, with a bound, and then waits that many cycles. This task is used by `run_seq_with_rand_reset_vseq` (to be the "rand reset" bit). Unfortunately, there's a bit of a problem: if the TB happens to be in the middle of a CSR access then we end up failing a check (which expects `has_outstanding_access()` to return false). This was added by @cindychip with commit 4c81f23 in Summer 2022. I think the problem is that if we reset in the middle of a CSR access then things get a bit out of whack and we end up with an infinite chain of nested accesses. Cindy's change adds an early-failure path if this happens, which is much easier to diagnose.

Anyway, the first two commits in this PR just add a bit of a "fuzz factor": if we were going to try to reset at a particular time when there is an outstanding CSR access, we just wait a few cycles to see whether it clears up.
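That fuzz factor can be sketched roughly like this (illustrative SystemVerilog, not the exact diff; `reset_delay_clks` and `max_fuzz_clks` are hypothetical names):

```systemverilog
// Illustrative only: after waiting the randomly chosen delay, give any
// in-flight CSR access a few extra cycles to drain before checking.
cfg.clk_rst_vif.wait_clks(reset_delay_clks);
for (int i = 0; i < max_fuzz_clks && has_outstanding_access(); i++) begin
  cfg.clk_rst_vif.wait_clks(1);
end
// If an access is *still* outstanding, fail here, before the reset is
// issued, rather than debugging the fallout afterwards.
`DV_CHECK_EQ(has_outstanding_access(), 0,
             "CSR access still outstanding at reset time")
```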
This wasn't quite enough! It turns out that the `rv_dm` smoke sequence sometimes ends up with long chains of back-to-back CSR accesses. Looking at the code, I think this is because the initial implementation (0af1201) was missing a wait at the end of one of the repeat loops. The third commit in this PR adds that wait, and then the following two commits tidy up the sequence a bit, to avoid some repetition.

Annoyingly, this still doesn't get the darn test to work! If you run
`util/dvsim/dvsim.py hw/ip/rv_dm/dv/rv_dm_sim_cfg.hjson --tool=xcelium -i rv_dm_tap_fsm_rand_reset --fixed-seed 123`
with the head of this PR, it will fail! But now the problem is that the injected reset is killing the process behind a sequence that's running. Definitely something to fix, but I think it's orthogonal to what this PR does.