
fix(TLB): avoid freeze when GPF occurs #3964

Merged: 1 commit, Dec 2, 2024


cebarobot
Member

L1TLB does not store gpaddr, but gpaddr is needed when a guest page fault (GPF) occurs. In that case, L1TLB has to send a PTW request to obtain the gpaddr; we call this getGpa. The getGpa mechanism can handle only one GPF TLB request (the first one) and expects the corresponding TLB entry to still be in L1TLB.
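The single-request constraint can be illustrated with a small behavioral model. This is a hypothetical Python sketch, not XiangShan's Chisel code: apart from `need_gpa`, every name (`GetGpaTracker`, `gpa_vpn`, `on_gpf`, `on_ptw_resp`, `on_replay`) is illustrative, and the L1TLB lookup is reduced to an assumed `entry_present` flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GetGpaTracker:
    """Hypothetical single-slot model of getGpa (only need_gpa is from the PR)."""
    need_gpa: bool = False
    gpa_vpn: Optional[int] = None  # VPN of the first GPF request being served
    gpaddr: Optional[int] = None   # gpaddr returned by the getGpa PTW response

    def on_gpf(self, vpn: int) -> bool:
        """A lookup hit a GPF entry. Returns True if a getGpa PTW request is sent.
        Only the first GPF is tracked; later ones get nothing until the slot frees."""
        if self.need_gpa:
            return False
        self.need_gpa = True
        self.gpa_vpn = vpn
        self.gpaddr = None
        return True

    def on_ptw_resp(self, vpn: int, gpaddr: int) -> None:
        """Capture gpaddr when the getGpa response for the tracked VPN arrives."""
        if self.need_gpa and vpn == self.gpa_vpn:
            self.gpaddr = gpaddr

    def on_replay(self, vpn: int, entry_present: bool) -> Optional[int]:
        """Replayed GPF request: returns gpaddr and frees the slot, but only if
        the GPF entry is still in L1TLB (assumed to be a hit on that entry)."""
        if (self.need_gpa and vpn == self.gpa_vpn
                and entry_present and self.gpaddr is not None):
            result = self.gpaddr
            self.need_gpa = False
            self.gpa_vpn = None
            self.gpaddr = None
            return result
        return None
```

In this model, if the GPF entry has been evicted, `on_replay` keeps returning `None` and `need_gpa` is never cleared, so no later GPF can be served: the kind of stall described below.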

L1TLB replacement uses a PLRU (Pseudo-LRU) algorithm, which may evict entries that are not the least recently used. We found a case in which L1TLB replaced the GPF TLB entry even though that entry had been accessed recently. This deadlocks the getGpa mechanism and eventually freezes the entire core. To solve this problem, we decided to block any unrelated PTW refills while the getGpa mechanism is active (need_gpa).
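To see how a PLRU policy can evict a recently used entry, here is a generic 4-way tree-PLRU compared against true LRU. This is an illustrative sketch only; the way count and tree layout are assumptions and do not describe the actual L1TLB configuration.

```python
class TreePLRU4:
    """Generic 4-way tree pseudo-LRU: 3 bits, each pointing toward the side to evict."""

    def __init__(self) -> None:
        self.root = 0   # 0: victim in ways {0, 1}; 1: victim in ways {2, 3}
        self.left = 0   # 0: way 0; 1: way 1
        self.right = 0  # 0: way 2; 1: way 3

    def access(self, way: int) -> None:
        # Point every bit on the accessed way's path away from that way.
        if way < 2:
            self.root = 1
            self.left = 1 - way    # touching way 0 makes way 1 the left victim, and vice versa
        else:
            self.root = 0
            self.right = 3 - way   # touching way 2 makes way 3 the right victim, and vice versa

    def victim(self) -> int:
        if self.root == 0:
            return self.left
        return 2 + self.right


def true_lru_victim(order: list[int], ways: int = 4) -> int:
    """Way whose most recent access is oldest (unused ways count as oldest)."""
    last_use = {w: -1 for w in range(ways)}
    for t, w in enumerate(order):
        last_use[w] = t
    return min(last_use, key=last_use.get)


def compare(order: list[int]) -> tuple[int, int]:
    """Return (PLRU victim, true-LRU victim) after the given access sequence."""
    plru = TreePLRU4()
    for way in order:
        plru.access(way)
    return plru.victim(), true_lru_victim(order)


if __name__ == "__main__":
    print(compare([0, 2, 3, 1]))  # prints (2, 0)
```

After the accesses 0, 2, 3, 1, true LRU would evict way 0, but tree-PLRU evicts way 2 even though it was used more recently. If way 2 held the GPF entry, an unrelated refill would remove it while getGpa still depends on it.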

After solving that problem, we found that under certain conditions, because the other PTW responses are never refilled, the other TLB requests keep replaying. Each replay triggers another PTW request, and these requests occupy the L2TLB request path, so the GPF PTW request never gets a response and the processor ultimately freezes. To resolve this, we decided to block any unrelated PTW requests while need_gpa is set.
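Both fixes amount to gating on `need_gpa`. The following is a hypothetical Python sketch, not the actual Chisel change: the function names are illustrative, and it assumes that "related" means the PTW response or request carries the same VPN as the tracked GPF request. The real relation check in XiangShan may differ.

```python
from typing import Optional


def allow_refill(need_gpa: bool, gpa_vpn: Optional[int], resp_vpn: int) -> bool:
    """Fix 1 (assumed form): while need_gpa is set, drop unrelated PTW refills,
    so PLRU replacement cannot evict the GPF entry to make room for them."""
    if need_gpa:
        return resp_vpn == gpa_vpn
    else:
        return True


def allow_ptw_request(need_gpa: bool, gpa_vpn: Optional[int], req_vpn: int) -> bool:
    """Fix 2 (assumed form): while need_gpa is set, hold back unrelated PTW
    requests so replaying misses cannot crowd out the getGpa request to L2TLB."""
    if need_gpa:
        return req_vpn == gpa_vpn
    else:
        return True


# getGpa in flight for VPN 0x10: only traffic for 0x10 gets through.
print(allow_refill(True, 0x10, 0x20), allow_ptw_request(True, 0x10, 0x10))  # prints False True
```

The explicit if/else mirrors the when/otherwise style used in this patch rather than folding each condition into a single boolean expression.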

This patch also rewrites some combinational logic signals with when/otherwise, which is clearer and easier to understand than a complex logical expression.

@XiangShanRobot

[Generated by IPC robot]
commit: cba29ac

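| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cba29ac | 1.902 | 0.450 | 2.686 | 1.222 | 2.832 | 2.461 | 2.393 | 0.919 | 1.407 | 1.992 | 3.435 | 2.709 | 2.383 | 3.264 |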

master branch:

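| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7071df6 | 1.902 | 0.450 | 2.687 | 1.237 | 2.832 | 2.461 | 2.393 | 0.919 | 1.407 | 1.992 | 3.435 | 2.709 | 2.383 | 3.264 |
| 415fcbe | 1.917 | 0.450 | 2.701 | 1.224 | 2.840 | 2.464 | 2.398 | 0.921 | 1.430 | 2.069 | 3.437 | 2.716 | 2.387 | 3.261 |
| 85a4c8e | 1.917 | 0.450 | 2.701 | 1.230 | 2.840 | 2.464 | 2.398 | 0.921 | 1.430 | 2.069 | 3.437 | 2.716 | 2.387 | 3.261 |
| d7b0ad9 | 1.917 | 0.450 | 2.701 | 1.227 | 2.840 | 2.464 | 2.398 | 0.921 | 1.430 | 2.069 | 3.437 | 2.716 | 2.387 | 3.261 |
| 6cd53fd |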

@Tang-Haojin merged commit 4fc3a30 into master on Dec 2, 2024
9 checks passed
@Tang-Haojin deleted the fix-gpf-freeze branch on December 2, 2024 03:50

4 participants