fix(st-ld forward): modify misaligned forward fault detection #4038

Draft · wants to merge 1 commit into master from wzz-fix-ff-replay

Conversation

@Anzooooo (Member) commented on Dec 13, 2024

Previously, I used an inappropriate approach in which any matching older misaligned store simply triggered a forward fault for the load:

//TODO If an older store is misaligned, simply trigger a forward fail (FF); this is a very unreasonable way to handle it,
//TODO but for now it guarantees correctness. Misaligned forwarding should be supported properly at a suitable opportunity.
val unalignedMask1 = unaligned.asUInt & forwardMask1.asUInt & allocated.asUInt
val unalignedMask2 = unaligned.asUInt & forwardMask2.asUInt & allocated.asUInt
val forwardPreWithUnaligned = (unalignedMask1 | unalignedMask2).asUInt.orR
// make chisel happy
val dataInvalidMask1Reg = Wire(UInt(StoreQueueSize.W))
dataInvalidMask1Reg := RegNext(dataInvalidMask1)
// make chisel happy
val dataInvalidMask2Reg = Wire(UInt(StoreQueueSize.W))
dataInvalidMask2Reg := RegNext(dataInvalidMask2)
val dataInvalidMaskReg = dataInvalidMask1Reg | dataInvalidMask2Reg
// If SSID match, address not ready, mark it as addrInvalid
// load_s2: generate addrInvalid
val addrInvalidMask1 = (~addrValidVec.asUInt & storeSetHitVec.asUInt & forwardMask1.asUInt)
val addrInvalidMask2 = (~addrValidVec.asUInt & storeSetHitVec.asUInt & forwardMask2.asUInt)
// make chisel happy
val addrInvalidMask1Reg = Wire(UInt(StoreQueueSize.W))
addrInvalidMask1Reg := RegNext(addrInvalidMask1)
// make chisel happy
val addrInvalidMask2Reg = Wire(UInt(StoreQueueSize.W))
addrInvalidMask2Reg := RegNext(addrInvalidMask2)
val addrInvalidMaskReg = addrInvalidMask1Reg | addrInvalidMask2Reg
// load_s2
io.forward(i).dataInvalid := RegNext(io.forward(i).dataInvalidFast) || RegNext(forwardPreWithUnaligned)

This caused the BlockSqIdx passed to LoadQueueReplay to use the sqIdx from the uop instead of the sqIdx that carries the unaligned flag bit:

when (dataInvalidFlag) {
  io.forward(i).dataInvalidSqIdx.flag := Mux(!s2_differentFlag || dataInvalidSqIdx >= s2_deqPtrExt.value, s2_deqPtrExt.flag, s2_enqPtrExt.flag)
  io.forward(i).dataInvalidSqIdx.value := dataInvalidSqIdx
} .otherwise {
  // maybe the store inst has been written to the sbuffer already.
  io.forward(i).dataInvalidSqIdx := RegEnable(io.forward(i).uop.sqIdx, io.forward(i).valid)
}

This can cause the load to get stuck in LoadQueueReplay.
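
To illustrate the hang, here is a minimal, hypothetical Chisel sketch of the blocking condition. The signal names, widths, and interface are assumptions for illustration only, not the real LoadQueueReplay code: a replaying load can re-issue only once the store-queue entry recorded in blockSqIdx has its data ready, so a BlockSqIdx that points at the wrong entry can leave the load blocked indefinitely.

import chisel3._
import chisel3.util._

// Hypothetical, simplified model of the replay blocking condition
// (names and interface are assumptions, not the real LoadQueueReplay).
class ReplayBlockSketch(sqSize: Int) extends Module {
  val io = IO(new Bundle {
    val blockValid     = Input(Bool())                    // load is blocked on a store
    val blockSqIdx     = Input(UInt(log2Ceil(sqSize).W))  // store-queue entry the load waits on
    val stDataReadyVec = Input(Vec(sqSize, Bool()))       // per-entry "store data written" flags
    val canReplay      = Output(Bool())
  })
  // The load may re-issue only once the blocking store's data is ready.
  // If blockSqIdx points at the wrong entry (e.g. taken from the load's own
  // uop.sqIdx rather than from the misaligned store that matched), this
  // condition may never be satisfied and the load stays stuck in the replay queue.
  io.canReplay := !io.blockValid || io.stDataReadyVec(io.blockSqIdx)
}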

To resolve that stall, we incorrectly introduced commit af757d1.

Commit af757d1 allows an entry blocked on BlockSqIdx to be unblocked even though the corresponding store data is not yet valid, which causes performance issues.

This revision fixes the inappropriate forward-fault triggering method and reverts commit af757d1.
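
For context, here is a minimal hypothetical sketch of one way the misaligned case could report a usable blocking index. This is not necessarily the actual diff of this PR; all names are assumed, the SqPtr flag bit is ignored, and the wrap handling assumes a power-of-two queue size. The idea is to pick the oldest matching misaligned entry and hand its index to LoadQueueReplay instead of raising a bare forward fault.

import chisel3._
import chisel3.util._

// Hypothetical sketch, not the actual change in this PR: report which
// store-queue entry caused the misaligned forward fault so that
// LoadQueueReplay waits on a meaningful blockSqIdx instead of uop.sqIdx.
class MisalignForwardSketch(sqSize: Int) extends Module {
  val io = IO(new Bundle {
    val unalignedMatch = Input(UInt(sqSize.W))            // unaligned & forwardMask & allocated
    val deqPtr         = Input(UInt(log2Ceil(sqSize).W))  // current dequeue pointer
    val forwardFault   = Output(Bool())
    val blockSqIdx     = Output(UInt(log2Ceil(sqSize).W))
  })
  io.forwardFault := io.unalignedMatch.orR
  // Rotate the match vector by deqPtr so the priority encoder picks the
  // oldest matching entry in queue order (assumes sqSize is a power of two).
  val rotated = Cat(io.unalignedMatch, io.unalignedMatch) >> io.deqPtr
  io.blockSqIdx := PriorityEncoder(rotated(sqSize - 1, 0)) + io.deqPtr
}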

This should bring performance back up again.

Apologies for my mistake.

@Anzooooo force-pushed the wzz-fix-ff-replay branch 3 times, most recently from ea741d6 to 8e51f70 on December 13, 2024 at 08:15
@XiangShanRobot

[Generated by IPC robot]
commit: 8e51f70

commit astar copy_and_run coremark gcc gromacs lbm linux mcf microbench milc namd povray wrf xalancbmk
8e51f70 1.946 0.451 2.699 1.221 2.854 2.462 2.395 0.934 1.422 2.057 3.437 2.720 2.382 3.260

master branch:

commit astar copy_and_run coremark gcc gromacs lbm linux mcf microbench milc namd povray wrf xalancbmk
c7ca40e 1.937 0.451 2.697 1.227 2.866 2.462 2.393 0.930 1.425 2.054 3.437 2.718 2.367 3.210
38d0d7c
8ffb12e
99baa88 1.899 0.450 2.701 1.219 2.833 2.461 2.395 0.921 1.426 2.022 3.432 2.707 2.368 3.227
f346d72 1.899 0.450 2.701 1.219 2.833 2.461 2.395 0.921 1.426 2.022 3.432 2.707 2.368 3.227
