-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding east_not_west to cid #611
base: master
Are you sure you want to change the base?
Conversation
I don't think the change is necessary here. I would just make a basejump module that combines two concentrated wormhole links, and just use that. |
It's not very easy or efficient to do that. On the sender side, it involves adding a wormhole flow tracker and defining a new wormhole header to leveraging the reserved bits. Then the receiver side needs to change as well. bsg_cache_dma_to_wormhole sends back to the src cord/cid. We need to track which of the dmas is currently returning to again override the wormhole header, either modifying the module or adding a separating tracking fifo. If these dmas are multiplexed onto a subset of dram channels, this is even more challenging. Or we could add a bit |
I don't think it needs to be that complicated. On one side you have wormhole links with cid width of 1, and on the other side, you have a concentrator with the additional cid bit appended. Either way you are combining the same number of wormhole links so the hw complexity should be similar. Combining wormhole links like this is pretty common pattern so it would be nice to have a reusuable module. |
In order to do this, you need to track the header flits. This is additional hardware (decoder+counter) that is unnecessary |
I'm not saying there won't be any overhead, and the overhead should be pretty minimal. But this only solves a very specific use case (combining east vs west). But there might be many different use cases where you are combining north vs south vc or combining wormhole links of different pod rows, etc. I think the combiner module would be a more generic solution to this particular problem. |
It's perfectly general to have a CID wide enough to support all memory links in the system. Then to seperate, inject the correct CID during the separation and recombine when you return to the separation point. This PR does the first part, ensuring all links have a unique CID. The separation particulars depend on the surrounding system. For the two systems that I am familiar with, BIgBlade and the chip that I am designing, there is no injection logic needed because BigBlade has N links and my chip has 1 link. We shouldn't design hardware to solve problems that don't yet exist in a system. |
The concentrator ID is admittedly a bit of an imperfection, but it is a lesser evil. Packet encapsulation is extremely expensive, especially when going off-chip where we cannot easily change the number of bits, so we have a large incentive to scavenge unused bits from inside the packet rather than append new bits on the outside. The most convenient concentration ID will depend on what kind of system it is being incorporated into. In the limit of no concentration, infinite unique DRAMs, no concentration ID bits are required. Sometimes we will need to collapse ruche channels, sometimes N/S channels, and sometimes E/W channels. In a more powerful language, we could subclass to override the concentration ID, but SystemVerilog does not allow for this. We could also define a "mixin module" where the user has to define a module that compute the concentration ID, but that is non-intuitive. Something more Verilogy would be for us to specify a concentration mask , which says which data we want to aggregate from the tile positional info into the concentrate field. And somewhat entertainingly, the module that does this aggregation is called: https://github.com/bespoke-silicon-group/basejump_stl/blob/master/bsg_misc/bsg_concentrate_static.v |
bb5b0b5
to
f7a6aec
Compare
f7a6aec
to
b06a4da
Compare
@taylor-bsg What's the resolution on this? We have hardware that leverages the E-W in CID, so there's definitely a use-case for it. Masking the CID at usage in the TB/gateway chip seems like an acceptable change for support both variants of changes moving forward. However, I'm fine with making the more complex bsg_concentrate_static change as long as we can all agree on the semantics. One downside to it will be that we'll need to thread through another toplevel parameter to select the concentrator type (curse you, verilog) |
This PR adds east_not_west to the wormhole CID. This is very useful for systems which combine the east and west memory streams and would like to differentiate between them. Because we have spare bits in the header, this change should have no PPA impact on the manycore hardware itself