Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding east_not_west to cid #611

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Adding east_not_west to cid #611

wants to merge 2 commits into from

Conversation

dpetrisko
Copy link
Contributor

This PR adds east_not_west to the wormhole CID. This is very useful for systems which combine the east and west memory streams and would like to differentiate between them. Because we have spare bits in the header, this change should have no PPA impact on the manycore hardware itself

@dpetrisko dpetrisko added the enhancement New feature or request label Dec 8, 2021
@dpetrisko dpetrisko requested a review from tommydcjung December 8, 2021 23:01
@dpetrisko dpetrisko self-assigned this Dec 8, 2021
@tommydcjung
Copy link
Contributor

I don't think the change is necessary here. I would just make a basejump module that combines two concentrated wormhole links, and just use that.

@dpetrisko
Copy link
Contributor Author

It's not very easy or efficient to do that. On the sender side, it involves adding a wormhole flow tracker and defining a new wormhole header to leveraging the reserved bits. Then the receiver side needs to change as well. bsg_cache_dma_to_wormhole sends back to the src cord/cid. We need to track which of the dmas is currently returning to again override the wormhole header, either modifying the module or adding a separating tracking fifo. If these dmas are multiplexed onto a subset of dram channels, this is even more challenging.

Or we could add a bit

@tommydcjung
Copy link
Contributor

I don't think it needs to be that complicated. On one side you have wormhole links with cid width of 1, and on the other side, you have a concentrator with the additional cid bit appended. Either way you are combining the same number of wormhole links so the hw complexity should be similar. Combining wormhole links like this is pretty common pattern so it would be nice to have a reusuable module.

@dpetrisko
Copy link
Contributor Author

On one side you have wormhole links with cid width of 1, and on the other side, you have a concentrator with the additional cid bit appended.

In order to do this, you need to track the header flits. This is additional hardware (decoder+counter) that is unnecessary

@tommydcjung
Copy link
Contributor

I'm not saying there won't be any overhead, and the overhead should be pretty minimal. But this only solves a very specific use case (combining east vs west). But there might be many different use cases where you are combining north vs south vc or combining wormhole links of different pod rows, etc. I think the combiner module would be a more generic solution to this particular problem.

@dpetrisko
Copy link
Contributor Author

It's perfectly general to have a CID wide enough to support all memory links in the system. Then to seperate, inject the correct CID during the separation and recombine when you return to the separation point. This PR does the first part, ensuring all links have a unique CID. The separation particulars depend on the surrounding system.

For the two systems that I am familiar with, BIgBlade and the chip that I am designing, there is no injection logic needed because BigBlade has N links and my chip has 1 link. We shouldn't design hardware to solve problems that don't yet exist in a system.

@taylor-bsg
Copy link
Contributor

taylor-bsg commented Jan 5, 2022

The concentrator ID is admittedly a bit of an imperfection, but it is a lesser evil. Packet encapsulation is extremely expensive, especially when going off-chip where we cannot easily change the number of bits, so we have a large incentive to scavenge unused bits from inside the packet rather than append new bits on the outside. The most convenient concentration ID will depend on what kind of system it is being incorporated into. In the limit of no concentration, infinite unique DRAMs, no concentration ID bits are required. Sometimes we will need to collapse ruche channels, sometimes N/S channels, and sometimes E/W channels. In a more powerful language, we could subclass to override the concentration ID, but SystemVerilog does not allow for this. We could also define a "mixin module" where the user has to define a module that compute the concentration ID, but that is non-intuitive. Something more Verilogy would be for us to specify a concentration mask , which says which data we want to aggregate from the tile positional info into the concentrate field. And somewhat entertainingly, the module that does this aggregation is called: https://github.com/bespoke-silicon-group/basejump_stl/blob/master/bsg_misc/bsg_concentrate_static.v

@dpetrisko
Copy link
Contributor Author

@taylor-bsg What's the resolution on this? We have hardware that leverages the E-W in CID, so there's definitely a use-case for it. Masking the CID at usage in the TB/gateway chip seems like an acceptable change for support both variants of changes moving forward. However, I'm fine with making the more complex bsg_concentrate_static change as long as we can all agree on the semantics. One downside to it will be that we'll need to thread through another toplevel parameter to select the concentrator type (curse you, verilog)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants