Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix stat printing condition #584

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

fix stat printing condition #584

wants to merge 1 commit into from

Conversation

tommydcjung
Copy link
Contributor

No description provided.

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

There seems to be a little confusion. We made a change to remove the subtraction because the stats profiler was broken with it.

@tommydcjung
Copy link
Contributor Author

Shouldn't the parser be fixed then? it broke bsg_print_stat in bsg_manycore. @drichmond

@drichmond
Copy link
Contributor

IIRC, it's not the parser that's borked. The stat file was empty.

@tommydcjung
Copy link
Contributor Author

If you remove the subtract then the coordinate will not match and it will never print any stats...

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

We found the opposite to be true, and that removing the subtract fixed the problem. It sounds like there's a mismatch in infrastructure between replicant and manyore in terms of how the stat packets are formatted?

@tommydcjung
Copy link
Contributor Author

Doesn't it use the same bsg_nonsynth_manycore_testbench.v and bsg_cuda_print_stat_kernel_start/end() in C?

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

it does

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

I'm combing through the code now.

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

I think the divergence happens here:

(((__bsg_grp_org_y + __bsg_y) & BSG_CUDA_PRINT_STAT_Y_MASK) << BSG_CUDA_PRINT_STAT_Y_SHIFT) | \

How does the nbf loader set the __bsg_grp_org_x/y variables? It looks like CUDA sets them to absolute coordinates currently. So they are being set to (16,8) at the moment.

@tommydcjung
Copy link
Contributor Author

It uses bsg_set_tile_x_y(). __bsg_grp_org_xy actually comes from the tile-group origin Network CSR. https://github.com/bespoke-silicon-group/bsg_manycore/blob/master/software/bsg_manycore_lib/bsg_set_tile_x_y.c#L31
But tgo network CSRs are subcord only, so they are set to (0,0).

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

I see. I can fix that in replicant, but will this code work if a tile in pod (x=0,y=1) issues a print_stat request?

@tommydcjung
Copy link
Contributor Author

tommydcjung commented Sep 27, 2021

Before you do that, I think we need to spend more time thinking about the correct fix here. I don't think this fix will work when there are multiple pods. Removing the subtract does not completely fix the problem either, because the coordinates in print_stat_tag are only 6-bits wide (currently global x/y are 7-bits). But that's fine, because we can always know who sent the packet by looking at the src x/y of the packet.

@drichmond
Copy link
Contributor

Could we use a bind solution here, like for the SAIF start/end?

@drichmond
Copy link
Contributor

A bind solution would reduce network traffic, improve accuracy of the windows, and reduce the instruction footprint.

Maybe we could even bind in and read the tile group IDs from DMEM?

Just spitballing here.

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

what does saif do? isn't there a special instruction that triggers it?

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

I think, as Tommy suggested, we should check against the src x/y coordinates of the packet rather than reading the x/y out of the tag data. So that would mean checking for equality with the global_x_i, global_y_i.

@drichmond
Copy link
Contributor

In retrospect that seems sort of obvious. Why didn't we do that anyway? Is this one more legacy part of F1?

@mrutt92
Copy link
Contributor

mrutt92 commented Sep 27, 2021

I really don't know. If I recall, we didn't support the stats_tag feature on F1.

@drichmond
Copy link
Contributor

we didn't, but we supported it in simulation, back when we used the F1 interface.

@drichmond
Copy link
Contributor

@tommydcjung What do you think of a bind solution? Maybe not now, but long term? Are there any issues you foresee?

@tommydcjung
Copy link
Contributor Author

bind solution as in using addi x0 won't be able to trigger vcache and router profiler. For saif, the bind solution doesn't work when you are simulating using gate-level netlist.

@drichmond
Copy link
Contributor

Great points! So we would have to modify all three profilers to bind, but even then it won't work if we want to use gate-level netlist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants