SASS driven mode failing for PyTorch traces #306
Comments
Can you share the trace file in some way? I have seen this before in some reduce kernels (yours is a reduce kernel as well, so this may be the same issue). In the last several lines of the trace, there is probably an EXIT with mask FFFFFFFF, which means all threads within the warp have exited. However, there will be some trace lines after that with mask 00000001, which means that one thread is still active. This is an NVBit issue, which we have also reported at NVlabs/NVBit#122. For now, you can either remove the assert or manually delete the lines after the EXIT. Thanks
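One way to locate candidate lines for the manual deletion described above is sketched here in Python. It is only an illustration under stated assumptions, not part of Accel-Sim or NVBit: it assumes each instruction line is whitespace-separated, that the second field is the hexadecimal active mask, and that the opcode appears as a token `EXIT` (or `EXIT.<modifier>`). The function name `find_post_exit_lines` and the sample lines are hypothetical; check them against your own `.traceg` files before relying on the result.

```python
def find_post_exit_lines(warp_lines, mask_field=1):
    """Return indices of lines whose mask includes a lane that already ran EXIT.

    Assumptions (not confirmed by the thread): fields are whitespace-separated,
    the active mask is the hex value at position `mask_field`, and the opcode
    is a token equal to "EXIT" or starting with "EXIT.".
    """
    exited = 0  # bitmask of lanes that have executed EXIT so far
    stray = []
    for i, line in enumerate(warp_lines):
        fields = line.split()
        if len(fields) <= mask_field:
            continue  # too short to be an instruction line
        try:
            mask = int(fields[mask_field], 16)
        except ValueError:
            continue  # header or other non-instruction line
        if mask & exited:
            stray.append(i)  # a lane shows up again after its EXIT
        if any(tok == "EXIT" or tok.startswith("EXIT.") for tok in fields):
            exited |= mask
    return stray


# Hypothetical instruction lines for a single warp (made-up layout and values).
sample = [
    "0100 000000ff 1 R2 IADD 2 R2 R3 0",
    "0110 000000ff 0 EXIT 0 0",
    "0120 00000001 1 R4 MOV 1 R5 0",
]
stray = find_post_exit_lines(sample)
cleaned = [line for i, line in enumerate(sample) if i not in stray]
print(stray)
```

The sketch flags lines rather than deleting them in place, so the flagged lines can be inspected first. Pass it the instruction lines of one warp at a time, since the exited-lane state is not reset between warps.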
Sure! The traces were generated in this folder. There are 131 trace files for the program, and I have not been able to go through all of them yet. The ones I did go through did not seem to contain a trace line after an EXIT with the mask you referred to, but I did find two EXIT statements with a mask of 00000000 (lines 146 and 152) in this trace file. I will keep looking, though; what you mentioned is probably the case. Thank you!
It is kernel-7. https://github.com/sinharudraneel/dp-performance-accel-sim/blob/week3-traces/simpletorch3-traces/traces/kernel-7.traceg#L149C1-L149C25 The block size is 8, so only eight threads are active. The mask is at: All threads have exited. But one thread is still active after. Currently we are unable to fix this; it is an NVBit problem. Ignoring it for now should not do much harm.
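To make the mask reasoning above concrete, here is a small Python sketch that decodes a hexadecimal active mask into lane indices and builds the mask expected for a block with fewer than 32 threads. It assumes the usual convention that lane 0 is the least significant bit and that a block's threads fill a warp's lanes from lane 0 upward; the helper names `active_lanes` and `full_mask_for` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def active_lanes(mask_hex):
    """Return the lane indices whose bit is set (lane 0 = least significant bit)."""
    mask = int(mask_hex, 16)
    return [lane for lane in range(WARP_SIZE) if (mask >> lane) & 1]


def full_mask_for(num_threads):
    """Mask with the lowest `num_threads` lanes set, as 8 hex digits."""
    lanes = min(num_threads, WARP_SIZE)
    return format((1 << lanes) - 1, "08x")


print(full_mask_for(8))          # mask for a block of eight threads
print(active_lanes("00000001"))  # lanes active in the stray post-EXIT line
```

Under these assumptions, a block of eight threads corresponds to mask `000000ff`, and a mask of `00000001` means only lane 0 is marked active.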
Ah I see, thank you!
I have been trying to run a simple PyTorch neural network (a single linear layer trained on random data) on the SASS-driven version of Accel-Sim, as a test for a larger project. I am able to generate the traces from this program, but when I run them through the SASS-driven mode of Accel-Sim, I get this error. Essentially, it is an assertion failure for `active.any() == false` in shader.cc, and I do not fully understand the root of the problem. Is this an error caused by operations that Accel-Sim does not support, have I written my program incorrectly (although it runs fine on the Tesla V100 that I have been using), or is it a bug in the code? Out of curiosity, I commented out the assert statement and ran the traces through the simulator again, and it passed the tests. I understand that I am probably tampering with an important assertion that should not be commented out, but could someone let me know what the underlying error might be?