Input Access Count Mismatch in Timeloop ISPASS 2020 Tutorial #295

Closed
Jaesuk-Lee opened this issue Dec 19, 2024 · 4 comments · May be fixed by #297

@Jaesuk-Lee

Jaesuk-Lee commented Dec 19, 2024

Hello!

While running the ISPASS 2020 tutorial (timeloop-accelergy-exercises/workspace/tutorial_exercises/01_accelergy_timeloop_2020_ispass), I observed that applying the input-stationary dataflow at the MainMemory level of the 1D convolution problem produced input access counts that differ from the theoretical values.

Observed Mapping

Input-Stationary + Local Output-Stationary Mapping:
MainMemory [ Weights:96 (96) Inputs:18 (18) Outputs:512 (512) ] 
---------------------------------------------------------------
| for P in [0:16)
|   for K in [0:32) // Input-stationary

Buffer [ Weights:3 (3) Inputs:3 (3) Outputs:1 (1) ] 
---------------------------------------------------
|     for R in [0:3) // Output-stationary
|       << Compute >>
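
As a sanity check on the tile sizes printed in the mapping above, here is a minimal sketch that enumerates the data each level touches. It is not Timeloop code. The helper `footprints` is hypothetical, and the input index `w = p + r` (unit stride, no dilation) is an assumption consistent with W = P+R-1.

```python
# Problem dimensions taken from the mapping above.
P, K, R = 16, 32, 3


def footprints(p_range, k_range, r_range):
    """Count the distinct weights, inputs, and outputs touched by a loop range.

    Assumes a 1D convolution where input index w = p + r (unit stride).
    """
    weights = {(k, r) for k in k_range for r in r_range}
    inputs = {p + r for p in p_range for r in r_range}
    outputs = {(k, p) for k in k_range for p in p_range}
    return len(weights), len(inputs), len(outputs)


# MainMemory holds the whole problem; Buffer holds one (p, k) iteration's tile.
main_memory = footprints(range(P), range(K), range(R))
buffer_tile = footprints(range(1), range(1), range(R))

print(main_memory)  # (96, 18, 512)
print(buffer_tile)  # (3, 3, 1)
```

Both tuples agree with the `Weights / Inputs / Outputs` sizes shown in the mapping headers.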

Expected Behavior

I expected the results of this mapping to match those in the second row of the 'MainMemory accesses' table captured from the ISCA tutorial slideset below:
[Image: 'MainMemory accesses' table from the tutorial slides]

Observed Behavior

However, the MainMemory Scalar Reads count turned out to be different. Specifically:

Level 2
-------
=== MainMemory ===
...
    STATS
    -----
...
    Inputs:
...
        Scalar reads (per-instance)   : 48 // expected P+R-1(W) but got P*R

I obtained the expected results when applying a Weight-Stationary mapping at the MainMemory level (as shown in the first row of the table).
I would like to understand why this difference occurs.

Thanks.

@angshuman-parashar
Collaborator

Could you please paste the exact YAMLs you used? (You can concatenate them into a single file and upload it.) Please also indicate which version of Timeloop you used.

@Jaesuk-Lee
Author

Jaesuk-Lee commented Dec 21, 2024

I used timeloop-model in Timeloop.v4 with TimeloopFE, as guided by the tutorial.
The input YAML file (parsed by TimeloopFE) is attached below.
The only change from exercise 2 no-tiling is the permutation attribute of MainMemory (PRK -> RKP).

You can observe that MainMemory->STATS->Inputs->Scalar reads = 48 (P × R).
However, I expected this to be P + R - 1 because:

   P    delta_size  num_epochs_
 -----  ----------  -----------
   0         R           1
  1-15       1          P-1

The same behavior is observed when I change the permutation in 'exercise 2 tiled mapping'.
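
The delta argument above can be checked with a brute-force count. This is a minimal sketch, not Timeloop's model: the function `main_memory_input_reads` and its `sliding_window_reuse` flag are hypothetical names, and the input index `w = p + r` is assumed. The "no reuse" branch only shows one way to arrive at 48 (refetching the full tile whenever it changes); it is not a claim about what the actual bug was.

```python
def main_memory_input_reads(P=16, K=32, R=3, sliding_window_reuse=True):
    """Count input words fetched from MainMemory into Buffer.

    Loop order follows the mapping in this issue: P outermost, then K,
    with the R loop living inside Buffer. Assumes input index w = p + r.
    """
    reads = 0
    resident = set()  # input indices currently held in Buffer
    for p in range(P):
        for k in range(K):
            tile = {p + r for r in range(R)}  # inputs needed for this (p, k)
            if sliding_window_reuse:
                # Fetch only the delta: words not already in Buffer.
                new = tile - resident
            else:
                # Refetch the whole tile whenever it differs at all.
                new = tile if tile != resident else set()
            reads += len(new)
            resident = tile
    return reads


print(main_memory_input_reads())                            # 18
print(main_memory_input_reads(sliding_window_reuse=False))  # 48
```

With delta fetching the first P iteration brings in R words and each of the remaining P-1 iterations brings in one, giving P + R - 1 = 18. Refetching the full tile on every P step gives P × R = 48, the value reported in the stats.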

Attachment: Input YAML file parsed by timeloopFE

@angshuman-parashar
Collaborator

Thank you for bringing this to our attention, and thank you for the detailed reproduction instructions.

I successfully reproduced the issue. It was a pretty serious and longstanding bug, although fortunately the fix was just a few localized lines of code. I have pushed the fix to the branch translation_fix.

Could you please test it on your end and let me know if it's working?

@Jaesuk-Lee
Author

Jaesuk-Lee commented Dec 24, 2024

I have tested all the mappings introduced in the ISCA '20 tutorial, exercise 2. Thanks to your excellent work, no further issues were observed.
ISLOS_tiled.txt
ISLOS_untiled.txt

angshuman-parashar linked a pull request on Dec 26, 2024 that will close this issue