Too many repeats in kineto_traces for pt2 compiled ops #86

Open
FindHao opened this issue Dec 2, 2024 · 5 comments

Comments

@FindHao
Member

FindHao commented Dec 2, 2024

pt2 compiled version:
image
liger version:
image
The kineto_trace for the pt2 compiled version contains many extra repeats of the same ops.

@xuzhao9
Contributor

xuzhao9 commented Dec 12, 2024

Are they from the warmup iterations? https://github.com/pytorch-labs/tritonbench/blob/main/docs/kineto_trace.md

@FindHao
Member Author

FindHao commented Dec 12, 2024

> Are they from the warmup iterations? https://github.com/pytorch-labs/tritonbench/blob/main/docs/kineto_trace.md

Yes. So the warmup section in kineto traces is expected? But liger's kineto trace doesn't have such a warmup section.

@xuzhao9
Contributor

xuzhao9 commented Dec 12, 2024

My mental model is that liger's kineto trace should also have a warmup section. Something is off if it doesn't.

Can I have the reproduction command line?

@FindHao
Member Author

FindHao commented Dec 12, 2024

> My mental model is that liger's kineto trace should also have warmup section. It is something weird if not.
>
> Can I have the reproduction command line?

python run.py --op kl_div --mode fwd_bwd --precision fp32 --metrics kineto_trace --csv --num-inputs 1

@xuzhao9
Contributor

xuzhao9 commented Dec 13, 2024

@FindHao This is the liger kernel trace with the warmup iterations profiled as well:

image

It shows that by the time the profiled iteration starts, the warmup iterations have already finished on the GPU. The kineto trace therefore shows only a single iteration.

For PT2 and Torch, when the CPU launches the kernels of the last (profiled) iteration, the GPU kernels from the previous warmup iterations are still running on the GPU, so they land inside the profiled window and it looks like "too many repeats for pt2 compiled ops".

So to me this is expected behavior.
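
The explanation above can be illustrated with a toy timeline. The sketch below is pure Python and does not touch PyTorch or kineto. It assumes a single stream, one kernel per iteration, and a profiling window that opens when the host starts the last iteration. The names `simulate` and `visible_iterations`, the timings, and the idea that the difference comes from host time versus GPU time per iteration are all illustrative assumptions, not something measured in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Kernel:
    iteration: int  # 0..num_warmup-1 are warmup; num_warmup is the profiled one
    start: float    # GPU start time (ms)
    end: float      # GPU end time (ms)


def simulate(num_warmup: int, host_ms: float, gpu_ms: float) -> List[Kernel]:
    """Toy model: the host enqueues one kernel at the start of each iteration
    and never waits for the GPU; the GPU runs kernels in order, one at a time."""
    kernels = []
    gpu_free_at = 0.0
    for i in range(num_warmup + 1):
        enqueue_time = i * host_ms               # host does not block on the GPU
        start = max(enqueue_time, gpu_free_at)   # wait for earlier kernels
        end = start + gpu_ms
        kernels.append(Kernel(i, start, end))
        gpu_free_at = end
    return kernels


def visible_iterations(num_warmup: int, host_ms: float, gpu_ms: float) -> List[int]:
    """Iterations whose kernel is still pending or running after the profiling
    window opens, i.e. when the host begins the last iteration."""
    profile_start = num_warmup * host_ms
    return [k.iteration
            for k in simulate(num_warmup, host_ms, gpu_ms)
            if k.end > profile_start]


if __name__ == "__main__":
    # Hypothetical timings: GPU work outlasts host work, so warmup kernels pile up.
    print("GPU-bound :", visible_iterations(5, host_ms=1.0, gpu_ms=4.0))
    # Hypothetical timings: host work dominates, so the GPU is idle before profiling.
    print("host-bound:", visible_iterations(5, host_ms=4.0, gpu_ms=1.0))
```

In this toy model the GPU-bound case reports several warmup iterations inside the profiled window, while the host-bound case reports only the profiled iteration. In real code, a call to `torch.cuda.synchronize()` between warmup and the profiled iteration would drain the queued kernels first; whether tritonbench does or should do that is not stated in this thread.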
