-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare microkernel benchmark IR against main for PRs #2974
Comments
The motivation of this issue is good, we want to have a way to quickly identify if performance impact is variance or not.
What's your expected behavior when there are more than 1 TTIR/TTGIR? In general, I worry that this approach may be too fragile, and end up reporting differences in most cases, and then developers start to ignore the report. MLIR changes all the time, if we use Triton nightly wheels output as baseline, there can be many PRs merged between that and the PR, or the PR could be based on a commit before the Triton nightly wheels. |
Eventually I would like to be able to diff as much IR as possible, but to start I think grabbing one representative set of IRs from each microbenchmark would be ok. I am not sure how to best grab the IR without running the benchmark, figuring that out is part of this issue.
I expect the reports to be somewhat noisy, but the noise should be fairly easy to scan through if we can highlight the diff appropriately. I think knowing how the IR is changing as we merge from upstream / add features to main is very useful. Do you have another approach that would be less noisy that we could try?
Good point, I suggested Triton nightly wheels because they are already built. But, we might want something more recent or more relevant to a given PR. Maybe we can try the Triton nightly wheels to start and consider alternatives (like keeping the builds from every commit in main somewhere or building the last commit from main for a given PR). We could also make this an ad-hoc job and let the user supply both the PR and the commit to compare, but I'd like it to run automatically so we have a record of IR changes to refer to. |
Just to clarify, do you mean diffing the final MLIR of one input shape?
If we reconsider the motivation, which is to have a way to quickly identify if performance impact is variance or not. Another way to resolve that could be CI automatically determine if the performance results are within standard deviation, and report only gains and degradations. Or CI automatically rerun benchmarks with potential performance impact. |
The motivation is to be able to determine if a given PR is going to change the TTIR/TTGIR (and possibly even the MLIR). Detecting performance regressions, etc is just one benefit from knowing if a change is mutating the IR. Yes, it seems like most microbenchmarks do parameter sweeps across input shapes but I imagine there are other possible parametrizations. |
Offline, I mentioned some scripts from the LLVM infrastructure that can help to abstract over SSA value names.
Even if we don't use the scripts themselves, we can maybe learn something from how they detect and abstract SSA value names. |
Before starting the effort, we should clearly define the motivation of this work, as it can help deciding if we prefers false positive or false negative. For example, if we want to ensure no performance regression, then we cannot accept false negative. When comparing IR, best to leverage existing tools like Not sure if the transformation below can be useful here, but FYI,
|
It would be useful to be able to detect whether or not the generated IR (Triton IR or TTGIR) changes for a given pull request (code change). For example, if the IR changes for a particular microkernel then we should have high suspicion for performance changes for that microkernel. Performance measurements are often noisy, whereas IR changes are essentially binary. This would also be useful for when we sync openai commits or make changes to the pass pipeline and want to know all the places that could be affected.
The proposal here is to implement a CI job that runs on every PR and compiles the Triton IR for the microkernel benchmark to TTIR/TTGIR using
triton.compile
. To keep things simple and keep runtime down, we will use the nightly wheels for our "golden reference". A suggested flow is:Note that many PRs will be expected to change the IR, so we do not want the job to fail if the IR differs (otherwise we will have a lot of red PRs that are actually fine, and merging might be blocked). But, it would be nice to have some way of notifying the user when the IR differs instead of having to remember to check the output of a given github actions run.
The text was updated successfully, but these errors were encountered: