[torch.compile] Dynamic fp8 + rms_norm fusion #31
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from 81ad334 to 7d1adbf
Force-pushed from 7d1adbf to 651ebdc
      has_residual>(
          out, input, weight, rms, 1.0f / token_scale, hidden_size, residual);
    } else {
      // FP8 - Do not invert s_token_scale for exact match with FBGemm
`s_token_scale` -> `token_scale`
      ss += x * x;
    }

    using BlockReduce = cub::BlockReduce<float, 1024>;
The block dimension is defined as `dim3 block(std::min(hidden_size, 1024));`. Is it safe to use `cub::BlockReduce<float, 1024>` when `block.x` is less than 1024?
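For reference, CUB's documented contract ties the `BLOCK_THREADS` template parameter to the actual launch block size. Below is a minimal sketch of that conventional pattern; the kernel name, signature, and host launch are illustrative and not code from this PR.

// Minimal sketch: cub::BlockReduce with the template parameter equal to the
// launch block size (the configuration CUB documents as supported).
#include <cub/block/block_reduce.cuh>

template <int BLOCK_THREADS>
__global__ void sum_of_squares(const float* __restrict__ in, float* out, int n) {
  using BlockReduce = cub::BlockReduce<float, BLOCK_THREADS>;
  __shared__ typename BlockReduce::TempStorage temp_storage;

  // Each thread accumulates a strided partial sum of squares.
  float ss = 0.0f;
  for (int i = threadIdx.x; i < n; i += BLOCK_THREADS) {
    float x = in[i];
    ss += x * x;
  }

  // CUB assumes all BLOCK_THREADS threads of the block participate here.
  float block_ss = BlockReduce(temp_storage).Sum(ss);
  if (threadIdx.x == 0) {
    *out = block_ss;
  }
}

// Host side: the launch block size matches the template parameter.
// sum_of_squares<1024><<<1, 1024>>>(d_in, d_out, n);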
QUANT_DTYPES = [torch.int8, torch.float8_e4m3fn]
NUM_TOKENS = [1, 7, 83, 2048, 4096]  # Arbitrary values for testing
HIDDEN_SIZES = [1, 2, 3, 4, 16, 64, 67, 768, 2048, 5120, 5137, 8192,
                8193]  # Arbitrary values for testing
We can probably reduce HIDDEN_SIZES to [1, 3, 4, 16, 64, 2048, 5120, 5137] plus the vectorization edge cases, to save test time.
@@ -22,6 +22,7 @@
 supports_moe_ops = False
 with contextlib.suppress(ImportError):
     import vllm._moe_C  # noqa: F401

nit: whitespace changes, here and below
Reviewed the kernel files and kernel tests. Left some minor comments. LGTM otherwise.
This PR cleans up the fusion pass to make it easier to add other multi-output patterns. Then it adds dynamic fp8 rmsnorm fusion.
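For context, here is a rough unfused reference of the pattern being fused: RMSNorm followed by dynamic (per-token) fp8 quantization. Function names, epsilon, and scale handling are illustrative assumptions and may differ from the vLLM kernels; it assumes a recent PyTorch build with float8_e4m3fn support.

# Unfused reference: two passes over memory that the fusion pass would
# replace with a single fused kernel. Illustrative only, not vLLM code.
import torch

def rms_norm_ref(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize each token by its root-mean-square, then apply the weight.
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

def dynamic_fp8_quant_ref(x: torch.Tensor):
    # Dynamic per-token scale: max |x| per row divided by the fp8 max value.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = (x.abs().amax(dim=-1, keepdim=True) / fp8_max).clamp(min=1e-12)
    x_q = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_q, scale

x = torch.randn(4, 64)
w = torch.ones(64)
y_q, s = dynamic_fp8_quant_ref(rms_norm_ref(x, w))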