[Test] port flash attention from sglang #3011

Dewei-Wang-sh · 2024-12-16T01:51:07Z

No description provided.

whitneywhtsang · 2024-12-16T19:08:46Z

benchmarks/triton_kernels_benchmark/flash_attention_sglang.py

+    b_seq_len_extend_test = b_seq_len_extend_test.to(device)
+    b_start_loc_extend_test = torch.arange(0, batch_size, dtype=torch.int32) * seq_len
+    b_start_loc_extend_test = b_start_loc_extend_test.to(device)
+    extend_attention_fwd(q_test, k_test, v_test, o_tensor_ptr, k_buffer_test, v_buffer_test, req_to_tokens_test,


Can we add a timing mechanism and result checking to ensure functional correctness?

Dewei-Wang-sh · 2024-12-17

I don't know how to compare the result; it originated from an end-to-end test.
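One possible way to check results without the end-to-end harness is to compute the expected extend-attention output with a plain reference and compare against it, and to time the call separately. The NumPy sketch below assumes the usual prefill-with-KV-cache semantics: each new (extend) token attends to the whole cached prefix plus the new tokens up to and including itself. `causal_attention`, `extend_reference` and `time_fn` are hypothetical helpers, not part of sglang or the benchmark suite. In the real benchmark the comparison would be done per request and per head against the output of `extend_attention_fwd` (for example with `torch.testing.assert_close`), and Triton's `triton.testing.do_bench` could replace the hand-rolled timer.

```python
import time

import numpy as np


def causal_attention(q, k, v, sm_scale):
    """Plain causal softmax attention for a single head; q, k, v are (n, d)."""
    n = q.shape[0]
    scores = (q @ k.T) * sm_scale
    # Token i may only attend to tokens j <= i.
    allowed = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)
    return p @ v


def extend_reference(q_ext, k_prefix, v_prefix, k_ext, v_ext, sm_scale):
    """Expected output for the new (extend) tokens of one request and one head.

    ASSUMED semantics: every new token sees the whole cached prefix plus the
    new tokens up to and including itself. Causal attention over
    prefix + extend, keeping only the last rows, gives exactly that. Prefix
    queries do not affect those rows, so zeros stand in for them.
    """
    n_prefix = k_prefix.shape[0]
    q = np.concatenate([np.zeros((n_prefix, q_ext.shape[1])), q_ext])
    k = np.concatenate([k_prefix, k_ext])
    v = np.concatenate([v_prefix, v_ext])
    return causal_attention(q, k, v, sm_scale)[n_prefix:]


def time_fn(fn, warmup=3, rep=10, sync=None):
    """Median wall-clock time of fn() in milliseconds.

    GPU/XPU launches are asynchronous, so pass a device synchronize as
    `sync` when timing a kernel; otherwise mostly the launch is measured.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(rep):
        if sync:
            sync()
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        times.append((time.perf_counter() - start) * 1e3)
    return float(np.median(times))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_prefix, n_ext = 64, 48, 16
    q_ext = rng.standard_normal((n_ext, d))
    k_pre, v_pre = rng.standard_normal((2, n_prefix, d))
    k_ext, v_ext = rng.standard_normal((2, n_ext, d))
    sm_scale = 1.0 / np.sqrt(d)
    expected = extend_reference(q_ext, k_pre, v_pre, k_ext, v_ext, sm_scale)
    # In the benchmark, `actual` would be the kernel's output rows for this
    # request/head (moved to the CPU); the reference stands in for it here.
    actual = expected.copy()
    # Loose tolerances, assumed to suit fp16 kernel output.
    np.testing.assert_allclose(actual, expected, rtol=1e-2, atol=1e-2)
    ms = time_fn(lambda: extend_reference(q_ext, k_pre, v_pre, k_ext, v_ext, sm_scale))
    print(f"match ok, reference takes {ms:.3f} ms")
```

Building the reference from a full causal pass keeps it independent of the kernel's blocking and of how the prefix is stored, which is what makes it useful as a correctness oracle.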

whitneywhtsang · 2024-12-16T19:09:59Z

benchmarks/triton_kernels_benchmark/flash_attention_sglang.py



What are the differences between this implementation of flash attention and https://github.com/intel/intel-xpu-backend-for-triton/blob/main/benchmarks/triton_kernels_benchmark/flash_attention_fwd_benchmark.py?

Dewei-Wang-sh · 2024-12-17

There seems to be a lot of conditional control flow, but the main difference is that it does not use block pointers.
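To make the block-pointer difference concrete: a Triton block pointer (`tl.make_block_ptr(base, shape, strides, offsets, block_shape, order)`, loaded with `boundary_check`) describes one rectangular, regularly strided tile, whereas a kernel without block pointers builds per-element offsets by hand (`base + rows[:, None] * stride + cols[None, :]`) and masks the load. Hand-built offsets can also follow an index table. The diff passes `req_to_tokens_test` and `k_buffer_test`, which suggests (an assumption, not confirmed in this thread) that prefix K/V rows are gathered through token indices, an access pattern a block pointer cannot express. The NumPy sketch below only emulates the addressing on a flat buffer; `masked_load`, `load_strided_tile` and `load_gathered_tile` are hypothetical names, not Triton or sglang functions.

```python
import numpy as np


def masked_load(flat, offsets, mask, other=0.0):
    """Emulate tl.load(base + offsets, mask=mask, other=other) on a flat buffer."""
    safe = np.where(mask, offsets, 0)  # masked-off lanes must not read out of bounds
    return np.where(mask, flat[safe], other)


def load_strided_tile(flat, n_rows, stride_row, start_row, block_m, block_d):
    """Regular tile of rows [start_row, start_row + block_m).

    This is the pattern a block pointer covers: shape=(n_rows, block_d),
    strides=(stride_row, 1), offsets=(start_row, 0), with boundary_check on
    dim 0 and padding_option="zero" filling rows past n_rows with zeros.
    """
    rows = start_row + np.arange(block_m)
    cols = np.arange(block_d)
    offsets = rows[:, None] * stride_row + cols[None, :]
    mask = (rows < n_rows)[:, None]  # broadcasts across the columns
    return masked_load(flat, offsets, mask)


def load_gathered_tile(flat, token_ids, stride_row, block_d):
    """Rows picked through an index table (a req_to_tokens-style lookup).

    The row addresses are not an affine function of the tile position, so
    they have to be spelled out as explicit offsets.
    """
    cols = np.arange(block_d)
    offsets = token_ids[:, None] * stride_row + cols[None, :]
    mask = np.ones(offsets.shape, dtype=bool)
    return masked_load(flat, offsets, mask)


if __name__ == "__main__":
    kv = np.arange(40, dtype=np.float32).reshape(10, 4)  # 10 tokens, head_dim 4
    flat = kv.ravel()
    tile = load_strided_tile(flat, n_rows=10, stride_row=4, start_row=8, block_m=4, block_d=4)
    print(tile)  # rows 8 and 9 of kv, then two zero rows
    print(load_gathered_tile(flat, np.array([7, 2, 5]), stride_row=4, block_d=4))
```

The strided case maps one-to-one onto a block pointer; the gathered case is where explicit pointer arithmetic (and the extra conditions that come with it) would be needed.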

Dewei-Wang-sh · 2024-12-17T02:34:58Z

Making it a draft for now; further discussion is needed on how to support the end-to-end kernel.

[Test] port flash attention from sglang

1f07283

Dewei-Wang-sh linked an issue Dec 16, 2024 that may be closed by this pull request

[e2e test] port flash attention from sglang #3012 (Open)

fix format

12ef46c

whitneywhtsang reviewed Dec 16, 2024


Dewei-Wang-sh requested review from sommerlukas, alexbaden, etiotto and quintinwang5 December 17, 2024 01:35

Dewei-Wang-sh closed this Dec 17, 2024

