Usage of flash attention #12

Open
shaharbar1 opened this issue Jan 14, 2024 · 1 comment

Comments

@shaharbar1

Consider wrapping the call to self.attention in InterpretableMultiHeadAttention with

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):

in order to improve speed and memory efficiency.
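For reference, here is a minimal sketch of what that wrapper looks like around a scaled-dot-product attention call. The tensor shapes and the direct use of F.scaled_dot_product_attention are illustrative assumptions, not the repository's actual InterpretableMultiHeadAttention code:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim). Flash attention is
# only eligible on a CUDA device with half-precision inputs and supported
# head dimensions.
q = torch.randn(8, 4, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict SDPA to the flash / memory-efficient kernels, as suggested above;
# enable_math=False disallows the slower pure-PyTorch fallback kernel.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
```

In the repository's case, the context manager would wrap the existing `output, attn = self.attention(...)` call inside InterpretableMultiHeadAttention's forward pass rather than a standalone SDPA call.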

@Dvirbeno Dvirbeno self-assigned this Jan 14, 2024
@otto-dev

no pytorch/pytorch#125674

3 participants