Reduce memory use through better attention? #73
Comments
This seems like it would be pretty useful for our purposes since we're training large models. The PyTorch blog reports GPU memory savings of 20%-110% and speedups of 10%-70% during training, and speedups of 5%-20% during inference. I'd like to test this in SILNLP, but SILNLP is still on torch 1.10 rather than torch 2.0, which is required for the accelerated attention mechanism in BetterTransformer. I think it would be best for me to make a new branch of SILNLP with torch 2.0 and test BetterTransformer there. If it provides a significant speed/memory improvement for our models, I'd imagine we'd want to upgrade the master branch of SILNLP to torch 2.0 and BetterTransformer as well.
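For the branch experiment, the conversion would look roughly like this (a sketch, assuming torch >= 2.0, the `optimum` package, and that M2M100 is covered by the installed BetterTransformer version; the checkpoint name is just illustrative):

```python
# Sketch: converting a Hugging Face model to BetterTransformer via optimum.
# Assumes torch >= 2.0 and that the architecture is supported by BetterTransformer.
from transformers import AutoModelForSeq2SeqLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M")  # illustrative checkpoint

# Swap the vanilla attention modules for the BetterTransformer fast-path kernels.
model = BetterTransformer.transform(model, keep_original_model=False)

# ... train or run inference as usual ...

# Convert back to the original layout before saving the checkpoint.
# model = BetterTransformer.reverse(model)
```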
The upgrades from BetterTransformer will be incorporated into future versions of transformers and should eventually cover the M2M100 model that we use.
This should happen automatically once it is integrated natively into Transformers.
@mshannon-sil - am I correct that this is resolved? Are we using flash attention (or did something else get in the way of using it)?
We haven't updated to the version of HF transformers that supports SDPA yet.
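Once we do upgrade, enabling it should be a one-line change at load time (a sketch, assuming transformers >= 4.36, where the `attn_implementation` argument exists, and that M2M100 has SDPA support in that release; the checkpoint name is illustrative):

```python
# Sketch: loading a model with SDPA-based attention in newer transformers releases.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/m2m100_418M",       # illustrative checkpoint
    torch_dtype=torch.float16,
    attn_implementation="sdpa",   # route attention through torch.nn.functional.scaled_dot_product_attention
)
```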
https://pytorch.org/blog/out-of-the-box-acceleration/
Are we utilizing the "fused kernels from FlashAttention and Memory-efficient attention"? Can we? We may be able to get significant speedups or save significant memory that way. A rough sketch of what those kernels look like is below.
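For reference, this is roughly what the blog post is describing (a minimal sketch with illustrative shapes/dtypes, assuming a CUDA device and torch 2.0, where `scaled_dot_product_attention` and the `sdp_kernel` context manager are available):

```python
# Sketch: PyTorch 2.0 fused attention kernels.
# scaled_dot_product_attention dispatches to FlashAttention or the memory-efficient
# kernel when the inputs allow it; the context manager restricts which backends are eligible.
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) - illustrative sizes; fp16 on CUDA enables the fused paths.
q = torch.randn(8, 16, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(8, 16, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(8, 16, 1024, 64, dtype=torch.float16, device="cuda")

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True, enable_math=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```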