I've noticed that decoder_retention_heads is set to 3 by default, and that the decay mask is expanded to three dimensions (with a leading head axis) to match. Have you experimented with how performance differs across different numbers of heads? Is this configuration sufficient in terms of attention performance? Since your model is primarily aimed at sequence modeling for language, I'm looking to extend it to image processing, and I'm unsure whether I should modify this aspect.
Thank you in advance for your response.
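For context on the three-dimensional mask: each retention head in RetNet uses its own decay rate, so the causal decay mask carries a leading head axis of size decoder_retention_heads. Below is a minimal sketch of that construction, assuming the per-head decay schedule gamma_h = 1 - 2^(-5 - h) from the RetNet paper; it is an illustration, not the repository's exact code:

```python
import torch

def build_decay_mask(num_heads: int, slen: int) -> torch.Tensor:
    # One decay rate per head, following gamma_h = 1 - 2 ** (-5 - h).
    gamma = 1 - 2 ** (-5 - torch.arange(num_heads, dtype=torch.float))
    idx = torch.arange(slen)
    # Relative distance i - j, clamped so the power stays bounded.
    dist = (idx[:, None] - idx[None, :]).clamp(min=0).float()
    # D[h, i, j] = gamma_h ** (i - j) for i >= j, and 0 above the diagonal.
    mask = gamma[:, None, None] ** dist
    return mask * torch.tril(torch.ones(slen, slen))

print(build_decay_mask(3, 5).shape)  # torch.Size([3, 5, 5])
```

Changing the number of heads only changes the leading dimension of this mask; the constraint that actually bites when reconfiguring (e.g. for images) is that the embedding dims stay divisible by the head count, as the next comment points out.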
When I was adjusting the RetNet configuration, I also ran into this issue. Could you add an assert that decoder_embed_dim and decoder_value_embed_dim must be multiples of decoder_retention_heads?
Thank you for your great work!
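A minimal sketch of what such a check could look like, assuming a torchscale-style args namespace with the field names above (the helper name validate_retnet_config is hypothetical, not existing API):

```python
def validate_retnet_config(args):
    # Hypothetical check, not part of the existing codebase: the per-head
    # key/query dim is decoder_embed_dim // heads and the per-head value
    # dim is decoder_value_embed_dim // heads, so both must divide evenly.
    heads = args.decoder_retention_heads
    assert args.decoder_embed_dim % heads == 0, (
        f"decoder_embed_dim ({args.decoder_embed_dim}) must be a "
        f"multiple of decoder_retention_heads ({heads})"
    )
    assert args.decoder_value_embed_dim % heads == 0, (
        f"decoder_value_embed_dim ({args.decoder_value_embed_dim}) must be "
        f"a multiple of decoder_retention_heads ({heads})"
    )
```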