I've noticed that decoder_retention_heads is set to 3 by default, and that the decay mask is expanded to three dimensions (with a leading head axis) to match. Have you experimented with how performance differs across different numbers of heads? Is this configuration sufficient in terms of attention performance? Since your model is primarily aimed at sequence modeling for language, I'm looking to extend it to image processing, and I'm unsure whether I should modify this aspect.
Thank you in advance for your response.
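For context on the three-dimensional mask: each retention head in RetNet uses its own decay rate, so the causal decay mask carries a leading head axis of size decoder_retention_heads. Below is a minimal sketch of that construction, assuming the per-head decay schedule gamma_h = 1 - 2^(-5 - h) from the RetNet paper; it is an illustration, not the repository's exact code:

```python
import torch

def build_decay_mask(num_heads: int, slen: int) -> torch.Tensor:
    # One decay rate per head, following gamma_h = 1 - 2 ** (-5 - h).
    gamma = 1 - 2 ** (-5 - torch.arange(num_heads, dtype=torch.float))
    idx = torch.arange(slen)
    # Relative distance i - j, clamped so the power stays bounded.
    dist = (idx[:, None] - idx[None, :]).clamp(min=0).float()
    # D[h, i, j] = gamma_h ** (i - j) for i >= j, and 0 above the diagonal.
    mask = gamma[:, None, None] ** dist
    return mask * torch.tril(torch.ones(slen, slen))

print(build_decay_mask(3, 5).shape)  # torch.Size([3, 5, 5])
```

Changing the number of heads only changes the leading dimension of this mask; the constraint that actually bites when reconfiguring (e.g. for images) is that the embedding dims stay divisible by the head count, as the next comment points out.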
When I was adjusting the RetNet configuration, I also ran into this issue. Could you add an assert that decoder_embed_dim and decoder_value_embed_dim must be multiples of decoder_retention_heads?
Thank you for your great work!
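A minimal sketch of what such a check could look like, assuming a torchscale-style args namespace with the field names above (the helper name validate_retnet_config is hypothetical, not existing API):

```python
def validate_retnet_config(args):
    # Hypothetical check, not part of the existing codebase: the per-head
    # key/query dim is decoder_embed_dim // heads and the per-head value
    # dim is decoder_value_embed_dim // heads, so both must divide evenly.
    heads = args.decoder_retention_heads
    assert args.decoder_embed_dim % heads == 0, (
        f"decoder_embed_dim ({args.decoder_embed_dim}) must be a "
        f"multiple of decoder_retention_heads ({heads})"
    )
    assert args.decoder_value_embed_dim % heads == 0, (
        f"decoder_value_embed_dim ({args.decoder_value_embed_dim}) must be "
        f"a multiple of decoder_retention_heads ({heads})"
    )
```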