Hi, I understand that in TransformerEncoderLayer(C, 4, C, 0.5) the arguments C, 4, C are d_model, nhead, and dim_feedforward, and that x.unsqueeze(1) turns x into shape (N, 1, C). Since batch_first is False for the transformer, self-attention is computed over the batch dimension.
What confuses me is the cross attention described in the paper: I cannot see any cross attention in the pseudo code. Could you give an interpretation of it? Also, what if x has shape (batch, seq, hidden_size)? That is the shape in an NER task. How should BatchFormer be applied in that situation? Hoping for your sincere reply!
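For reference, here is how I currently read that snippet (a rough sketch; the feature size and batch size are made up by me):

```python
import torch
import torch.nn as nn

C = 512                                              # made-up feature size
encoder = nn.TransformerEncoderLayer(C, 4, C, 0.5)   # d_model=C, nhead=4, dim_feedforward=C, dropout=0.5

x = torch.randn(8, C)              # (N, C) batch of feature vectors
x_in = x.unsqueeze(1)              # (N, 1, C); since batch_first is False, dim 0 is treated
                                   # as the sequence, so attention runs across the N samples
x_out = encoder(x_in).squeeze(1)   # back to (N, C)
```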
yandun72 changed the title from "can you explain the shape of the common batchformer meaning" to "can you explain cross attention" on Oct 20, 2022.
Hi @yandun72. If the shape of x is (batch, seq, hidden_size), you can permute it to (seq, batch, hidden_size) or set batch_first=True.
Sorry that the description of cross-attention in the paper is confusing. In BatchFormer, we apply attention across the batch dimension. The cross-attention is therefore not a separate attention mechanism but ordinary transformer attention; we simply want to emphasize that it runs over the batch dimension. You can regard it as cross-batch attention.
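For example, something along these lines (a rough, untested sketch; the layer arguments just mirror the example in the question and the tensor sizes are only for illustration). The point is that the dimension the encoder attends over should be the batch dimension:

```python
import torch
import torch.nn as nn

hidden_size = 256
# arguments mirror TransformerEncoderLayer(C, 4, C, 0.5) from the question; only illustrative
encoder = nn.TransformerEncoderLayer(hidden_size, 4, hidden_size, 0.5, batch_first=True)

x = torch.randn(32, 128, hidden_size)    # (batch, seq, hidden_size), e.g. NER token features

# With batch_first=True the layer attends over its second dimension, so after
# permuting to (seq, batch, hidden_size) the attention runs across the batch
# at every token position (cross-batch attention).
x_perm = x.permute(1, 0, 2)              # (seq, batch, hidden_size)
x_perm = encoder(x_perm)
x_out = x_perm.permute(1, 0, 2)          # back to (batch, seq, hidden_size)
```

For the original (N, C) case, x.unsqueeze(1) with the default batch_first=False achieves the same thing.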