
code mismatch with the theory #9

Open
basavaraj-hampiholi opened this issue Jul 29, 2021 · 10 comments

Comments

@basavaraj-hampiholi

Hi All,

Thanks for providing the code.
I came across a mismatch between the code and the theory you propose for the transformer block. The paper says, "Instead, we propose to replace the original position-wise linear projection for Multi-Head Self-Attention (MHSA)", but lines 198-200 in https://github.com/leoxiaobin/CvT/blob/main/lib/models/cls_cvt.py still project q, k, v through linear layers. Have you missed an else statement there? Why are you projecting the q, k, v values twice?
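To illustrate, here is a simplified sketch of the pattern I mean (not the exact repository code; the names follow the repo's):

```python
import torch.nn as nn

class AttentionSketch(nn.Module):
    """Simplified view of the projection pattern in question."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # convolutional projection (depth-wise conv), as described in the paper
        self.conv_proj_q = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2,
                      groups=dim, bias=False),
            nn.BatchNorm2d(dim),
        )
        # conv_proj_k and conv_proj_v are built the same way (omitted here)

        # the additional position-wise linear projections I am asking about
        self.proj_q = nn.Linear(dim, dim)
        self.proj_k = nn.Linear(dim, dim)
        self.proj_v = nn.Linear(dim, dim)
```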

Please correct me if I have misunderstood it.

Thanks,
Basavaraj

@askerlee

askerlee commented Aug 6, 2021

This is after conv_proj_q, conv_proj_k and conv_proj_v. But I'm not sure why the authors still use the pointwise projections after the conv projections.

@basavaraj-hampiholi
Author

@askerlee, I think it is part of a depthwise separable convolution: a depthwise convolution followed by a pointwise projection.
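Roughly this standard pattern (an illustrative sketch only; the channel count is arbitrary):

```python
import torch.nn as nn

channels = 64  # arbitrary example
depthwise_separable = nn.Sequential(
    # depth-wise: one 3x3 filter per channel (groups=channels)
    nn.Conv2d(channels, channels, kernel_size=3, padding=1,
              groups=channels, bias=False),
    # point-wise: 1x1 projection that mixes the channels
    nn.Conv2d(channels, channels, kernel_size=1, bias=False),
)
```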

@diaodeyi

I want to know how to call the get_cls_model function in cls_cvt.py.

@basavaraj-hampiholi
Author

Hi @diaodeyi ,
In the present code, the get_cls_model function is called through registry.py. You can use build_model in build.py to build the model. Otherwise, you can remove the registry and call the get_cls_model function directly. Both ways should work.
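For example, something like this (a rough sketch; assuming `config` is the yacs config object loaded from one of the experiment YAMLs, and the exact import paths may differ in your setup):

```python
from lib.models.build import build_model
from lib.models.cls_cvt import get_cls_model

# `config` = the loaded experiment config (loading code omitted)
model = build_model(config)        # via the registry, as the repo does
# model = get_cls_model(config)    # or bypass the registry and call it directly
```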

Good luck..

@diaodeyi

diaodeyi commented Oct 3, 2021

By the way, does `self.proj = nn.Linear(dim_out, dim_out)` mean the FFN is only a projection with the same dimension?

@basavaraj-hampiholi
Author

@diaodeyi It's the single linear layer (with the same in/out dimension) right after the attention calculation. The FFN in this code is `class MLP` (line 53).
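Schematically (a simplified sketch, not the repo's exact code):

```python
import torch.nn as nn

class MLPSketch(nn.Module):
    """Stand-in for `class MLP` in cls_cvt.py: the actual FFN of the block."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

# `self.proj = nn.Linear(dim_out, dim_out)` is the output projection applied to the
# concatenated attention heads, before the residual add and the MLP/FFN above.
```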

@diaodeyi

diaodeyi commented Oct 6, 2021

Thanks. There are so many linear projections that aren't mentioned in the paper.

@basavaraj-hampiholi
Author

@diaodeyi Yes. I think they left them out on the presumption that the reader already has a good understanding of the basic transformer architecture.

@diaodeyi

diaodeyi commented Mar 9, 2022

> @askerlee, I think it is part of a depthwise separable convolution: a depthwise convolution followed by a pointwise projection.

No, I think the proj_q/k/v are exactly the things the paper does not mention.

@Markin-Wang

Markin-Wang commented Oct 1, 2022

> @askerlee, I think it is part of a depthwise separable convolution: a depthwise convolution followed by a pointwise projection.

> No, I think the proj_q/k/v are exactly the things the paper does not mention.

Hi, a depthwise separable convolution contains two parts: a depth-wise conv and a point-wise conv. The authors implemented the point-wise conv via a linear layer, maybe because it's convenient for the ablation study. The only difference between them is the bias term.
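You can check the equivalence numerically, for example (an illustrative sketch):

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 8, 4, 4
x = torch.randn(B, C, H, W)

pointwise = nn.Conv2d(C, C, kernel_size=1, bias=False)
linear = nn.Linear(C, C, bias=False)
linear.weight.data = pointwise.weight.data.view(C, C)   # share the same weights

out_conv = pointwise(x).flatten(2).transpose(1, 2)       # (B, H*W, C)
out_lin = linear(x.flatten(2).transpose(1, 2))           # (B, H*W, C)
print(torch.allclose(out_conv, out_lin, atol=1e-6))      # True
```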
