Multiple view overlaying problem. #12
Yes, in the current implementation, queries in the overlap area only keep the information from the latter view, because this is simple and the number of queries in the overlap area is negligible. Intuitively, using information from multiple views would be helpful, but we haven't tried it.
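For context, the "keep the latter view" rule described above amounts to a plain overwrite when merging per-view queries. A minimal sketch (hypothetical names and data layout, not the repository's actual code, which lives in `decoder_utils.py`):

```python
def merge_view_queries(view_queries):
    """Merge per-view query dicts keyed by query position.

    Later views simply overwrite earlier ones in overlapping regions,
    mirroring the 'keep the latter view' behaviour described above.
    (Toy simplification, not the repo's implementation.)
    """
    merged = {}
    for queries in view_queries:   # views processed in order
        merged.update(queries)     # later view wins on key clashes
    return merged

# two views sharing position (3, 4) in their overlap area
view_a = {(1, 2): "feat_a1", (3, 4): "feat_a2"}
view_b = {(3, 4): "feat_b1", (5, 6): "feat_b2"}
merged = merge_view_queries([view_a, view_b])
print(merged[(3, 4)])  # → feat_b1 (latter view kept)
```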
Yes, replacing the vanilla transformer decoder layer with our predictive interaction layer for both modalities brings better performance, as shown in Table 3(a) of our paper.
Thank you for your detailed response.
I'm sorry to reopen this issue, but I still have some doubts. I tried DynamicConv against the vanilla transformer, and it is indeed more effective, but I still don't understand why. Is it because the inputs are RoI features, whose positional information the vanilla transformer may have difficulty describing? If it is convenient, could you explain why DynamicConv brings better performance? Thank you.
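For readers unfamiliar with the layer being discussed: the core idea of a DynamicConv-style interaction is that the filter weights are *generated from the query* rather than shared across all instances, so each query gets its own transformation of its RoI features. A toy sketch of that idea (hypothetical function and shapes, not the repository's actual code):

```python
def matmul(a, b):
    """Naive matrix product of 2-D lists (no external deps)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dynamic_conv(query, roi_feats, d):
    """Query-conditioned interaction in the spirit of DynamicConv.

    The query vector (length d*d) is reshaped into a d x d weight
    matrix and applied to the n x d RoI features, so the 'convolution'
    kernel is produced per instance from the query itself.
    (Toy simplification, not the repo's implementation.)
    """
    weights = [query[i * d:(i + 1) * d] for i in range(d)]
    return matmul(roi_feats, weights)

# with an identity-shaped query the RoI features pass through unchanged
print(dynamic_conv([1, 0, 0, 1], [[2, 3]], 2))  # → [[2, 3]]
```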
Thank you for your great work.
https://github.com/fudan-zvg/DeepInteraction/blob/main/projects/mmdet3d_plugin/models/utils/decoder_utils.py#L758
2. Is it because of its superior performance that DynamicConv is used to replace the cross-attention module in the decoder?