Multimodal transformer using cross-channel attention for object detection in remote sensing images

This repository contains the official PyTorch implementation of the ICIP 2024 paper 'Multimodal transformer using cross-channel attention for object detection in remote sensing images' (paper).

Brief Introduction

Cross-channel attention fuses multi-sensor data (RGB and IR) with cross-attention, considering two channels at a time. The fused output of cross-channel attention is then used for object detection.
A Swin backbone is used, enhanced with a convolutional layer in the non-shifting block, which acts as additional support for Swin's shifting mechanism.
The proposed model consists of cross-channel attention, an enhanced Swin-like backbone, and a YOLOv5-based detection head.
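
To illustrate the idea of fusing two channels with cross-attention, here is a minimal NumPy sketch. It is not the module used in this repository: the token count, embedding size, random projection weights, and the residual connection are all assumptions chosen for illustration. Tokens from one channel act as queries and attend to tokens from the other channel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention between two token sets."""
    q = query_tokens @ w_q      # (n_query, dim)
    k = context_tokens @ w_k    # (n_context, dim)
    v = context_tokens @ w_v    # (n_context, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
channel_a = rng.normal(size=(n_tokens, dim))  # e.g. tokens from one RGB channel
channel_b = rng.normal(size=(n_tokens, dim))  # e.g. tokens from the IR channel
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

attended, weights = cross_attention(channel_a, channel_b, w_q, w_k, w_v)
fused = channel_a + attended  # residual connection (an assumption of this sketch)
```

Using one channel for the queries and the other for the keys and values lets each location in the first channel gather complementary information from the second. The actual pairing scheme and layer layout are defined in the paper and in the code under models.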

Data Preparation

We train and evaluate our model on the VEDAI dataset, which consists of aerial images in two modalities, RGB and IR. The VEDAI dataset can be downloaded from (here).
Please prepare the original VEDAI dataset using the 'data_transform.py' file.
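
Before running the preparation script, it can help to check that every RGB image has a matching IR image. The sketch below is independent of 'data_transform.py' and assumes the file naming of the original VEDAI release, where each scene is stored as '<id>_co.png' (colour) and '<id>_ir.png' (infrared). The helper name pair_vedai_images is hypothetical; adjust the suffixes if your copy is organised differently.

```python
from pathlib import Path

def pair_vedai_images(filenames):
    """Group file names such as '00000001_co.png' and '00000001_ir.png' by image id.

    Returns a dict mapping image id -> (rgb_name, ir_name), keeping only
    ids for which both modalities are present.
    """
    by_id = {}
    for name in filenames:
        stem = Path(name).stem                      # '00000001_co'
        image_id, _, suffix = stem.rpartition("_")  # ('00000001', '_', 'co')
        if suffix in ("co", "ir"):
            by_id.setdefault(image_id, {})[suffix] = name
    return {
        image_id: (found["co"], found["ir"])
        for image_id, found in sorted(by_id.items())
        if len(found) == 2
    }

# Example with made-up file names; the third image has no IR counterpart.
names = ["00000001_co.png", "00000001_ir.png", "00000002_co.png"]
pairs = pair_vedai_images(names)
```

Image ids that are missing one modality are dropped, so comparing the number of pairs with the number of colour files gives a quick check of the download.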

Citation

If you find the idea useful or inspiring, please consider citing:

@article{bahaduri2023multimodal,
  title={Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images},
  author={Bahaduri, Bissmella and Ming, Zuheng and Feng, Fangchen and Mokraou, Anissa},
  journal={arXiv preprint arXiv:2310.13876},
  year={2023}
}

Acknowledgement

Our code is heavily based on previous works, including SuperYOLO and YOLOv5. Thanks to their authors for open-sourcing their implementations!
