In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that the 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset.
CAPE (CAmera view Position Embedding) forms position embeddings in each camera's local view coordinate system. This view normalization reduces the difficulty of learning the correspondence between images and 3D space that arises when global 3D position embeddings are used directly. The method achieves state-of-the-art results in the vision-only setting on the nuScenes dataset, and the paper was accepted to CVPR 2023.
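To make the core idea concrete, below is a minimal NumPy sketch (not the official implementation; the intrinsics, feature-map size, and depth samples are illustrative assumptions) contrasting the PETR-style global 3D PE input with CAPE's camera-view counterpart:

```python
import numpy as np

def camera_frustum_points(H, W, depths, K):
    """Lift every pixel to 3D points in the local camera frame using only
    the intrinsics K (3x3); no camera extrinsics are involved."""
    us, vs = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid (u right, v down)
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1)         # homogeneous pixels, (H, W, 3)
    rays = pix.reshape(-1, 3) @ np.linalg.inv(K).T              # unit-depth rays, (H*W, 3)
    return np.concatenate([rays * d for d in depths], axis=0)   # (D*H*W, 3)

# Illustrative camera parameters (assumptions, not values from this repo).
K = np.array([[800.0,   0.0, 352.0],
              [  0.0, 800.0, 128.0],
              [  0.0,   0.0,   1.0]])
T_cam2global = np.eye(4)   # per-camera extrinsics: camera -> global frame

pts_cam = camera_frustum_points(H=16, W=44, depths=[1.0, 10.0, 30.0], K=K)

# PETR-style global 3D PE: frustum points are mapped into the shared global
# frame, so the embedding input entangles per-camera extrinsics.
pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])    # homogeneous, (N, 4)
pts_global = (pts_h @ T_cam2global.T)[:, :3]

# CAPE-style camera-view PE: the points stay in the local camera frame, so
# the same embedding is shared by all cameras; extrinsics are injected
# separately (on the query side) rather than baked into the PE.
pts_local = pts_cam
```

Since pts_local never touches T_cam2global, the position embedding computed from it is identical across cameras, which is exactly what removes the extrinsics from the PE.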
Illustration of view normalization:
The overall pipeline of the algorithm is shown below:
At present, we provide three training configurations and their results on the nuScenes validation set; see the CAPE training configurations for details.
| Model | Backbone | Resolution | NDS | 3D mAP | Download | Config | Log |
|---|---|---|---|---|---|---|---|
| CAPE | r50 | 1408x512 | 40.58 | 34.72 | model | config | - |
| CAPE-T | r50 | 704x256 | 44.22 | 31.78 | model | config | - |
| CAPE-T | v99 | 800x320 | 54.36 | 44.72 | model | config | - |
Please download the nuScenes dataset, as well as the annotation files provided by the authors.
After downloading, the dataset directory structure should look like:
nuscenes
├── maps
├── samples
├── sweeps
├── v1.0-trainval
├── v1.0-test
...
Symlink the nuScenes data to data/nuscenes, or change the dataset path in the config files.
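For example, assuming the raw dataset was unpacked to /path/to/nuscenes (an illustrative path; adjust it to your setup), the symlink can be set up as:
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes
Then run the following command to generate the annotation files required by the PETR-style models: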
python tools/create_petr_nus_infos.py
The dataset directory after generation:
nuscenes
├── maps
├── samples
├── sweeps
├── v1.0-trainval
├── v1.0-test
├── petr_nuscenes_annotation_train.pkl
├── petr_nuscenes_annotation_val.pkl
For convenience, we also provide the pre-generated annotation files:
| File | Download |
|---|---|
| petr_nuscenes_annotation_train.pkl | download |
| petr_nuscenes_annotation_val.pkl | download |
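Whether you generate the files or download them, a quick sanity check before training can catch path problems early. A minimal sketch (the exact structure of the pickle depends on tools/create_petr_nus_infos.py, so the inspection below is deliberately generic):

```python
import pickle

# Load a generated/downloaded annotation file and peek at its structure.
with open("data/nuscenes/petr_nuscenes_annotation_val.pkl", "rb") as f:
    infos = pickle.load(f)

print(type(infos))
if isinstance(infos, dict):
    print(list(infos.keys()))              # top-level keys
elif isinstance(infos, list) and infos:
    print(len(infos), type(infos[0]))      # number of samples and record type
```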
TODO
Run the following command to evaluate:
python tools/evaluate.py --config configs/cape/capet_vovnet_800x320_24ep_wocbgs_load_dd3d_pretrain.yml --model /path/to/your/capet_vov99_800x320_epoch_24.pdparams
If you find this work helpful for your research, please consider citing:
@inproceedings{Xiong2023CAPE,
  title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
  author={Kaixin Xiong and Shi Gong and Xiaoqing Ye and Xiao Tan and Ji Wan and Errui Ding and Jingdong Wang and Xiang Bai},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}