In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that the 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset.
CAPE (CAmera view Position Embedding) forms position embeddings in each camera's local view coordinate system. This view normalization reduces the difficulty of learning the correspondence between images and 3D space that arises when global 3D position embeddings are used directly. The method achieves state-of-the-art results in the vision-only setting on the nuScenes dataset, and the paper was accepted to CVPR 2023.
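To make the core idea concrete, below is a minimal NumPy sketch (not the official implementation; the intrinsics, feature-map size, and depth samples are illustrative assumptions) contrasting the PETR-style global 3D PE input with CAPE's camera-view counterpart:

```python
import numpy as np

def camera_frustum_points(H, W, depths, K):
    """Lift every pixel to 3D points in the local camera frame using only
    the intrinsics K (3x3); no camera extrinsics are involved."""
    us, vs = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid (u right, v down)
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1)         # homogeneous pixels, (H, W, 3)
    rays = pix.reshape(-1, 3) @ np.linalg.inv(K).T              # unit-depth rays, (H*W, 3)
    return np.concatenate([rays * d for d in depths], axis=0)   # (D*H*W, 3)

# Illustrative camera parameters (assumptions, not values from this repo).
K = np.array([[800.0,   0.0, 352.0],
              [  0.0, 800.0, 128.0],
              [  0.0,   0.0,   1.0]])
T_cam2global = np.eye(4)   # per-camera extrinsics: camera -> global frame

pts_cam = camera_frustum_points(H=16, W=44, depths=[1.0, 10.0, 30.0], K=K)

# PETR-style global 3D PE: frustum points are mapped into the shared global
# frame, so the embedding input entangles per-camera extrinsics.
pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])    # homogeneous, (N, 4)
pts_global = (pts_h @ T_cam2global.T)[:, :3]

# CAPE-style camera-view PE: the points stay in the local camera frame, so
# the same embedding is shared by all cameras; extrinsics are injected
# separately (on the query side) rather than baked into the PE.
pts_local = pts_cam
```

Since pts_local never touches T_cam2global, the position embedding computed from it is identical across cameras, which is exactly what removes the extrinsics from the PE.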
Illustration of view normalization:
The overall pipeline of the algorithm is shown below:
At present, we provide three training configurations and their results on the nuScenes validation set; see the CAPE training configurations for details.
| Model | Backbone | Resolution | NDS | 3D mAP | Download | Config | Log |
|---|---|---|---|---|---|---|---|
| CAPE | r50 | 1408x512 | 40.58 | 34.72 | model | config | - |
| CAPE-T | r50 | 704x256 | 44.22 | 31.78 | model | config | - |
| CAPE-T | v99 | 800x320 | 54.36 | 44.72 | model | config | - |
Please download the nuScenes dataset, as well as the annotation files provided by the authors.
After downloading, the dataset directory structure should look like:
nuscenes
├── maps
├── samples
├── sweeps
├── v1.0-trainval
├── v1.0-test
...
Symlink the nuScenes data to data/nuscenes, or change the dataset path in the config files.
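For example, assuming the raw dataset was unpacked to /path/to/nuscenes (an illustrative path; adjust it to your setup), the symlink can be set up as:
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes
Then run the following command to generate the annotation files required by the PETR-style models: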
python tools/create_petr_nus_infos.py
The dataset directory after generation:
nuscenes
├── maps
├── samples
├── sweeps
├── v1.0-trainval
├── v1.0-test
├── petr_nuscenes_annotation_train.pkl
├── petr_nuscenes_annotation_val.pkl
For convenience, we also provide the pre-generated annotation files:
| File | Download |
|---|---|
| petr_nuscenes_annotation_train.pkl | download |
| petr_nuscenes_annotation_val.pkl | download |
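Whether you generate the files or download them, a quick sanity check before training can catch path problems early. A minimal sketch (the exact structure of the pickle depends on tools/create_petr_nus_infos.py, so the inspection below is deliberately generic):

```python
import pickle

# Load a generated/downloaded annotation file and peek at its structure.
with open("data/nuscenes/petr_nuscenes_annotation_val.pkl", "rb") as f:
    infos = pickle.load(f)

print(type(infos))
if isinstance(infos, dict):
    print(list(infos.keys()))              # top-level keys
elif isinstance(infos, list) and infos:
    print(len(infos), type(infos[0]))      # number of samples and record type
```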
TODO
Run the following command to evaluate:
python tools/evaluate.py --config configs/cape/capet_vovnet_800x320_24ep_wocbgs_load_dd3d_pretrain.yml --model /path/to/your/capet_vov99_800x320_epoch_24.pdparams
If you find this work helpful for your research, please consider citing:
@inproceedings{Xiong2023CAPE,
  title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
  author={Kaixin Xiong and Shi Gong and Xiaoqing Ye and Xiao Tan and Ji Wan and Errui Ding and Jingdong Wang and Xiang Bai},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}