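Overview: Multi-camera 3D object detection is used mainly in intelligent driving. Taking the vehicle's on-board cameras as input, the algorithm detects the 3D positions of multiple objects around the vehicle and must remain robust in complex urban driving scenes. Its accuracy should approach that of LiDAR-based detection.
Target metrics: NDS > 0.569, mAP > 0.481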
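- Learn machine learning and deep learning from scratch
- Become proficient with the software tools NumPy and PyTorch
- Read papers to learn the ideas behind the algorithms, then reproduce them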
Reuploads of the course videos are available on Bilibili.
- CS229: Machine Learning
- Deep Learning Specialization
- CS231n: Deep Learning for Computer Vision
- Self-Driving Cars Specialization
- Deep Residual Learning for Image Recognition (ResNet)
- Feature Pyramid Networks for Object Detection (FPN)
- Attention Is All You Need (Transformer)
- End-to-End Object Detection with Transformers (DETR)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (DETR3D)
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection (PETR)
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (PETRv2)
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (BEVFormer)
Explore the data and preprocess it.
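As one possible illustration of the preprocessing step, the sketch below normalizes a stack of multi-camera RGB images and reorders it into the channels-first layout that PyTorch models typically expect. The function name `preprocess_multiview`, the six-camera example, and the per-channel mean and standard deviation (the commonly used ImageNet statistics on a 0-255 scale) are assumptions made for this example, not settings taken from any of the papers listed here.

```python
import numpy as np

# Assumed normalization constants: common ImageNet RGB statistics (0-255 scale).
IMAGENET_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
IMAGENET_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def preprocess_multiview(images):
    """Normalize a multi-camera image stack (hypothetical helper).

    images: uint8 array of shape (num_cams, H, W, 3), RGB channel order.
    Returns a float32 array of shape (num_cams, 3, H, W).
    """
    images = np.asarray(images)
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected an array of shape (num_cams, H, W, 3)")
    x = images.astype(np.float32)
    # Broadcasting applies the per-channel statistics over the last axis.
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels in front of the spatial axes: (N, H, W, C) -> (N, C, H, W).
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))


# Usage: six dummy camera frames of size 4 x 8.
batch = np.zeros((6, 4, 8, 3), dtype=np.uint8)
out = preprocess_multiview(batch)
print(out.shape, out.dtype)
```

The result can be passed to `torch.from_numpy` to obtain a tensor without copying the data.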
Reproduce the baseline algorithms to understand how they work, then propose improvements of your own.
Method | NDS | mAP |
---|---|---|
DETR3D | 0.479 | 0.412 |
PETR | 0.481 | 0.434 |
BEVFormer | 0.569 | 0.481 |
PETRv2 | 0.582 | 0.490 |
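For context on how the two metrics relate: NDS (nuScenes Detection Score) combines mAP with five mean true-positive error metrics (translation, scale, orientation, velocity, attribute), each clipped to at most 1. The sketch below shows that weighting. The function name and the input values in the usage lines are hypothetical and chosen only for illustration; they are not results from any method in the table.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Compute NDS = (5 * mAP + sum(1 - min(1, err))) / 10.

    mean_ap:   mean average precision in [0, 1].
    tp_errors: the five mean true-positive errors
               (mATE, mASE, mAOE, mAVE, mAAE), each non-negative.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error values")
    # Each error is clipped to 1 and converted into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Usage with hypothetical values (not taken from any paper).
example_nds = nuscenes_detection_score(0.5, [0.6, 0.3, 0.4, 0.4, 0.1])
print(round(example_nds, 3))
```

Because mAP carries half of the total weight, a gain in mAP moves NDS more than the same gain in any single error metric.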
Official code (DETR): https://github.com/facebookresearch/detr
Official code (DETR3D): https://github.com/wangyueft/detr3d
Official code (BEVFormer): https://github.com/fundamentalvision/BEVFormer
Official code (PETR, PETRv2): https://github.com/megvii-research/PETR