By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.
This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".
Updates
- April 01, 2021: initial release.
Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.
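The core idea above, computing one object feature from all points with learned per-point contributions, can be illustrated with a minimal single-head scaled dot-product attention in plain Python. This is a sketch under simplifying assumptions, not the repository's implementation: the function name `attend` and the toy inputs are hypothetical, and the learned projections, multiple heads, and positional encodings of a real Transformer decoder are omitted.

```python
import math

def attend(query, point_feats):
    """Aggregate all point features into one object feature (hypothetical helper).

    query:       list of d floats, the object candidate's feature
    point_feats: list of N lists of d floats, one per point
    Returns (object_feature, attention_weights).
    """
    d = len(query)
    # Scaled dot-product score between the query and every point feature.
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(d)
              for f in point_feats]
    # Numerically stable softmax: each point's contribution sums to 1 overall.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The object feature is the attention-weighted sum over ALL points,
    # so no hand-crafted grouping step is needed.
    feature = [sum(w * f[i] for w, f in zip(weights, point_feats))
               for i in range(d)]
    return feature, weights

# Toy example: three points with 2-D features and one object query.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
feature, weights = attend(query, points)
```

Points whose features align with the query receive larger weights, which is the mechanism that replaces an explicit point-to-object assignment.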
In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
@article{liu2021,
title={Group-Free 3D Object Detection via Transformers},
author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
journal={arXiv preprint arXiv:2104.00678},
year={2021}
}
Method | backbone | mAP@0.25 | mAP@0.5 | Model |
---|---|---|---|---|
HGNet | GU-net | 61.3 | 34.4 | - |
GSDN | MinkNet | 62.8 | 34.8 | waiting for release |
3D-MPA | MinkNet | 64.2 | 49.2 | waiting for release |
VoteNet | PointNet++ | 62.9 | 39.9 | official repo |
MLCVNet | PointNet++ | 64.5 | 41.4 | official repo |
H3DNet | PointNet++ | 64.4 | 43.4 | official repo |
H3DNet | 4xPointNet++ | 67.2 | 48.1 | official repo |
Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) | model |
Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) | model |
Ours(L12, O256) | PointNet++w2× | 68.8 (68.3*) | 52.1 (51.1*) | model |
Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) | 52.8 (52.3*) | model |
Method | backbone | inputs | mAP@0.25 | mAP@0.5 | Model |
---|---|---|---|---|---|
VoteNet | PointNet++ | point | 59.1 | 35.8 | official repo |
MLCVNet | PointNet++ | point | 59.8 | - | official repo |
HGNet | GU-net | point | 61.6 | - | - |
H3DNet | 4xPointNet++ | point | 60.1 | 39.0 | official repo |
imVoteNet | PointNet++ | point+RGB | 63.4 | - | official repo |
Ours(L6, O256) | PointNet++ | point | 63.0 (62.6*) | 45.2 (44.4*) | model |
Notes:
- An asterisk (*) marks a result averaged over 5 evaluation runs, since the algorithm's randomness is large.
- We use 4 GPUs for training by default.
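As a rough illustration of how an averaged result can be derived from repeated evaluations, the sketch below summarizes several mAP values. The helper `summarize` and the run values are hypothetical placeholders, not measurements from this repository.

```python
from statistics import mean, stdev

def summarize(runs):
    """Return the mean and sample standard deviation of repeated mAP results."""
    return {"mean": mean(runs), "std": stdev(runs)}

# Hypothetical mAP values from 5 evaluation runs of one checkpoint.
runs = [66.0, 66.5, 66.2, 65.9, 66.4]
summary = summarize(runs)
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two models exceeds the run-to-run noise.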
- Ubuntu 16.04
- Anaconda with python=3.6
- pytorch>=1.3
- torchvision with pillow<7
- cuda=10.1
- trimesh>=2.35.39,<2.35.40
- 'networkx>=2.2,<2.3'
- compile the CUDA layers for PointNet++, which we used in the backbone
network:
sh init.sh
- others:
pip install termcolor opencv-python tensorboard
For SUN RGB-D, follow the README under the sunrgbd folder.
For ScanNet, follow the README under the scannet folder.
For L6, O256
training:
python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
train_dist.py --num_point 50000 --num_decoder_layers 6 \
--size_delta 0.111111111111 --center_delta 0.04 \
--learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
--dataset scannet --data_root <data directory> [--log_dir <log directory>]
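The command above passes separate values for `--learning_rate` and `--decoder_learning_rate`. One common way to apply two learning rates is to split parameters into optimizer groups, sketched below in plain Python. The helper `build_param_groups` and the rule of matching the substring "decoder" in parameter names are assumptions for illustration; check `train_dist.py` for the actual logic. The returned list of dicts follows the per-parameter options format accepted by `torch.optim` optimizers.

```python
def build_param_groups(named_params, lr, decoder_lr, weight_decay):
    """Split (name, param) pairs into two optimizer groups (hypothetical helper)."""
    named_params = list(named_params)
    # Assumption: decoder parameters are identified by "decoder" in their name.
    decoder = [p for n, p in named_params if "decoder" in n]
    other = [p for n, p in named_params if "decoder" not in n]
    return [
        {"params": other, "lr": lr, "weight_decay": weight_decay},
        {"params": decoder, "lr": decoder_lr, "weight_decay": weight_decay},
    ]

# Strings stand in for tensors here; with PyTorch one would pass
# model.named_parameters() and hand the result to an optimizer.
groups = build_param_groups(
    [("backbone.conv.weight", "w0"), ("decoder.0.attn.weight", "w1")],
    lr=0.006, decoder_lr=0.0006, weight_decay=0.0005,
)
```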
For L6, O256
evaluation:
python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
--checkpoint_path <checkpoint> --avg_times 5 \
--dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
For L12, O256
training:
python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
train_dist.py --num_point 50000 --num_decoder_layers 12 \
--size_delta 0.111111111111 --center_delta 0.04 \
--learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
--dataset scannet --data_root <data directory> [--log_dir <log directory>]
For L12, O256
evaluation:
python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
--checkpoint_path <checkpoint> --avg_times 5 \
--dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
For w2x, L12, O256
training:
python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
--size_delta 0.111111111111 --center_delta 0.04 \
--learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
--dataset scannet --data_root <data directory> [--log_dir <log directory>]
For w2x, L12, O256
evaluation:
python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
--checkpoint_path <checkpoint> --avg_times 5 \
--dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
For w2x, L12, O512
training:
python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
--size_delta 0.111111111111 --center_delta 0.04 \
--learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
--dataset scannet --data_root <data directory> [--log_dir <log directory>]
For w2x, L12, O512
evaluation:
python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
--checkpoint_path <checkpoint> --avg_times 5 \
--dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
For L6, O256 training on SUN RGB-D:
python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
--size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
--learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
--dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]
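The `--lr_decay_epochs 420 480 540` option suggests a step schedule in which the learning rate drops at the listed epochs. The sketch below shows one such schedule. The decay factor of 0.1, the rule that the drop applies from the listed epoch onward, and the function name `lr_at_epoch` are all assumptions; the repository's scheduler may differ (for example, by adding warmup).

```python
def lr_at_epoch(epoch, base_lr, decay_epochs, decay_rate=0.1):
    """Step decay: multiply base_lr by decay_rate once per milestone reached."""
    # Count how many milestones have been reached by this epoch.
    drops = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * decay_rate ** drops

milestones = [420, 480, 540]
schedule = {e: lr_at_epoch(e, 0.004, milestones) for e in (0, 419, 420, 480, 599)}
```

Under these assumptions the rate stays at its base value until epoch 420 and is reduced by a factor of 10 at each milestone after that.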
For L6, O256 evaluation on SUN RGB-D:
python eval_avg.py --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic \
--checkpoint_path <checkpoint> --avg_times 5 \
--dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]
We are grateful for the flexible codebase of votenet.
The code is released under the MIT License (see the LICENSE file for details).