initial release

Laktus · Apr 1, 2021 · 045522e

Showing 69 changed files with 12,720 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,129 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2021 Ze Liu
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,199 @@
+# Group-Free 3D Object Detection via Transformers
+
+By [Ze Liu](https://github.com/zeliu98), [Zheng Zhang](https://github.com/stupidZZ),
+[Yue Cao](https://github.com/caoyue10), [Han Hu](https://github.com/ancientmooner), [Xin Tong](http://www.xtong.info/)
+
+![teaser](doc/teaser.png)
+
+**Updates**
+
+- April 01, 2021: initial release.
+
+## Introduction
+
+This repo is the official implementation
+of ["Group-Free 3D Object Detection via Transformers"](https://arxiv.org/abs/2104.).
+
+Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object
+representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points
+to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points.
+However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D
+object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D
+point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object
+from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the
+contribution of each point is automatically learned in the network training. With an improved attention stacking scheme,
+our method fuses object features in different stages and generates more accurate object detection results. With few
+bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used
+benchmarks, ScanNet V2 and SUN RGB-D.
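The abstract describes computing each object's feature from all points with attention, instead of first assigning points to an object through a grouping step. The following is an illustrative sketch of that idea, not code from this repository. It assumes single-head scaled dot-product attention with identity query/key/value projections. A real Transformer layer would use learned projections and typically multiple heads. `object_feature` and `softmax` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def object_feature(query: Vector, point_feats: List[Vector]) -> Vector:
    """Aggregate one object candidate's feature from ALL point features.

    Single-head scaled dot-product attention with identity Q/K/V
    projections (a simplification for illustration only).
    """
    d = len(query)
    # One logit per point: how relevant that point is to this object query.
    scores = [sum(q * k for q, k in zip(query, p)) / math.sqrt(d) for p in point_feats]
    # Soft weights over all points replace a hard point-to-object assignment.
    weights = softmax(scores)
    # Feature = weighted sum of every point's feature (values == keys here).
    return [sum(w * p[i] for w, p in zip(weights, point_feats)) for i in range(d)]


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(object_feature([4.0, 0.0], points))
```

Every point contributes to the result. Points whose features resemble the query receive larger weights, so no point is excluded by a fixed grouping radius.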
+
+In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation
+scripts for ScanNet and SUN RGB-D.
+
+## Citation
+
+```
+@article{liu2021,
+  title={Group-Free 3D Object Detection via Transformers},
+  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
+  journal={arXiv preprint arXiv:2104.},
+  year={2021}
+}
+```
+
+## Main Results
+
+### ScanNet V2
+
+|Method | backbone | mAP@0.25 | mAP@0.5 | Model |
+|:---:|:---:|:---:|:---:|:---:|
+|[HGNet](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_A_Hierarchical_Graph_Network_for_3D_Object_Detection_on_Point_CVPR_2020_paper.pdf)| GU-net| 61.3 | 34.4 | - |
+|[GSDN](https://arxiv.org/pdf/2006.12356.pdf)| MinkNet | 62.8 | 34.8 | [waiting for release](https://github.com/jgwak/GSDN) |
+|[3D-MPA](https://arxiv.org/abs/2003.13867)| MinkNet | 64.2 | 49.2 |  [waiting for release](https://github.com/francisengelmann/3D-MPA) |
+|[VoteNet](https://arxiv.org/abs/1904.09664) | PointNet++ | 62.9 | 39.9 | [official repo](https://github.com/facebookresearch/votenet) |
+|[MLCVNet](https://arxiv.org/abs/2004.05679) | PointNet++ | 64.5 | 41.4 | [official repo](https://github.com/NUAAXQ/MLCVNet) |
+|[H3DNet](https://arxiv.org/abs/2006.05682) | PointNet++ | 64.4 | 43.4 | [official repo](https://github.com/zaiweizhang/H3DNet) |
+|[H3DNet](https://arxiv.org/abs/2006.05682) | 4xPointNet++ | 67.2| 48.1 | [official repo](https://github.com/zaiweizhang/H3DNet) |
+| Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) |[model](https://drive.google.com/file/d/1aS3vsHtg1QU0yFGPa_-kdBmfGR7VTvY0/view?usp=sharing)|
+| Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) |[model](https://drive.google.com/file/d/1IMaSW3GbXSKdDRnO_r60AiJaDEKkqAv8/view?usp=sharing)|
+| Ours(L12, O256) | PointNet++w2× |68.8 (68.3*) | 52.1 (51.1*) |[model](https://drive.google.com/file/d/1V6sFLFcqsp7YJ3-9AV2NqUhEGVkuNGWT/view?usp=sharing)|
+| Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) |52.8 (52.3*) |[model](https://drive.google.com/file/d/16NAEZqxPdBkxW7GGKGHe4-nDtfqL1htE/view?usp=sharing)|
+
+### SUN RGB-D
+
+|Method | backbone | inputs | mAP@0.25 | mAP@0.5 | Model |
+|:---:|:---:|:---:|:---:|:---:|:---:|
+|[VoteNet](https://arxiv.org/abs/1904.09664)| PointNet++ |point | 59.1 | 35.8 |[official repo](https://github.com/facebookresearch/votenet)|
+|[MLCVNet](https://arxiv.org/abs/2004.05679)|PointNet++ | point | 59.8 | - | [official repo](https://github.com/NUAAXQ/MLCVNet) |
+|[HGNet](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_A_Hierarchical_Graph_Network_for_3D_Object_Detection_on_Point_CVPR_2020_paper.pdf)| GU-net |point | 61.6 |-|-|
+|[H3DNet](https://arxiv.org/abs/2006.05682) | 4xPointNet++ |point | 60.1 | 39.0 | [official repo](https://github.com/zaiweizhang/H3DNet) |
+|[imVoteNet](https://arxiv.org/abs/2001.10692)|PointNet++|point+RGB| 63.4 | - |  [official repo](https://github.com/facebookresearch/imvotenet)|
+| Ours(L6, O256)| PointNet++ | point | 62.8 (62.6*) | 42.3 (42.0*) |[model](https://drive.google.com/file/d/1uVQS3jtPQ6osZXPpydEcsoTt51TPqhMs/view?usp=sharing) |
+
+**Notes:**
+
+- `*` denotes results averaged over 5 evaluation runs, because results vary considerably across runs due to the algorithm's randomness.
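`eval_avg.py` is invoked with `--avg_times 5`, so the starred numbers presumably come from repeating evaluation and averaging the results. The following is a minimal sketch of that averaging, not the repository's code. It assumes a hypothetical `run_once(seed)` callable that performs one evaluation pass and returns a dict of metrics. The script's actual loop and its sources of randomness may differ.

```python
import statistics
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def averaged_eval(run_once: Callable[[int], Metrics], times: int = 5) -> Metrics:
    """Repeat a stochastic evaluation `times` times and average each metric.

    `run_once(seed)` is a hypothetical callable that runs one full evaluation
    pass with the given random seed and returns e.g. {"mAP@0.25": ..., "mAP@0.5": ...}.
    """
    if times < 1:
        raise ValueError("times must be >= 1")
    runs: List[Metrics] = [run_once(seed) for seed in range(times)]
    # Average each metric key across runs.
    return {key: statistics.mean([r[key] for r in runs]) for key in runs[0]}
```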
+
+## Install
+
+### Requirements
+
+- `Ubuntu 16.04`
+- `Anaconda` with `python=3.6`
+- `pytorch>=1.3`
+- `torchvision` with `pillow<7`
+- `cuda=10.1`
+- `trimesh>=2.35.39,<2.35.40`
+- `networkx>=2.2,<2.3`
+- compile the CUDA layers for [PointNet++](http://arxiv.org/abs/1706.02413), which is used in the backbone
+  network: `sh init.sh`
+- others: `pip install termcolor opencv-python tensorboard`
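Several requirements above are tight half-open ranges. For example, `trimesh>=2.35.39,<2.35.40` effectively pins the 2.35.39 release. The following sketch checks a version against such a range. It assumes purely numeric dotted versions, and `in_range` is a hypothetical helper. For real specifiers, including pre-release tags, `packaging.specifiers.SpecifierSet` is the more robust choice.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'2.35.39' -> (2, 35, 39). Assumes purely numeric, dot-separated versions.

    No zero-padding is applied, so '2.2' and '2.2.0' compare unequal here.
    """
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, mirroring a '>=lower,<upper' pin."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```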
+
+### Data preparation
+
+For SUN RGB-D, follow the [README](./sunrgbd/README.md) under the `sunrgbd` folder.
+
+For ScanNet, follow the [README](./scannet/README.md) under the `scannet` folder.
+
+## Usage
+
+### ScanNet
+
+For `L6, O256` training:
+
+```bash
+python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
+    train_dist.py --num_point 50000 --num_decoder_layers 6 \
+    --size_delta 0.111111111111 --center_delta 0.04 \
+    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
+    --dataset scannet --data_root <data directory> [--log_dir <log directory>]
+```
+
+For `L6, O256` evaluation:
+
+```bash
+python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
+    --checkpoint_path <checkpoint> --avg_times 5 \
+    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
+```
+
+For `L12, O256` training:
+
+```bash
+python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
+    train_dist.py --num_point 50000 --num_decoder_layers 12 \
+    --size_delta 0.111111111111 --center_delta 0.04 \
+    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
+    --dataset scannet --data_root <data directory> [--log_dir <log directory>]
+```
+
+For `L12, O256` evaluation:
+
+```bash
+python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
+    --checkpoint_path <checkpoint> --avg_times 5 \
+    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
+```
+
+For `w2x, L12, O256` training:
+
+```bash
+python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
+    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
+    --size_delta 0.111111111111 --center_delta 0.04 \
+    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
+    --dataset scannet --data_root <data directory> [--log_dir <log directory>]
+```
+
+For `w2x, L12, O256` evaluation:
+
+```bash
+python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
+    --checkpoint_path <checkpoint> --avg_times 5 \
+    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
+```
+
+For `w2x, L12, O512` training:
+
+```bash
+python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
+    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
+    --size_delta 0.111111111111 --center_delta 0.04 \
+    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
+    --dataset scannet --data_root <data directory> [--log_dir <log directory>]
+```
+
+For `w2x, L12, O512` evaluation:
+
+```bash
+python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
+    --checkpoint_path <checkpoint> --avg_times 5 \
+    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]
+```
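The ScanNet commands above differ only in a few variant flags. `L` maps to `--num_decoder_layers`, `w2x` adds `--width 2`, and `O512` adds `--num_target 512`. The following is a sketch of hypothetical helpers that assemble these argument lists; it is not part of the repository. It assumes that width 1 and 256 targets are the script defaults, because the README omits those flags for the non-w2x and O256 variants.

```python
from typing import List, Optional

# Hyperparameters shared by every ScanNet training command in this README.
SCANNET_TRAIN_HPARAMS = [
    "--size_delta", "0.111111111111", "--center_delta", "0.04",
    "--learning_rate", "0.006", "--decoder_learning_rate", "0.0006",
    "--weight_decay", "0.0005",
]


def variant_flags(layers: int, width: int = 1, num_target: int = 256) -> List[str]:
    """Map a variant such as `w2x, L12, O512` to its CLI flags.

    Assumes width=1 and num_target=256 are the script defaults, since the
    README omits those flags for the non-w2x and O256 variants.
    """
    flags = ["--num_point", "50000"]
    if width != 1:
        flags += ["--width", str(width)]
    flags += ["--num_decoder_layers", str(layers)]
    if num_target != 256:
        flags += ["--num_target", str(num_target)]
    return flags


def scannet_train_args(num_gpus: int, port: int, data_root: str, layers: int,
                       width: int = 1, num_target: int = 256,
                       log_dir: Optional[str] = None) -> List[str]:
    """Distributed training command for one ScanNet variant."""
    args = (["python", "-m", "torch.distributed.launch",
             "--master_port", str(port), "--nproc_per_node", str(num_gpus),
             "train_dist.py"]
            + variant_flags(layers, width, num_target)
            + SCANNET_TRAIN_HPARAMS
            + ["--dataset", "scannet", "--data_root", data_root])
    if log_dir is not None:
        args += ["--log_dir", log_dir]
    return args


def scannet_eval_args(checkpoint: str, data_root: str, layers: int,
                      width: int = 1, num_target: int = 256,
                      dump_dir: Optional[str] = None) -> List[str]:
    """Averaged-evaluation command for one ScanNet variant."""
    args = (["python", "eval_avg.py"]
            + variant_flags(layers, width, num_target)
            + ["--checkpoint_path", checkpoint, "--avg_times", "5",
               "--dataset", "scannet", "--data_root", data_root])
    if dump_dir is not None:
        args += ["--dump_dir", dump_dir]
    return args
```

For example, `subprocess.run(scannet_eval_args("ckpt.pth", "data", layers=12, width=2, num_target=512), check=True)` would launch the `w2x, L12, O512` evaluation from Python. Passing a list rather than a shell string avoids quoting issues.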
+
+### SUN RGB-D
+
+For `L6, O256` training:
+
+```bash
+python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
+    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
+    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
+    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
+    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]
+```
+
+For `L6, O256` evaluation:
+
+```bash
+python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
+    --checkpoint_path <checkpoint> --avg_times 5 \
+    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]
+```
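The SUN RGB-D command trains for 600 epochs with `--lr_decay_epochs 420 480 540`, which suggests a multi-step learning-rate schedule. The following is a minimal sketch of the resulting rate; it is not the repository's scheduler. It assumes each milestone multiplies the rate by `gamma = 0.1`, an assumed factor that the actual default in `train_dist.py` may not match. It follows the convention of PyTorch's `MultiStepLR`, where the decay takes effect at the milestone epoch itself. `step_lr` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Sequence


def step_lr(base_lr: float, epoch: int,
            milestones: Sequence[int] = (420, 480, 540),
            gamma: float = 0.1) -> float:
    """Learning rate at `epoch` under a multi-step decay schedule.

    `gamma` = 0.1 is an ASSUMED decay factor, not read from train_dist.py.
    """
    # Count milestones already reached; bisect_right counts epoch == milestone,
    # matching MultiStepLR, where the decay applies at the milestone epoch.
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed
```

Under these assumptions, the rate starting from `--learning_rate 0.004` would be about 0.0004 from epoch 420, 0.00004 from epoch 480, and 0.000004 from epoch 540 until the end of training.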
+
+## Acknowledgements
+
+We thank the authors of [votenet](https://github.com/facebookresearch/votenet) for their flexible codebase.
+
+## License
+
+The code is released under the MIT License (see the LICENSE file for details).
diff --git a/doc/teaser.png b/doc/teaser.png