This is the official implementation of our paper "Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention", accepted at NeurIPS 2024.
```bash
conda create -n dlisa python=3.10
conda activate dlisa

# install PyTorch 2.0.1 w/ CUDA 11.7
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

# install PyTorch3D with dependencies
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d

# install MinkowskiEngine with dependencies
# you may want to downgrade pip to install MinkowskiEngine
conda install -c anaconda openblas
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps \
  --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"
```
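The note above about downgrading pip is needed because pip 23.1 removed the `--install-option` flag. If the MinkowskiEngine step fails on a newer pip, pinning an older release first should work (the exact version constraint below is only an example):

```bash
# --install-option was removed in pip 23.1; any earlier release still accepts it
pip install "pip<23.1"
```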
```bash
# install Python libraries
pip install .

# install CUDA extensions
cd dlisa/common_ops
pip install .
```
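As an optional sanity check (these one-liners assume nothing beyond the packages installed above), verify that the compiled dependencies import correctly and can see the GPU:

```bash
# all three imports should succeed, and CUDA should be reported as available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d; print(pytorch3d.__version__)"
python -c "import MinkowskiEngine as ME; print(ME.__version__)"
```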
Please follow the Data Preparation section here.
```bash
# log in to WandB
wandb login
```
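On clusters where jobs run non-interactively and `wandb login` cannot prompt for a key, WandB also reads the API key from its standard `WANDB_API_KEY` environment variable:

```bash
# non-interactive alternative to `wandb login` for batch jobs
export WANDB_API_KEY=<your_api_key>
```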
```bash
# train a model with the pre-trained detector, using predicted object proposals
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={any_string} +detector_path=checkpoints/PointGroup_ScanNet.ckpt

# test a model from a checkpoint and save its predictions
python test.py data={scanrefer/nr3d/multi3drefer} experiment_name={checkpoint_experiment_name} data.inference.split={train/val/test} ckpt_path={ckpt_file_path}

# evaluate predictions
python evaluate.py data={scanrefer/nr3d/multi3drefer} experiment_name={checkpoint_experiment_name} data.evaluation.split={train/val/test}
```
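For concreteness, a full train/test/evaluate cycle on Multi3DRefer might look like the following; the experiment name `my_run` and the checkpoint path are placeholders, not fixed names from this repository:

```bash
# hypothetical end-to-end run; substitute your own experiment name and paths
python train.py data=multi3drefer experiment_name=my_run +detector_path=checkpoints/PointGroup_ScanNet.ckpt
python test.py data=multi3drefer experiment_name=my_run data.inference.split=val ckpt_path=path/to/my_run/best.ckpt
python evaluate.py data=multi3drefer experiment_name=my_run data.evaluation.split=val
```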
This project was developed on CentOS 7 with Slurm as the batch scheduler for resource and job management. We provide training scripts in the scripts/ directory for training across multiple scheduled jobs:
```bash
bash scripts/train.sh
```
See the `scheduled_job` and `resume` parameters in `config/global_config.yaml` and `train.py` for multi-job training. Please modify the scripts to match your own environment and resources; a sketch of such a submission script is shown below.
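As a rough illustration of what a submission script can look like (every resource value, path, and the job name below is a placeholder for your cluster, not part of this repository):

```bash
#!/bin/bash
#SBATCH --job-name=dlisa-train        # all values here are placeholders
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=48G
#SBATCH --time=24:00:00

# assumes conda is initialized in the job's shell
conda activate dlisa
# train.py and config/global_config.yaml govern scheduled_job/resume behavior
python train.py data=multi3drefer experiment_name=my_run +detector_path=checkpoints/PointGroup_ScanNet.ckpt
```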
Our pre-trained model can be downloaded from Google Drive. Run the following commands for testing:
```bash
python test.py data=multi3drefer data.inference.split=val ckpt_path=path/to/checkpoint/best.ckpt
python evaluate.py data=multi3drefer
```
Checkpoint performance:
| Split | ZT w/ D | ZT w/o D | ST w/ D | ST w/o D | MT | Overall |
|---|---|---|---|---|---|---|
| Val | 41.3 | 81.8 | 42.4 | 67.3 | 51.0 | 51.1 |
If you find our work or any of our materials useful, please cite our paper:
```bibtex
@misc{zhang2024dlisa,
  title={Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention},
  author={Haomeng Zhang and Chiao-An Yang and Raymond A. Yeh},
  year={2024},
  eprint={2410.22306},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
We thank the authors of M3DRef-CLIP for open-sourcing their implementation, which we referred to during the development of this codebase.