Multi-Object 3D Grounding with Dynamic Modules and Language Informed Spatial Attention (D-LISA)

Project Page | Paper

This is the official implementation for our paper Multi-Object 3D Grounding with Dynamic Modules and Language Informed Spatial Attention, accepted at NeurIPS 2024.

Overview

[Figure: Model architecture]

Environment

conda create -n dlisa python=3.10
conda activate dlisa

# install PyTorch 2.0.1 w/ CUDA 11.7
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

# install PyTorch3D with dependencies
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d

# install MinkowskiEngine with dependencies
# you may want to downgrade pip to install MinkowskiEngine
conda install -c anaconda openblas
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps \
--install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

# install Python libraries
pip install .

# install CUDA extensions
cd dlisa/common_ops
pip install .
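
After installation, a quick sanity check (a suggestion, not part of the original instructions) confirms that the core dependencies import and that PyTorch sees the GPU:

# verify PyTorch (with CUDA), PyTorch3D and MinkowskiEngine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d; print(pytorch3d.__version__)"
python -c "import MinkowskiEngine as ME; print(ME.__version__)"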

Data Preparation

Please follow the Data Preparation section here.

Training, Inference and Evaluation

# log in to WandB
wandb login

# train a model with the pre-trained detector, using predicted object proposals
python train.py data={scanrefer/nr3d/multi3drefer} experiment_name={any_string} +detector_path=checkpoints/PointGroup_ScanNet.ckpt

# test a model from a checkpoint and save its predictions
python test.py data={scanrefer/nr3d/multi3drefer} experiment_name={checkpoint_experiment_name} data.inference.split={train/val/test} ckpt_path={ckpt_file_path}

# evaluate predictions 
python evaluate.py data={scanrefer/nr3d/multi3drefer} experiment_name={checkpoint_experiment_name} data.evaluation.split={train/val/test}
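
For example, an end-to-end run on Multi3DRefer could look like the following; the experiment name dlisa_multi3drefer and the checkpoint path are placeholders:

# example: train, test and evaluate on Multi3DRefer
python train.py data=multi3drefer experiment_name=dlisa_multi3drefer +detector_path=checkpoints/PointGroup_ScanNet.ckpt
python test.py data=multi3drefer experiment_name=dlisa_multi3drefer data.inference.split=val ckpt_path=path/to/checkpoint/best.ckpt
python evaluate.py data=multi3drefer experiment_name=dlisa_multi3drefer data.evaluation.split=val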

Training scripts

This project was developed on CentOS 7 with Slurm as the batch scheduler for resource and job management. We provide training scripts in the scripts/ directory for training across multiple chained jobs:

bash scripts/train.sh

See the scheduled_job and resume parameters in config/global_config.yaml and train.py for multi-job training. Please modify the scripts to match your own environment and resources.
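
As a rough sketch of what one such chained job could look like (the #SBATCH resource values and the resume override syntax below are placeholders; the authoritative logic lives in scripts/train.sh, config/global_config.yaml and train.py):

#!/bin/bash
#SBATCH --job-name=dlisa_train    # placeholder job name
#SBATCH --gres=gpu:1              # placeholder GPU request
#SBATCH --time=24:00:00           # placeholder wall-clock limit

# later jobs in the chain resume the experiment from its latest checkpoint
python train.py data=multi3drefer experiment_name=dlisa_multi3drefer \
    +detector_path=checkpoints/PointGroup_ScanNet.ckpt resume=True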

Checkpoint

Our pre-trained model can be downloaded from Google Drive. Run the following commands for testing:

python test.py data=multi3drefer data.inference.split=val ckpt_path=path/to/checkpoint/best.ckpt
python evaluate.py data=multi3drefer

Checkpoint performance (ZT, ST, and MT denote zero-target, single-target, and multi-target descriptions; w/ D and w/o D indicate with and without distractors):

Split | ZT w/ D | ZT w/o D | ST w/ D | ST w/o D | MT   | Overall
------|---------|----------|---------|----------|------|--------
Val   | 41.3    | 81.8     | 42.4    | 67.3     | 51.0 | 51.1

Reference

If you find our work or any of our materials useful, please cite our paper:

@misc{zhang2024dlisa,
      title={Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention}, 
      author={Haomeng Zhang and Chiao-An Yang and Raymond A. Yeh},
      year={2024},
      eprint={2410.22306},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We thank M3DRef-CLIP for open-sourcing their implementation, which we referred to during the development of this codebase.
