This is the official repo for our work Temporally Consistent Online Depth Estimation in Dynamic Scenes accepted at WACV 2023.
If you find CODD relevant, please cite
@inproceedings{li2023temporally,
  title={Temporally consistent online depth estimation in dynamic scenes},
  author={Li, Zhaoshuo and Ye, Wei and Wang, Dilin and Creighton, Francis X and Taylor, Russell H and Venkatesh, Ganesh and Unberath, Mathias},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3018--3027},
  year={2023}
}
CODD is based on several excellent open-source libraries.
Example setup commands (tested on Ubuntu 20.04 and 22.04)
conda create --name codd python=3.8 -y
conda activate codd
pip install scipy pyyaml terminaltables natsort
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113 # pytorch
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1121/download.html # pytorch3d
pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html # mmcv
pip install mmsegmentation # mmseg
pip install git+https://github.com/princeton-vl/lietorch.git # lietorch -- this will take a while
For more details and examples, please see the configs folder.
In CODD, you can configure your model in a modular manner. The network is often specified in the following way:
model = dict(
    type='ConsistentOnlineDynamicDepth',
    stereo=dict(
        type='HITNetMF',  # enter your choice of stereo network
        ...  # model specific configs
    ),
    motion=dict(
        type="Motion",  # enter your choice of motion network
        ...  # model specific configs
    ),
    fusion=dict(
        type="Fusion",  # enter your choice of fusion network
        ...  # model specific configs
    )
)
If only the stereo network is needed, you can simply comment out the motion and fusion networks.
You can also swap out the individual networks with your own implementation.
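As a rough illustration of the stereo-only setup, the sketch below writes the model config as a plain dictionary and drops the motion and fusion entries. The helper name `stereo_only` is hypothetical and not part of CODD, and the model-specific options are left out; commenting the two entries out in the config file has the same effect.

```python
import copy

# Hypothetical full config; model-specific options are left out on purpose.
full_model = dict(
    type='ConsistentOnlineDynamicDepth',
    stereo=dict(type='HITNetMF'),
    motion=dict(type='Motion'),
    fusion=dict(type='Fusion'),
)


def stereo_only(model_cfg):
    """Return a copy of model_cfg without the motion and fusion networks."""
    cfg = copy.deepcopy(model_cfg)
    cfg.pop('motion', None)
    cfg.pop('fusion', None)
    return cfg


model = stereo_only(full_model)
print(sorted(model))  # ['stereo', 'type']
```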
In each dataset config, there are several things to be specified.
- data_root: path to stereo data.
  - For FlyingThings3D dataset, the data are downloaded individually. So data_root is the path to RGB images. Additionally specify disp_root, path to disparity data; flow_root, path to optical flow data; and disp_change_root, path to disparity change data.
- train_split, val_split, test_split: path to split files. Please see section Others - Split Files below for more details.
The rest of the variables are already set but feel free to adjust if you want to customize.
- batch_size: training batch size.
- crop_size: training crop size.
- num_frames: the number of frames to run. For training, CODD uses 2 frames. For inference, CODD runs on the entire sequence (num_frames=-1).
- calib: focal length * baseline.
- disp_range: range of disparity.
- intrinsics: fx, fy, cx, cy.
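Because calib is the product of focal length and baseline, it is the constant that relates disparity to depth through depth = calib / disparity. The sketch below shows only that relation; the function name and the numeric values are made up for illustration and are not taken from any CODD dataset config.

```python
def disparity_to_depth(disparity, calib):
    """Convert disparity (pixels) to depth using depth = calib / disparity.

    calib is focal length (pixels) * baseline; the returned depth has the
    same unit as the baseline. Non-positive disparities have no valid depth
    and yield None.
    """
    if disparity <= 0:
        return None
    return calib / disparity


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
calib = 1000.0 * 0.5
print(disparity_to_depth(50.0, calib))  # 10.0
print(disparity_to_depth(0.0, calib))   # None
```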
The training config is of the following format
_base_ = [
    'PATH_TO_MODEL_CONFIG', 'PATH_TO_DATA_CONFIG',
    'default_runtime.py', 'PATH_TO_SCHEDULE_CONFIG'
]
Modify configs/train_config.py to select the desired model and dataset config.
The inference config is of the following format
_base_ = [
    'PATH_TO_MODEL_CONFIG', 'PATH_TO_DATA_CONFIG',
    'default_runtime.py'
]
Modify configs/inference.py to select the desired model and dataset config.
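The `_base_` list appears to follow the config-inheritance convention of mmcv, which the setup commands install: the listed files are combined, and anything set in the training or inference config itself takes precedence. The sketch below is a simplified illustration of that idea in plain Python, not mmcv's actual implementation; the dictionary contents are hypothetical.

```python
def merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical contents of the files listed in _base_.
model_cfg = {'model': {'type': 'ConsistentOnlineDynamicDepth'}}
data_cfg = {'data': {'batch_size': 4, 'num_frames': 2}}

# Hypothetical override written in the top-level config itself.
own_cfg = {'data': {'batch_size': 2}}

cfg = {}
for part in (model_cfg, data_cfg, own_cfg):
    cfg = merge(cfg, part)

print(cfg['data'])  # {'batch_size': 2, 'num_frames': 2}
```

Only the overridden key changes; sibling keys from the base files are kept, which is why a top-level config can stay short.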
- CODD uses a three-stage training strategy on FlyingThings3D
  - Training stereo
  - Training motion
  - Training fusion
- The pretrained model is then fine-tuned on other datasets.
- Modify configs/train_config.py to select the desired model and dataset config
- Run the following command
  - Distributed
    ./scripts/train.sh configs/train_config.py NUM_GPUS --work-dir PATH_TO_LOG
  - Single GPU
    python train.py configs/train_config.py NUM_GPUS --work-dir PATH_TO_LOG
There are two inference modes
- Evaluate --eval: compute metrics and save results
- Show --show: save disparity estimates
  - when running with custom_data, provide path to left and right images using --img-dir and --r-img-dir
To run inference
- Modify configs/inference_config.py to select the model and dataset config
- Run the following command
  - Distributed
    ./scripts/inference.sh configs/inference_config.py CHECKPOINT_PATH NUM_GPUS [optional arguments]
  - Single GPU
    python inference.py configs/inference_config.py CHECKPOINT_PATH NUM_GPUS [optional arguments]
Optional arguments:
- --work-dir: logging directory
- --num-frames: number of frames to run inference on, -1 for all frames
The split file is stored in the following format
LEFT_IMAGE RIGHT_IMAGE DISPARITY_IMAGE OPTICAL_FLOW DISPARITY_CHANGE OPTICAL_FLOW_OCCLUSION DISPARITY_FRAME2_in_FRAME1 DISPARITY_OCCLUSION
The split files can be generated by using utils/generate_split_files.py.
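To make the eight-column layout concrete, the sketch below reads one split-file line into named fields. It assumes the columns are separated by whitespace and that paths contain no spaces; the field names, the function name, and the example paths are hypothetical. The literal string None is mapped to Python's None to mark data that is not provided. The parser CODD actually uses lives in datasets/custom_stereo_mf.py and may differ.

```python
# Hypothetical field names, in the column order of the split file.
FIELDS = (
    'left', 'right', 'disparity', 'optical_flow', 'disparity_change',
    'flow_occlusion', 'disparity_frame2_in_frame1', 'disparity_occlusion',
)


def parse_split_line(line):
    """Split one line into the eight named fields, mapping 'None' to None."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f'expected {len(FIELDS)} fields, got {len(parts)}')
    return {name: (None if part == 'None' else part)
            for name, part in zip(FIELDS, parts)}


# Hypothetical entry with no disparity change and no occlusion masks.
entry = parse_split_line(
    'l/0001.png r/0001.png d/0001.pfm f/0001.pfm None None None None')
print(entry['left'])              # l/0001.png
print(entry['disparity_change'])  # None
```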
- For datasets (TartanAir and Sintel) without ground truth disparity change, I use OPTICAL_FLOW to warp the next frame disparity into current frame and compute the change myself. However, not all regions are valid due to flow occlusion. Therefore, for such computation, OPTICAL_FLOW_OCCLUSION must be provided.
- For datasets (KITTI Depth) without ground truth optical flow, I used RAFT to estimate the optical flow information. The disparity of the next frame is stored as DISPARITY_FRAME2_in_FRAME1 following KITTI convention.
- To generate disparity from the ground truth lidar point cloud, please refer to pykitti.
- When a specific type of data is not provided, None is used to skip reading. Please see datasets/custom_stereo_mf.py for more details of how data is parsed.
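The flow-based disparity change mentioned in the first note can be sketched as follows. This is a minimal pure-Python illustration, assuming nearest-neighbour sampling and nested lists as images; the function name and the toy data are hypothetical, and the procedure actually used to prepare the CODD data is not shown in this README and may differ (for example, by interpolating bilinearly).

```python
def disparity_change(disp1, disp2, flow, occluded):
    """Per-pixel disparity change from frame 1 to frame 2.

    disp1, disp2: H x W nested lists of disparity.
    flow: H x W nested list of (u, v) pixel displacements, frame 1 to frame 2.
    occluded: H x W nested list of booleans, True where the flow is occluded.
    Returns an H x W nested list with None at invalid pixels.
    """
    height, width = len(disp1), len(disp1[0])
    change = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if occluded[y][x]:
                continue  # flow is unreliable here, leave the pixel invalid
            u, v = flow[y][x]
            x2, y2 = round(x + u), round(y + v)  # nearest-neighbour lookup
            if 0 <= x2 < width and 0 <= y2 < height:
                change[y][x] = disp2[y2][x2] - disp1[y][x]
    return change


# Tiny 1 x 3 example: every pixel moves one column to the right.
disp1 = [[10.0, 20.0, 30.0]]
disp2 = [[0.0, 12.0, 19.0]]
flow = [[(1, 0), (1, 0), (1, 0)]]
occluded = [[False, True, False]]
print(disparity_change(disp1, disp2, flow, occluded))  # [[2.0, None, None]]
```

The second pixel is dropped because it is marked occluded, and the third because its flow points outside the image, which is why the occlusion mask is needed for this computation.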
To visualize the 3D point cloud generated from a depth map, the script utils/vis_point_cloud.py can be used.
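For readers unfamiliar with how a depth map becomes a point cloud, the sketch below back-projects a single pixel with the standard pinhole camera model and the fx, fy, cx, cy intrinsics. It is only an illustration with made-up numbers and a hypothetical function name; utils/vis_point_cloud.py may be implemented differently.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with the given depth to a 3D point (X, Y, Z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics and a single pixel at 2.0 depth units.
point = backproject(u=420.0, v=240.0, depth=2.0,
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(point)  # (0.4, 0.0, 2.0)
```

Applying the same formula to every valid pixel of a depth map yields the full point cloud.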
To benchmark speed, run the following command
python benchmark.py configs/models/codd.py
The majority of CODD is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: https://github.com/princeton-vl/RAFT-3D is licensed under the BSD-3-Clause license.