Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
arXiv 2024

[Teaser figure]

Installation

Setting up the environment
git clone https://github.com/DepthAnything/PromptDA.git
cd PromptDA
pip install -r requirements.txt
pip install -e .
sudo apt install ffmpeg  # for video generation
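
To verify the installation, you can run a quick sanity check (a minimal sketch; the CUDA check only matters if you plan to run on GPU):
import torch
import promptda  # confirms the editable install above is importable

print("CUDA available:", torch.cuda.is_available())
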
Pre-trained Models
Model                                    Params   Checkpoint
Prompt-Depth-Anything-Large              340M     Download
Prompt-Depth-Anything-Small              25.1M    Download
Prompt-Depth-Anything-Small-Transparent  25.1M    Download

Only Prompt-Depth-Anything-Large is used for benchmarking in our paper. Prompt-Depth-Anything-Small-Transparent is further fine-tuned for 10K steps on the HAMMER dataset with our iPhone LiDAR simulation method to improve performance on transparent objects.
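
The Small checkpoints can be loaded the same way as the Large one in the usage example below. A minimal sketch, assuming the Hugging Face repo id follows the same naming pattern as the Large model (take the exact id from the checkpoint link above):
from promptda.promptda import PromptDA

# "depth-anything/promptda_vits" is an assumed repo id inferred from the Large
# model's "depth-anything/promptda_vitl"; verify it against the model card.
model_small = PromptDA.from_pretrained("depth-anything/promptda_vits").to("cuda").eval()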

Usage

Example usage
from promptda.promptda import PromptDA
from promptda.utils.io_wrapper import load_image, load_depth, save_depth

DEVICE = 'cuda'
image_path = "assets/example_images/image.jpg"
prompt_depth_path = "assets/example_images/arkit_depth.png"
image = load_image(image_path).to(DEVICE)
prompt_depth = load_depth(prompt_depth_path).to(DEVICE) # 192x256, ARKit LiDAR depth in meters

model = PromptDA.from_pretrained("depth-anything/promptda_vitl").to(DEVICE).eval()
depth = model.predict(image, prompt_depth) # HxW, depth in meters

save_depth(depth, prompt_depth=prompt_depth, image=image)
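
If you need the raw prediction outside of save_depth, the output can be converted to a 16-bit PNG in millimeters with standard tools. A minimal sketch, assuming depth is returned as a float torch tensor in meters as noted above; numpy and Pillow are used here only for illustration:
import numpy as np
from PIL import Image

depth_np = depth.squeeze().detach().cpu().numpy()                 # HxW float32, meters
depth_mm = (depth_np * 1000.0).clip(0, 65535).astype(np.uint16)   # meters -> millimeters
Image.fromarray(depth_mm).save("depth_mm.png")                    # 16-bit PNG, a common depth convention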

Running on your own capture

You can use the Stray Scanner App to capture your own data; it requires an iPhone 12 Pro or later Pro model, or an iPad Pro 2020 or later Pro model. We have set up a Hugging Face Space where you can quickly test our model. If you want to obtain video results, follow the steps below.

Testing steps
  1. Capture a scene with the Stray Scanner App. (Preferably orient the device so the charging port faces downward or to the right.)
  2. Use the iPhone Files App to compress the capture into a zip file and transfer it to your computer. Here is an example screen recording.
  3. Run the following commands to run inference with our model and generate the video results.
export PATH_TO_ZIP_FILE=data/8b98276b0a.zip # Replace with your own zip file path
export PATH_TO_SAVE_FOLDER=data/8b98276b0a_results # Replace with your own save folder path
python3 -m promptda.scripts.infer_stray_scan --input_path ${PATH_TO_ZIP_FILE} --output_path ${PATH_TO_SAVE_FOLDER}
python3 -m promptda.scripts.generate_video process_stray_scan --input_path ${PATH_TO_ZIP_FILE} --result_path ${PATH_TO_SAVE_FOLDER}
ffmpeg -framerate 60 -i ${PATH_TO_SAVE_FOLDER}/%06d_smooth.jpg  -c:v libx264 -pix_fmt yuv420p ${PATH_TO_SAVE_FOLDER}.mp4
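
If you have several captures, the same three commands can be scripted. A hypothetical batch helper (not part of the repo); the data/*.zip pattern and the output-folder naming are assumptions:
# Run inference, frame generation, and video encoding for every capture in data/.
import glob
import os
import subprocess

for zip_path in sorted(glob.glob("data/*.zip")):
    out_dir = os.path.splitext(zip_path)[0] + "_results"
    subprocess.run(["python3", "-m", "promptda.scripts.infer_stray_scan",
                    "--input_path", zip_path, "--output_path", out_dir], check=True)
    subprocess.run(["python3", "-m", "promptda.scripts.generate_video", "process_stray_scan",
                    "--input_path", zip_path, "--result_path", out_dir], check=True)
    subprocess.run(["ffmpeg", "-framerate", "60", "-i", f"{out_dir}/%06d_smooth.jpg",
                    "-c:v", "libx264", "-pix_fmt", "yuv420p", f"{out_dir}.mp4"], check=True)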

Acknowledgements

We thank Prof. Weinan Zhang for his generous support of the robot experiments, including the space, objects, and the Unitree H1 robot. We also thank Zhengbang Zhu, Jiahang Cao, Xinyao Li, and Wentao Dong for their help in setting up the robot platform and collecting robot data.

Citation

If you find this code useful for your research, please use the following BibTeX entry:

@article{lin2024promptda,
  title={Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation},
  author={Lin, Haotong and Peng, Sida and Chen, Jingxiao and Peng, Songyou and Sun, Jiaming and Liu, Minghuan and Bao, Hujun and Feng, Jiashi and Zhou, Xiaowei and Kang, Bingyi},
  journal={arXiv},
  year={2024}
}