This repository is the official implementation of the CVPR'24 paper titled "XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold".
Guangyu Wang, Jinzhi Zhang, Fan Wang, Ruqi Huang, Lu Fang
We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. The core is to unleash the expressivity of hash-based featurization by explicitly prioritizing the sparse manifold. Our method demonstrates state-of-the-art results on various challenging real-world scenes, effectively representing highly detailed contents independent of the geometric resolution.
(a) UV-based featurizations tend to disorganize the feature distribution due to the inevitable distortions in surface parametrization. (b) Existing 3D-surface-based featurizations fail to express sub-primitive-scale intricate details given the limited discretization resolution. (c) Volumetric featurizations inevitably yield a dispersed weight distribution during volume rendering, where many multi-view-inconsistent yet highly weighted samples ambiguate the surface colour and deteriorate surface features with inconsistent colour gradients. (d) Our method leverages hash encoding to remove the dependence of featuremetric resolution on discretization resolution, while simultaneously utilizing rasterization to fully unleash the expressivity of volumetric hash encoding by propagating clean, multi-view-consistent signals to the surface features.
- 2024-04-01: Paper and code release.
- 2024-02-27: Accepted to CVPR 2024.
The code was tested on:
- Ubuntu 20.04, Python 3.9.16, CUDA 11.4, GeForce RTX 3090
Create the environment and install dependencies using conda and pip:
```bash
conda env create -f environment.yml
conda activate xscalenvs
```
This implementation is built upon pytorch, tinycudann, and nvdiffrast.
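As a quick sanity check (illustrative, not part of the official scripts), the core dependencies should import cleanly once the environment is set up:

```python
# Quick environment sanity check; illustrative, not part of the repository.
import torch
import tinycudann as tcnn          # hash-encoding backend
import nvdiffrast.torch as dr      # rasterization backend

print(torch.cuda.is_available())   # should print True on a CUDA-capable GPU
```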
Carefully collect the multi-view images, since reconstruction quality is tightly tied to view sampling density. Compared to algorithmic improvements, denser coverage of the scene is always the most straightforward yet effective way to boost performance. Please refer to this Guide for advanced capture tutorials.
Estimate per-image camera parameters and reconstruct the dense geometry (in the form of a triangle mesh) of the scene. We recommend using the off-the-shelf software Agisoft Metashape or COLMAP for this step:
- Using Agisoft Metashape:
  - Command-line interface: specify the configs (including file names and parameters for reconstruction), then run `python -u scripts/run_metashape.py`. The default parameters generally work well for most real-world scenes.
  - GUI: follow the Basic Workflow in Metashape Guide - DEV Studio.
- Using COLMAP: please refer to the official documentation for the command-line interface or GUI.
After photogrammetry, export the undistorted images, camera parameters, and the reconstructed mesh model `YourMeshName.MeshEXT` into the folder `SCENE_NAME` as:
```
SCENE_NAME
├── images_{dsp_factor}
│   ├── IMGNAME1.JPG
│   ├── IMGNAME2.JPG
│   └── ...
├── cams_{dsp_factor}
│   ├── IMGNAME1_cam.txt
│   ├── IMGNAME2_cam.txt
│   └── ...
└── YourMeshName.MeshEXT
```
Note that the mesh can be in any format supported by trimesh, i.e., `MeshEXT` can be the commonly used `.ply`, `.obj`, etc.
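For instance, either format loads the same way with trimesh (a minimal sketch; the path below is a placeholder):

```python
import trimesh

# Load the exported mesh; trimesh infers the format from the file extension,
# so .ply, .obj, etc. all work. Replace the placeholder path with your own.
mesh = trimesh.load("SCENE_NAME/YourMeshName.ply", force="mesh")
print(mesh.vertices.shape, mesh.faces.shape)  # (V, 3), (F, 3)
```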
The camera convention strictly follows MVSNet, where the camera parameters are defined in the `.txt` file, with the extrinsic `E = [R|t]` and intrinsic `K` expressed as:
```
extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33
intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22
```
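For reference, such a `.txt` file can be parsed into NumPy arrays as follows (a minimal sketch based solely on the layout above; the actual loader in this repository may differ):

```python
import numpy as np

def read_cam_txt(path):
    """Parse an MVSNet-style *_cam.txt into a 4x4 extrinsic and 3x3 intrinsic.

    Assumes only the layout shown above: an 'extrinsic' header followed by
    16 values, then an 'intrinsic' header followed by 9 values.
    """
    tokens = open(path).read().split()
    e = tokens.index("extrinsic") + 1
    k = tokens.index("intrinsic") + 1
    E = np.asarray(tokens[e:e + 16], dtype=np.float64).reshape(4, 4)  # [R|t] padded to 4x4
    K = np.asarray(tokens[k:k + 9], dtype=np.float64).reshape(3, 3)
    return E, K
```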
- For Agisoft Metashape, convert the resulting Metashape camera file `cams.xml` to the `cams` folder using `scripts/xml2txt.py`, where the following parameters need to be specified:
  - `dsp_factor`: the down-sample rate, e.g., `dsp_factor=4` means down-sampling the resulting images and the related intrinsic parameters by a factor of 4 (see the sketch after this list).
  - `subject_file`: the root path containing the exported image folder `images` and `cams.xml`.

  The outputs are the two folders `images_{dsp_factor}` and `cams_{dsp_factor}`.
- For COLMAP, please refer to MVSNet/mvsnet/colmap2mvsnet.py.
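As a point of reference for what `dsp_factor` does to the intrinsics, down-sampling an image by a factor scales the focal lengths and principal point by the same factor (illustrative only; the exact convention in `scripts/xml2txt.py` may differ, e.g., in half-pixel offsets):

```python
import numpy as np

def downsample_intrinsics(K, dsp_factor):
    """Scale a 3x3 intrinsic matrix to match images down-sampled by dsp_factor.

    Illustrative sketch: dividing the first two rows scales fx, skew, cx
    and fy, cy; the last row [0, 0, 1] is unchanged.
    """
    K = np.asarray(K, dtype=np.float64).copy()
    K[:2, :] /= dsp_factor
    return K
```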
All configs for the subsequent neural rendering pipeline are stored in `configs/parameter.py`. Make sure to properly set the following parameters before running the code:
- Root configs (`root_params`):
  - `exp_id`: the ID for the current run.
  - `root_file`: the root path to store the training logs, checkpoints, and render results.
  - `load_checkpoint_dir`: the absolute path of the checkpoint to load for inference or further training. Set to `None` when training from scratch.
- Model hyperparameters (`network_params`):
  - The default values have been optimized for most real-world large scenes.
- Batch sizes (`cluster_params`):
  - `random_view_batch_size`: how many views are sampled at once during training.
  - `training_batch_size`: how many rays are sampled at once for each view during training. Decrease it if the available GPU memory is less than 24 GB.
  - `infer_batch_size`: how many rays are sampled at once when rendering a single image. Decrease it if the available GPU memory is less than 24 GB, or increase it up to the number of rays at the desired render resolution (e.g., 2073600 = 1080 x 1920 for 1080p rendering) when memory allows.
  - Other parameters relate to the dynamic ray loading mechanism for training and have been optimized for the best results.
- Rasterization-related parameters (`render_params`):
  - The default values can be left as-is for good results.
- Data loading configs (`load_params`):
  - `datasetFolder`: set to the root data path.
  - `modelName`: set to the SCENE_NAME. The folders `images_{dsp_factor}` and `cams_{dsp_factor}`, together with the mesh `YourMeshName.MeshEXT`, should be saved under `datasetFolder/modelName/..`
  - `meshName`: set to YourMeshName.MeshEXT. Note that the mesh can be in any format supported by trimesh.
  - `all_view_list`: the list of view_ids (from 0 to the total number of images / cameras) to be included from `images_{dsp_factor}` and `cams_{dsp_factor}`.
  - `test_view_list`: the list of view_ids to be held out for testing.
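For concreteness, the `load_params` entries for the folder layout above might be set roughly like this (hypothetical values and structure; adapt to your data and to the actual layout of `configs/parameter.py`):

```python
# Hypothetical example for configs/parameter.py; all values are placeholders.
load_params = dict(
    datasetFolder="/data/xscale_nvs",     # root data path
    modelName="SCENE_NAME",               # contains images_*, cams_*, and the mesh
    meshName="YourMeshName.ply",          # any format supported by trimesh
    all_view_list=list(range(120)),       # e.g., 120 captured views, ids 0..119
    test_view_list=[10, 30, 50, 70, 90],  # views held out for testing
)
```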
This functionality is developed to enable training on high-resolution (e.g., 8K) images by pre-caching the sliced rasterization buffers on disk. Run:
```bash
bash graphs/warping/warp.sh
```
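The idea is that a full 8K rasterization buffer never has to reside in memory at once: each view is rasterized and cached in slices. A toy sketch of such tiling (assumed for illustration; the repository's caching code may slice differently):

```python
def iter_tiles(height, width, tile=1024):
    """Yield (row_slice, col_slice) pairs covering an image tile by tile.

    Toy illustration of slicing a high-resolution buffer so each
    rasterization pass fits in memory; not the repository's actual code.
    """
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield slice(r, min(r + tile, height)), slice(c, min(c + tile, width))

# Example: an 8K frame (4320 x 7680) splits into 5 x 8 = 40 tiles of 1024 pixels.
print(sum(1 for _ in iter_tiles(4320, 7680)))  # 40
```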
The optimization is done by iteratively sampling a random batch of cached rays and performing stochastic gradient descent with an L1 photometric loss. Use the following script to start training:
```bash
bash agents/adap.sh
```
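Conceptually, each iteration reduces to one gradient step on an L1 photometric loss over the sampled rays; a minimal PyTorch sketch of that objective (the actual training loop additionally handles dynamic ray loading, logging, etc.):

```python
import torch

def training_step(model, rays, target_rgb, optimizer):
    """One optimization step with an L1 photometric loss over a ray batch.

    Illustrative sketch of the objective described above, not the
    repository's actual training loop.
    """
    optimizer.zero_grad()
    pred_rgb = model(rays)                       # [N, 3] predicted colours
    loss = (pred_rgb - target_rgb).abs().mean()  # L1 photometric loss
    loss.backward()
    optimizer.step()
    return loss.item()
```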
After training, render the test views by:
```bash
python -u agents/render.py
```
The current release only supports inference on the specified test views. Scripts for free-viewpoint rendering will be integrated soon.
- Add free-viewpoint rendering demo scripts, supported by Blender.
- Integration into nerf-studio.
- Release of the GigaNVS Dataset.
Please cite our paper:
```bibtex
@InProceedings{Wang_2024_CVPR,
    author    = {Wang, Guangyu and Zhang, Jinzhi and Wang, Fan and Huang, Ruqi and Fang, Lu},
    title     = {XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {21029-21039}
}
```