<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<!-- Bootstrap CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"
integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<title>SPARF</title>
</head>
<body>
<div class="container">
<br>
<div style="text-align: center;">
<h1>SPARF</h1>
<h3>Neural Radiance Fields from Sparse and Noisy Poses</h3>
<div style="margin-top: 15px;">
<span style="margin-right: 15px; font-size: 1.3em;"><a href="https://prunetruong.com/" target="_blank">Prune Truong<sup>1,2</sup></a></span>
<span style="margin-right: 15px; font-size: 1.3em;"><a href="http://www.lix.polytechnique.fr/Labo/Marie-Julie.RAKOTOSAONA/" target="_blank">Marie-Julie Rakotosaona<sup>2</sup></a></span>
<span style="margin-right: 15px; font-size: 1.3em;"><a href="https://campar.in.tum.de/Main/FabianManhardt" target="_blank">Fabian Manhardt<sup>2</sup></a></span>
<span style="margin-right: 15px; font-size: 1.3em;"><a href="https://federicotombari.github.io/" target="_blank">Federico Tombari<sup>2,3</sup></a></span
</div>
<div style="margin-top: 15px;">
<span style="margin-right: 20px; font-size: 1.2em;"><sup>1</sup>ETH Zurich</span>
<span style="margin-right: 20px; font-size: 1.2em;"><sup>2</sup>Google</span>
<span style="font-size: 1.2em;"><sup>3</sup>Technical University of Munich</span>
</div>
<div>
<span style="margin-right: 10px; font-size: 1.3em;">CVPR 2023 - Highlight</span>
</div>
</div>
<div class="text-center" style="font-size: 1.5em; margin-top: 25px;">
<a class="btn btn-primary btn-lg" target="_blank"
href="https://arxiv.org/abs/2211.11738" role="button"
style="margin-right: 10px; margin-bottom: 10px;">Arxiv</a>
<a class="btn btn-primary btn-lg" target="_blank"
href="https://github.com/google-research/sparf" role="button"
style="margin-right: 10px; margin-bottom: 10px;">Code</a>
<a class="btn btn-primary btn-lg" target="_blank"
href="https://www.youtube.com/embed/_s3_p2Brd_8" role="button"
style="margin-right: 10px; margin-bottom: 10px;">Video</a>
<a class="btn btn-primary btn-lg" target="_blank"
href="https://drive.google.com/file/d/1uLsUZbOEf9DqfxY7xEU2urlJjGdfwKkI/view?usp=sharing" role="button"
style="margin-right: 10px; margin-bottom: 10px;">Poster</a>
<a class="btn btn-primary btn-lg" target="_blank"
href="https://www.youtube.com/embed/ARMKrcJlULE" role="button"
style="margin-right: 10px; margin-bottom: 10px;">Teaser Video</a>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="number_of_views_teaser_lr.mp4" type="video/mp4">
</video>
</div>
<p class="text-center" style="font-size: 1.5em;">
<span style="font-weight: bold;">SPARF</span> enables realistic view synthesis from as few as <span style="font-weight: bold;">2 input
images with noisy poses</span>.
</p>
</div>
</div>
<div style="margin-top: 30px;">
<h2 class="text-center">
Abstract
</h2>
<p style="font-style: italic; margin-bottom: 5px;" class="text-left">
Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize
photorealistic novel views. While showing impressive performance, it relies on the availability
of dense input views with highly accurate camera poses, thus limiting its application in
real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF),
to address the challenge of novel-view synthesis given only few wide-baseline input images
(as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints
in order to jointly learn the NeRF and refine the camera poses. By relying on pixel
matches extracted between the input views, our multi-view correspondence objective
enforces the optimized scene and camera poses to converge to a global and geometrically
accurate solution. Our depth consistency loss further encourages the reconstructed scene
to be consistent from any viewpoint. Our approach sets a new state of the art in the
sparse-view regime on multiple challenging datasets.
</p>
<p style="font-size: 1.2em; margin-top: 0px;" class="text-left">
<span style="font-weight: bold;" class="text-left">TL;DR:</span> We include additional geometric constraints during the NeRF
optimization to enable learning a meaningful geometry and rendering realistic novel views, given only 2 or 3
wide-baseline input views with noisy poses.
</p>
</div>
<p style="font-size: 1.2em; margin-top: 0px;" class="text-left"><span style="font-weight: bold;color: red" class="text-left">
We are honored to be featured in the <a href=https://www.rsipvision.com/ComputerVisionNews-2023May/4/>May edition</a> of the Computer Vision News and in the <a href="https://www.rsipvision.com/CVPR2023-Tuesday/2/">CVPR special</a>! Check out the articles. Big thanks to Ralph Anzarouth from <a href=https://www.rsipvision.com/computer-vision-news/>Computer Vision News</a>!</li>
</span>
</p>
<div style="margin-top:10px;">
<h2 class="text-center">
Teaser Video
</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/ARMKrcJlULE" allowfullscreen></iframe>
</div>
</div>
<div style="margin-top: 50px;">
<div class="text-center">
<h2>
Method Overview
</h2>
<img src="method_fig.png" width=100% class="img-fluid" alt="Responsive image">
</div>
<div style="margin-top: 35px;" class="text-left">
<p>
Most previous NeRF-based approaches for joint pose-NeRF training optimize the reconstruction loss
for a given set of input images. For sparse inputs, however, this leads to degenerate solutions.
In this work, we propose <span style="font-weight: bold;">SPARF</span>, an approach for joint
pose-NeRF training specifically designed to tackle the challenging scenario of
<span style="font-weight: bold;">few input images</span> with
<span style="font-weight: bold;">noisy camera pose</span> estimates. We first rely on a pretrained
dense correspondence network to extract matches between the training views (step 1). Our
<span style="font-weight: bold;">multi-view
correspondence loss</span> (step 2) minimizes the re-projection error between matches, i.e. it enforces each
pixel of a particular training view to project to its matching pixel in another training view.
We use the rendered NeRF depth and the current pose estimates to backproject
each pixel in 3D space. This constraint hence encourages the learned scene and pose estimates to
converge to a global and accurate geometric solution, consistent across all training views.
Our <span style="font-weight: bold;">depth consistency loss</span> (step 3) further uses the rendered depths from the training viewpoints
to create pseudo-depth supervision for unseen viewpoints, thereby encouraging the
reconstructed scene to be consistent from any direction.
</p>
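<p class="text-left">
For concreteness, the sketch below illustrates the multi-view correspondence objective in minimal PyTorch code. It is an illustration, not the released implementation: the function signature, tensor shapes, and the choice of a Huber penalty are our own assumptions (see the official code linked above for the exact loss).
</p>
<pre class="text-left">
import torch

def correspondence_loss(pixels_i, pixels_j, depth_i, pose_i, pose_j, K):
    """Reprojection error between matched pixels of training views i and j.

    pixels_i, pixels_j: (N, 2) matched pixel coordinates (from the
        pretrained dense correspondence network, step 1).
    depth_i: (N,) NeRF-rendered depth at pixels_i.
    pose_i, pose_j: (4, 4) current world-to-camera pose estimates.
    K: (3, 3) shared camera intrinsics.
    """
    N = pixels_i.shape[0]
    ones = torch.ones(N, 1)
    # Back-project each matched pixel of view i into 3D, using the
    # rendered NeRF depth and the current pose estimate of view i.
    pix_h = torch.cat([pixels_i, ones], dim=-1)                  # (N, 3)
    cam_pts = (torch.linalg.inv(K) @ pix_h.T).T * depth_i[:, None]
    cam_h = torch.cat([cam_pts, ones], dim=-1)                   # (N, 4)
    world_pts = (torch.linalg.inv(pose_i) @ cam_h.T).T
    # Re-project the 3D points into view j with its current pose estimate.
    cam_j = (pose_j @ world_pts.T).T[:, :3]
    proj = (K @ cam_j.T).T
    reproj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Robust penalty on the residual to the matching pixels in view j;
    # gradients flow to both the NeRF (via depth_i) and the poses.
    return torch.nn.functional.huber_loss(reproj, pixels_j)
</pre>
<p class="text-left">
Because the rendered depth and both pose estimates enter the reprojection, minimizing this residual jointly refines the scene geometry and the cameras. The depth consistency loss (step 3) reuses the same warping machinery, comparing depth rendered from a training viewpoint, warped into an unseen viewpoint, against the depth rendered there.
</p>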
</div>
</div>
<div style="margin-top:50px;">
<h2 class="text-center">
Results
</h2>
<h4 class="text-left" >View Synthesis on DTU from 3 Input Views with Noisy Camera Poses</h4>
<p class="text-left">
DTU is composed of complex object-centric scenes, with wide-baseline views spanning a half-hemisphere.
We only have access to <span style="font-weight: bold;">3 input views</span>, along with <span style="font-weight: bold;">initial noisy camera poses</span>. We create these
initial poses by synthetically perturbing the ground-truth camera poses with 15% Gaussian noise.
This leads to an initial rotation error of 15° and an initial translation error equal to 15% of the scene scale.
<span style="font-weight: bold;">Our approach SPARF is the only one producing realistic novel-view renderings with a meaningful geometry!</span>
</p>
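<p class="text-left">
As an illustration of this perturbation protocol, a minimal sketch is given below. The axis-angle parametrization and the exact noise scaling are assumptions made for illustration, not necessarily the paper's precise recipe.
</p>
<pre class="text-left">
import numpy as np
from scipy.spatial.transform import Rotation

def perturb_pose(pose_w2c, noise=0.15, scene_scale=1.0, rng=None):
    """Synthetically perturb a (4, 4) world-to-camera pose.

    Rotation: random axis-angle noise (stddev 'noise' radians per axis);
    translation: Gaussian noise scaled by the scene scale, so 'noise'
    acts as a relative magnitude.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = pose_w2c.copy()
    # Left-multiply the rotation by a small random rotation.
    delta_R = Rotation.from_rotvec(rng.normal(0.0, noise, 3)).as_matrix()
    noisy[:3, :3] = delta_R @ pose_w2c[:3, :3]
    # Offset the translation relative to the scene scale.
    noisy[:3, 3] += rng.normal(0.0, noise * scene_scale, 3)
    return noisy
</pre>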
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan21.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan40.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan41.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan45.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan82.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_scan114.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<h4 class="text-left" >View Synthesis on LLFF from 3 Input Views with Noisy Camera Poses </h4>
<p class="text-left" >
LLFF is a forward-facing dataset, depicting complex indoor and outdoor scenes.
We only have access to <span style="font-weight: bold;">3 input views</span>, along with initial <span style="font-weight: bold;">identity poses</span>.
As before, our approach SPARF produces much higher-quality novel-view renderings and learns an accurate geometry.
</p>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_horns.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_leaves.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<h4 class="text-left" >View Synthesis on Replica from 3 Input Views with Noisy Camera Poses </h4>
<p class="text-left" >
Replica is a non-forward-facing dataset depicting indoor scenes.
We only have access to <span style="font-weight: bold;">3 input views</span>, along with <span style="font-weight: bold;">noisy poses initialized by COLMAP</span>.
The scene geometry rendered by our approach SPARF is more accurate and realistic than that of the competing methods BARF, DS-NeRF, and SCNeRF, with significantly fewer floaters and inconsistencies.
</p>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_office0.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-sm-12 col-xs-12 gallery">
<div class="embed-responsive embed-responsive-21by9">
<video controls loop muted autoplay class="embed-responsive-item">
<source src="videos/subset_3/comparison_room2.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div>
<h2 class="text-center" style="margin-top: 30px;">
Citation
</h2>
<p class="text-left">
If you want to cite our work, please use:
</p>
<pre class="text-left">
@InProceedings{sparf2023,
    title     = {SPARF: Neural Radiance Fields from Sparse and Noisy Poses},
    author    = {Truong, Prune and Rakotosaona, Marie-Julie and Manhardt, Fabian and Tombari, Federico},
    booktitle = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
    year      = {2023}
}
</pre>
</div>
<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js"
integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN"
crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js"
integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q"
crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"
integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl"
crossorigin="anonymous"></script>
</body>
</html>