
cs180: final proj

Neural Radiance Fields

Part 1: Fit a Neural Field to a 2D Image

We use Part 1 as a stepping stone to Part 2. Our goal in Part 1 is to create a neural field that can represent a 2D image. To do so, the neural field (a Multilayer Perceptron (MLP) network with Sinusoidal Positional Encoding (PE)) takes in 2D pixel coordinates and outputs the RGB color at each coordinate. To train the model, I modified the network's hyperparameters: I changed the hidden layer size to 1024 and the highest frequency level of the sinusoidal positional encoding to L = 20. I trained the model for 3000 iterations with a learning rate of 0.001.
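A minimal sketch of this setup in PyTorch, assuming the hyperparameters above (hidden size 1024, L = 20); the number of hidden layers shown is illustrative, not necessarily the exact architecture:

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal PE: maps each input dim to sin/cos at L frequencies."""
    def __init__(self, L):
        super().__init__()
        self.L = L

    def forward(self, x):
        # x: (N, D) coordinates normalized to [0, 1]
        out = [x]
        for i in range(self.L):
            out.append(torch.sin((2 ** i) * torch.pi * x))
            out.append(torch.cos((2 ** i) * torch.pi * x))
        return torch.cat(out, dim=-1)  # (N, D + 2*D*L)

class NeuralField2D(nn.Module):
    """MLP that maps a 2D pixel coordinate to an RGB color."""
    def __init__(self, hidden=1024, L=20):
        super().__init__()
        self.pe = PositionalEncoding(L)
        in_dim = 2 + 2 * 2 * L  # encoded 2D coordinate
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, coords):
        return self.mlp(self.pe(coords))
```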

For each image, I've included a plot of the training PSNR across iterations and a visualization of the training process. I also include other experiments, which show that increasing the model size improved the results while increasing the highest frequency level had little effect.

From top to bottom: [Hidden Layer Size = 128, L = 10], [Hidden Layer Size = 1024, L = 10], [Hidden Layer Size = 128, L = 20]


fox.jpg


dog.jpg

Part 2: Fit a Neural Radiance Field from Multi-View Images

Building on Part 1, we can now use a neural radiance field to represent a 3D scene by inverse rendering multi-view calibrated images. Much of this part follows the techniques from the NeRF paper.

Part 2.1: Create Rays from Cameras

I implemented three functions in this part, all supporting batched coordinates for future use.

The first function (Camera to World Coordinate Conversion) transforms a point from camera space to world space by appending a fourth homogeneous coordinate of 1 to the camera coordinates and multiplying by the camera-to-world transformation matrix, as follows:
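A minimal sketch of this conversion, assuming x_c is a batch of 3D camera-space points and c2w is the (possibly batched) 4x4 camera-to-world matrix:

```python
import torch

def transform(c2w, x_c):
    """Camera-to-world conversion via homogeneous coordinates."""
    ones = torch.ones_like(x_c[..., :1])
    x_h = torch.cat([x_c, ones], dim=-1)         # (..., 4) homogeneous point
    x_w = (c2w @ x_h.unsqueeze(-1)).squeeze(-1)  # apply c2w
    return x_w[..., :3]                          # drop the homogeneous dim
```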

The second function (Pixel to Camera Coordinate Conversion) converts 2D pixel coordinates to 3D points in camera space by constructing the intrinsic matrix K with focal lengths (f_x, f_y) and principal point (o_x, o_y) and calculating the following:
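A sketch of this step, assuming K follows the standard pinhole layout [[f_x, 0, o_x], [0, f_y, o_y], [0, 0, 1]] and s is the depth along the camera's z-axis:

```python
def pixel_to_camera(K, uv, s):
    """Inverse-project pixel coordinates uv (..., 2) into camera space."""
    ones = torch.ones_like(uv[..., :1])
    uv_h = torch.cat([uv, ones], dim=-1)                         # (..., 3)
    x_c = (torch.linalg.inv(K) @ uv_h.unsqueeze(-1)).squeeze(-1)
    return s * x_c                                               # scale by depth
```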

The third function (Pixel to Ray) generates the ray origin and ray direction for 2D pixel coordinates by calculating the world-to-camera matrix w2c, taking its inverse c2w, and using the upper-left 3x3 block of c2w to calculate the following:
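A sketch of ray generation building on the two helpers above (this version assumes c2w is available directly; inverting w2c yields the same matrix, and applying the full 4x4 transform is equivalent to using the upper-left 3x3 rotation plus the translation column):

```python
def pixel_to_ray(K, c2w, uv):
    """Compute the ray origin and normalized direction for pixels uv."""
    r_o = c2w[..., :3, 3]                # camera center in world space
    x_c = pixel_to_camera(K, uv, s=1.0)  # a point along the ray, camera space
    x_w = transform(c2w, x_c)            # the same point in world space
    r_d = x_w - r_o
    r_d = r_d / torch.linalg.norm(r_d, dim=-1, keepdim=True)
    return r_o, r_d
```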

Part 2.2: Sampling

I implemented two functions in this part and improved performance by vectorizing the second one.

The first function (Sampling Rays from Images) is part of Part 2.3's dataloader class and converts pixel coordinates into ray origins and directions. I sample pixels globally from all images, account for the offset from image coordinate to pixel center by adding 0.5, and convert the coordinates to rays using Part 2.1's Pixel to Ray function.
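A sketch of this global sampling, assuming images is an (N, H, W, 3) tensor and Ks/c2ws hold per-image intrinsics and camera-to-world matrices (names here are illustrative):

```python
def sample_rays(images, Ks, c2ws, n_rays):
    """Sample n_rays pixels uniformly across all images."""
    N, H, W, _ = images.shape
    idx = torch.randint(0, N * H * W, (n_rays,))
    img = idx // (H * W)
    v = (idx % (H * W)) // W
    u = idx % W
    uv = torch.stack([u, v], dim=-1).float() + 0.5  # offset to pixel centers
    r_o, r_d = pixel_to_ray(Ks[img], c2ws[img], uv)
    return r_o, r_d, images[img, v, u]              # rays + ground-truth colors
```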

The second function (Sampling Points along Rays) discretizes each ray into sample points in 3D space. I add the ray direction, scaled by a range of distances, to the ray origin; when perturbation=True, I perturb the sample distances within their bins so that training touches every location along the ray.
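A sketch of this stratified sampling; the near/far bounds shown are illustrative placeholders:

```python
def sample_points(r_o, r_d, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Discretize each ray into n_samples 3D points."""
    t = torch.linspace(near, far, n_samples)          # (n_samples,)
    t = t.expand(*r_o.shape[:-1], n_samples).clone()
    if perturb:
        # jitter each sample within its bin so every depth gets covered
        t = t + torch.rand_like(t) * (far - near) / n_samples
    # x = r_o + t * r_d, broadcast to (..., n_samples, 3)
    x = r_o.unsqueeze(-2) + t.unsqueeze(-1) * r_d.unsqueeze(-2)
    return x, t
```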

Part 2.3: Putting the Dataloading All Together

With my modified dataloader and using viser, here are some of my verification results (the right image samples 100 rays globally; the left image samples 100 rays from a single image):

Part 2.4: Neural Radiance Field

I built the neural radiance field (NeRF) using the network structure below; its goal is to output densities and pixel colors when given 3D coordinates (points sampled along each ray) and ray directions. The main additions to this structure over Part 1's are the intermediate re-injection of the input (a skip connection) and splitting the network into two heads that output the density and the RGB values.
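A condensed sketch of this structure, reusing the PositionalEncoding module from the Part 1 sketch; the widths, depths, and PE frequency levels here are illustrative:

```python
class NeRF(nn.Module):
    """MLP mapping (3D point, view direction) to (density, RGB)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.pe_x = PositionalEncoding(L=10)  # 3 + 2*3*10 = 63 dims
        self.pe_d = PositionalEncoding(L=4)   # 3 + 2*3*4  = 27 dims
        self.block1 = nn.Sequential(
            nn.Linear(63, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # re-inject the encoded input partway through the network
        self.block2 = nn.Sequential(
            nn.Linear(hidden + 63, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + 27, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        x_enc, d_enc = self.pe_x(x), self.pe_d(d)
        h = self.block1(x_enc)
        h = self.block2(torch.cat([h, x_enc], dim=-1))
        sigma = self.density_head(h)                        # density >= 0
        rgb = self.rgb_head(torch.cat([h, d_enc], dim=-1))  # color in [0, 1]
        return sigma, rgb
```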

Part 2.5: Volume Rendering

I implemented the volume rendering function volrend: given the NeRF's density and RGB values along each ray, it composites them into pixel colors, and the loss is computed by comparing these rendered values with the original pixel values. I used torch.cumsum and padded the densities with a 0 in front to account for the i-1 upper limit of the transmittance summation.
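A sketch of this computation; the leading zero pad makes the cumulative sum at index i cover only the terms up to i-1:

```python
def volrend(sigmas, rgbs, step_size):
    """Composite per-sample densities/colors into a pixel color.
    sigmas: (..., n_samples, 1), rgbs: (..., n_samples, 3)."""
    alpha = 1.0 - torch.exp(-sigmas * step_size)
    # T_i = exp(-sum_{j=1}^{i-1} sigma_j * delta_j)
    padded = torch.cat(
        [torch.zeros_like(sigmas[..., :1, :]), sigmas * step_size], dim=-2)
    T = torch.exp(-torch.cumsum(padded, dim=-2))[..., :-1, :]
    weights = T * alpha
    return (weights * rgbs).sum(dim=-2)  # (..., 3)
```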

I trained my model with the following hyperparameters: Adam with a learning rate of 1e-3, 3600 gradient descent steps, sampling 1024 rays per step with 64 samples along each ray. Here are some intermediate training images as well as the validation set's MSE and PSNR. As shown, I was able to achieve a PSNR above 23!
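For reference, the PSNR values here presumably follow the standard definition for images normalized to [0, 1]:

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{1}{\mathrm{MSE}}\right)
```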

Bells and Whistles: Background Color

To render the video with a background color other than black, I modified my volrend function to multiply T_{n+1} (the transmittance remaining after the last sample) by the background color passed into the function (in my case, red) and add it to the rendered pixel color.
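A sketch of this change, extending the volrend sketch above; bg_color is assumed to be a length-3 RGB tensor:

```python
def volrend_bg(sigmas, rgbs, step_size, bg_color):
    """Volume rendering with a solid background color."""
    alpha = 1.0 - torch.exp(-sigmas * step_size)
    padded = torch.cat(
        [torch.zeros_like(sigmas[..., :1, :]), sigmas * step_size], dim=-2)
    T = torch.exp(-torch.cumsum(padded, dim=-2))  # (..., n+1, 1)
    weights = T[..., :-1, :] * alpha
    color = (weights * rgbs).sum(dim=-2)
    # T_{n+1}: light that survives all samples hits the background
    return color + T[..., -1, :] * bg_color
```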

Reflection

This project was definitely time-consuming but interesting. I learned a lot, and the end results were great, though there is room to improve: I wasn't able to vectorize everything I wanted, so training took much longer than expected.
