The encoder is the part that converts the input 2D image into a 200-dimensional vector, which forms our latent space. To do this, we use five convolution layers with batch normalization (BN) and ReLU between them.
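Below is a minimal PyTorch sketch of such an encoder. The channel widths, kernel sizes, strides, and the assumed 256×256 RGB input are illustrative choices rather than the exact configuration used here; since the latent code is treated as a variational distribution q(z|y) later on, the sketch outputs a 200-D mean and log-variance.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a 2D image to a 200-D latent Gaussian (mean and log-variance).
    Channel widths, kernels, and strides below are illustrative assumptions."""
    def __init__(self, latent_dim=200):
        super().__init__()
        self.features = nn.Sequential(
            # 5 convolution blocks, each followed by BatchNorm and ReLU
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 400, kernel_size=8, stride=1),  # 400 x 1 x 1 for a 256x256 input
            nn.BatchNorm2d(400), nn.ReLU(inplace=True),
        )
        # Split the 400 features into a 200-D mean and 200-D log-variance
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)

    def forward(self, y):
        h = self.features(y).flatten(start_dim=1)
        return self.fc_mu(h), self.fc_logvar(h)
```

With a batch of 256×256×3 images, this returns two (batch, 200) tensors describing the latent distribution.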
The generator uses the latent-space vector to generate the 3D model. It contains five ConvTranspose3D layers with BN and ReLU between the layers.
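A matching sketch of the generator is shown below, assuming a 64×64×64 output voxel grid; the channel widths and the final sigmoid follow common 3D-GAN implementations and are assumptions, not necessarily the exact setup used here.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Decodes a 200-D latent vector into a 64x64x64 voxel occupancy grid.
    Layer widths are illustrative assumptions."""
    def __init__(self, latent_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            # 5 ConvTranspose3d blocks with BatchNorm + ReLU in between
            nn.ConvTranspose3d(latent_dim, 512, kernel_size=4, stride=1),
            nn.BatchNorm3d(512), nn.ReLU(inplace=True),            # 4^3
            nn.ConvTranspose3d(512, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(256), nn.ReLU(inplace=True),            # 8^3
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(inplace=True),            # 16^3
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),             # 32^3
            nn.ConvTranspose3d(64, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),                                          # 64^3 occupancy probabilities
        )

    def forward(self, z):
        # Reshape the latent vector into a 1x1x1 spatial volume before upsampling
        return self.net(z.view(z.size(0), -1, 1, 1, 1))
```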
The discriminator takes the 3D volume as input and predicts whether the 3D object is real or fake (a generated model). Its architecture mirrors the generator, except that a sigmoid layer is attached at the end.
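A possible discriminator, written as the mirror of the generator sketch above with a sigmoid at the end; the LeakyReLU activations are an assumption (a common choice for GAN discriminators) rather than something stated here.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Mirror of the generator: downsamples a 64^3 voxel grid to a single
    real/fake probability per sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.LeakyReLU(0.2, inplace=True),   # 32^3
            nn.Conv3d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.LeakyReLU(0.2, inplace=True),  # 16^3
            nn.Conv3d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(256), nn.LeakyReLU(0.2, inplace=True),  # 8^3
            nn.Conv3d(256, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(512), nn.LeakyReLU(0.2, inplace=True),  # 4^3
            nn.Conv3d(512, 1, kernel_size=4, stride=1),            # 1^3
            nn.Sigmoid(),                                          # real/fake probability
        )

    def forward(self, x):
        return self.net(x).view(-1)
```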
Here x is a 3D shape from the training set, y is its corresponding 2D image, and q(z|y) is the variational distribution of the latent representation z. The loss function consists of three parts: an object reconstruction loss LRecon, a cross-entropy loss L3D-GAN for the 3D-GAN, and a KL divergence loss LKL that restricts the distribution of the encoder's output. The Kullback-Leibler (KL) divergence quantifies how much one probability distribution differs from another. We use it to pull q(z|y) as close as possible to the prior p(z); in other words, we want q(z|y) to approximate a Gaussian distribution.
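The sketch below shows one way these three terms could be combined in PyTorch, reusing the encoder/generator/discriminator sketches above. The loss weights `alpha_kl` and `alpha_recon`, the MSE reconstruction term, and the reparameterization step are illustrative assumptions, not the exact values used in the paper.

```python
import torch
import torch.nn.functional as F

def vae_gan_loss(x, y, encoder, generator, discriminator,
                 alpha_kl=5.0, alpha_recon=1e-4):
    """Combined loss sketch: L = L3D-GAN + alpha_kl * LKL + alpha_recon * LRecon.
    x is the ground-truth voxel grid, y its corresponding 2D image."""
    # Encode the 2D image y into a Gaussian q(z|y)
    mu, logvar = encoder(y)
    # Reparameterization trick: sample z from q(z|y)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    x_hat = generator(z)

    # LKL: KL divergence between q(z|y) and the standard normal prior p(z)
    l_kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    # LRecon: how well the generated volume matches the ground-truth shape x
    l_recon = F.mse_loss(x_hat, x)
    # L3D-GAN (generator side): fool the discriminator into labeling x_hat as real
    real_labels = torch.ones(x.size(0), device=x.device)
    l_gan = F.binary_cross_entropy(discriminator(x_hat), real_labels)

    return l_gan + alpha_kl * l_kl + alpha_recon * l_recon
```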
- IKEA Dataset
- SUN Database
PyTorch Code: Code
Paper: Paper