
how to express 3 discrete latent codes (each with dimension 20) and visual work ok? #10

Open
zdx3578 opened this issue Sep 24, 2016 · 11 comments

Comments


zdx3578 commented Sep 24, 2016

How do I express 10-dimensional categorical variables?

This code:

    latent_spec = [
        (Uniform(62), False),              # 62 noise variables
        (Categorical(10), True),           # c1: categorical code, 10 categories
        (Uniform(1, fix_std=True), True),  # c2: continuous code
        (Uniform(1, fix_std=True), True),  # c3: continuous code
    ]

is for MNIST, but this is not enough. In the paper:
For MNIST, we choose to model the latent codes with one categorical code, c1 ∼ Cat(K = 10, p = 0.1), which can model discontinuous variation in data, and two continuous codes that can capture variations that are continuous in nature: c2, c3 ∼ Unif(−1, 1).

But how to express SVHN (Street View House Numbers)? In the paper:

we make use of four 10-dimensional categorical variables and two uniform continuous variables as latent codes.
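
My guess, by analogy with the MNIST spec above (the 124 noise dimensions here are a placeholder assumption, not stated anywhere in this thread):

    latent_spec = [
        (Uniform(124), False),             # noise (dimension assumed)
        (Categorical(10), True),           # four 10-dimensional categorical codes
        (Categorical(10), True),
        (Categorical(10), True),
        (Categorical(10), True),
        (Uniform(1, fix_std=True), True),  # two uniform continuous codes
        (Uniform(1, fix_std=True), True),
    ]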

CelebA
In this dataset, we model the latent variation as 10 uniform categorical variables, each of dimension 10.

And in Appendix C.3, the generator G has Input ∈ R^228. How do we get 228?

From Table 3 in the paper:

| discriminator D / recognition network Q | generator G |
| --- | --- |
| Input 32 × 32 Color image | Input ∈ R^228 |
| 4 × 4 conv. 64 lRELU. stride 2 | FC. 2 × 2 × 448 RELU. batchnorm |
| 4 × 4 conv. 128 lRELU. stride 2. batchnorm | 4 × 4 upconv. 256 RELU. stride 2. batchnorm |
| 4 × 4 conv. 256 lRELU. stride 2. batchnorm | 4 × 4 upconv. 128 RELU. stride 2. |
| FC. output layer for D, FC.128-batchnorm-lRELU-FC.output for Q | 4 × 4 upconv. 64 RELU. stride 2. |
|  | 4 × 4 upconv. 3 Tanh. stride 2. |

Can anyone help? Thanks very much!


zdx3578 commented Sep 25, 2016

    # From the visualization code: builds a 10 x 10 grid of samples,
    # varying one latent code while holding the others fixed.
    if isinstance(dist, Gaussian):
        assert dist.dim == 1, "Only dim=1 is currently supported"
        c_vals = []
        for idx in xrange(10):
            # 10 evenly spaced values in [-1, 1], each repeated 10 times
            c_vals.extend([-1.0 + idx * 2.0 / 9] * 10)
        c_vals.extend([0.] * (self.batch_size - 100))
        vary_cat = np.asarray(c_vals, dtype=np.float32).reshape((-1, 1))
        cur_cat = np.copy(fixed_cat)
        cur_cat[:, offset:offset + 1] = vary_cat
        offset += 1
    elif isinstance(dist, Categorical):
        lookup = np.eye(dist.dim, dtype=np.float32)
        cat_ids = []
        for idx in xrange(10):
            # note: only the first 10 categories are varied, 10 samples each
            cat_ids.extend([idx] * 10)
        cat_ids.extend([0] * (self.batch_size - 100))
        cur_cat = np.copy(fixed_cat)
        cur_cat[:, offset:offset + dist.dim] = lookup[cat_ids]
        offset += dist.dim
    elif isinstance(dist, Bernoulli):
        assert dist.dim == 1, "Only dim=1 is currently supported"


zdx3578 commented Sep 25, 2016

From the RutgersHan fork:

    embedding_dim = 100

latent_spec = [
    (Uniform(64), False),
    (Categorical(32), True),
]
con_latent_spec = [
    (LatentGaussian(embedding_dim), True)
]

https://github.com/RutgersHan/InfoGAN/blob/dev_auto/launchers/generate_images.py


zdx3578 commented Oct 1, 2016

C.3 CelebA
The network architectures are shown in Table 3. The discriminator D and the recognition network Q shares most of the network. For this task, we use 10 ten-dimensional categorical code and 128 noise variables, resulting in a concatenated dimension of 228.

i.e. 128 noise + 10 × 10 categorical = 228. So the config is:

latent_spec = [
    (Uniform(128), False),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
]

But how to configure C.5 Chairs, described below?

The network architectures are shown in Table 6. The discriminator D and the recognition network Q shares the same network, and only have separate output units at the last layer. For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

The visualization code, however, only has:

    elif isinstance(dist, Bernoulli):
        assert dist.dim == 1, "Only dim=1 is currently supported"


NHDaly commented Oct 1, 2016

The above latent_spec worked okay for me.

c3_celebA_latent_spec = [
    (Uniform(128), False),  # Noise
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
]
c3_celebA_image_size = 32

Can you elaborate a bit more in words what you're having problems with? I'm not sure I understand what's not working for you.


zdx3578 commented Oct 3, 2016

Thanks NHDaly! Can you share your code?

I think the CelebA config is OK. Now the question is how to configure C.5 Chairs, as below:

The network architectures are shown in Table 6. The discriminator D and the recognition network Q shares the same network, and only have separate output units at the last layer. For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

There are 3 discrete latent codes (each with dimension 20), but the visualization code is:

    elif isinstance(dist, Bernoulli):
        assert dist.dim == 1, "Only dim=1 is currently supported"

We used separate configurations for each learned variation, shown in Table 7. For this task, we found it necessary to use different regularization coefficients for the continuous and discrete latent codes.
[screenshot of Table 7]

So how to configure C.5 Chairs, and how should the visualization code change? Or does the visualization code not need to change?

And the second question is:
in C.4 Faces
The network architectures are shown in Table 4. The discriminator D and the recognition network Q shares the same network, and only have separate output units at the last layer. For this task, we use 5 continuous latent codes and 128 noise variables, so the input to the generator has dimension 133.
We used separate configurations for each learned variation, shown in Table 5.

How to configure 'separate configurations for each learned variation'?

[screenshot of Table 5]

@zdx3578 changed the title from "how to express 10 dimensional categorical variables" to "how to express 3 discrete latent codes (each with dimension 20) and visual work ok?" on Oct 3, 2016

NHDaly commented Oct 4, 2016

how to configure C.5 Chairs, as below

I might be misunderstanding, but it seems like

For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

would translate to the following latent_spec. That is, the continuous code is represented by Uniform and the discrete code is represented by Categorical:

c5_chairs_latent_spec = [
    (Uniform(128), False),             # Noise
    (Uniform(1, fix_std=True), True),  # 1 continuous latent code
    (Categorical(20), True),           # 3 discrete latent codes,
    (Categorical(20), True),           # each with dimension 20
    (Categorical(20), True),
]
# 128 + 1 + 3 * 20 = 189, matching the paper's generator input dimension
c5_chairs_image_size = 32

I copied the (Uniform(1, fix_std=True), True) line from the two continuous variables defined in run_mnist_exp.py, which I believe represent the "2 continuous latent codes" referenced from the MNIST section of the paper.

I'm not sure where you got the LatentGaussian from... I don't know if it's necessary? I haven't tried running the Chairs model at all.
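
As for the visualization code: the Categorical branch quoted earlier hardcodes a 10 × 10 grid (10 categories × 10 samples each), so a dim=20 code would only ever show its first 10 categories. A minimal, untested sketch of a generalization, reusing the variable names from that snippet:

    elif isinstance(dist, Categorical):
        # Vary all dist.dim categories (20 for Chairs) instead of the
        # hardcoded 10; pad the rest of the batch with category 0.
        lookup = np.eye(dist.dim, dtype=np.float32)
        samples_per_cat = self.batch_size // dist.dim  # e.g. 128 // 20 = 6
        cat_ids = []
        for idx in xrange(dist.dim):
            cat_ids.extend([idx] * samples_per_cat)
        cat_ids.extend([0] * (self.batch_size - len(cat_ids)))
        cur_cat = np.copy(fixed_cat)
        cur_cat[:, offset:offset + dist.dim] = lookup[cat_ids]
        offset += dist.dim

The downstream image-grid stitching presumably also assumes a 10 × 10 layout and would need the same adjustment.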


NHDaly commented Oct 4, 2016

That said, I am also very curious about the answer to this question:

How to configure 'separate configurations for each learned variation'?

Does this mean that you ran the experiment multiple times with the same number of codes, but each of the codes tends to perform best for each of the provided settings?

neocxi (Contributor) commented Oct 4, 2016

That is, the continuous code is represented by Uniform and the discrete code is represented by Categorical:

This is correct. Thanks @NHDaly !

Does this mean that you ran the experiment multiple times with the same number of codes, but each of the codes tends to perform best for each of the provided settings?

Yes. To better compare with previous supervised results, we select codes from multiple runs that are most similar to the categories that the previous method (DC-IGN) produces.
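
Concretely, that would amount to re-running the trainer with different mutual-information coefficients and hand-picking the most interpretable codes afterwards. A hypothetical sketch (the coefficient values are placeholders; note this repo exposes a single info_reg_coeff, so the separate continuous/discrete coefficients of Table 7 would need a small trainer modification):

    # Hypothetical sweep: one training run per coefficient setting,
    # then inspect which learned codes match the desired variations.
    for coeff in [0.1, 1.0, 10.0]:  # placeholder values, not from the paper
        algo = InfoGANTrainer(
            model=model,
            dataset=dataset,
            batch_size=128,
            exp_name="faces_info_reg_%s" % coeff,
            info_reg_coeff=coeff,
        )
        algo.train()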


zdx3578 commented Oct 4, 2016

For @NHDaly, see this: https://github.com/RutgersHan/InfoGAN/blob/dev_auto/launchers/run_flower_exp.py#L49

Is your CelebA training result OK?

For @neocxi:

1. The trainer splits the regularized latent codes like this:

    self.reg_cont_latent_dist = Product([x for x in self.reg_latent_dist.dists if isinstance(x, Gaussian)])
    self.reg_disc_latent_dist = Product([x for x in self.reg_latent_dist.dists if isinstance(x, (Categorical, Bernoulli))])

Bernoulli is also discrete. Where is Bernoulli used?
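
(A Bernoulli code would presumably be declared like the other distributions, e.g. (Bernoulli(1), True) for a binary on/off code; none of the bundled launchers seem to use one, so this is an assumption from the class names above.)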

2. Can @neocxi give an example showing which parameter corresponds to the values in the image? Is it the info_reg_coeff=1.0 parameter?

3. What causes the NaN error? Are the D and G learning rates not in equilibrium?

    Epoch 14 | discriminator_loss: 0.128064; generator_loss: 2.78964; MI_disc: 20.3559; CrossEnt_disc: 2.66993; MI: 20.3559; CrossEnt: 2.66993; max_real_d: 0.999938; min_real_d: 0.560705; max_fake_d: 0.240968; min_fake_d: 0.0144349
    STR: 'avg_log_vals' is [ 1.28064305e-01 2.78963685e+00 2.03559246e+01 2.66993141e+00
      2.03559246e+01 2.66993141e+00 9.99938190e-01 5.60704947e-01
      2.40968212e-01 1.44348787e-02]
    STR: 'ganlp' is 2 |ETA: --:--:--
    Epoch 15 | discriminator_loss: nan; generator_loss: nan; MI_disc: nan; CrossEnt_disc: nan; MI: nan; CrossEnt: nan; max_real_d: -inf; min_real_d: inf; max_fake_d: -inf; min_fake_d: inf
    STR: 'avg_log_vals' is [ nan nan nan nan nan nan -inf inf -inf inf]
    Traceback (most recent call last):
      File "launchers/run_mnist_exp.py", line 97, in <module>
        algo.train()
      File "/home/ubuntu/wordk/InfoGAN/infogan/algos/infogan_trainer.py", line 335, in train
        raise ValueError("NaN detected!")
    ValueError: NaN detected!

4. How long does CelebA training take? Can you share an epoch log? In my log, the D loss is very small and the G loss is much bigger:

[screenshot of training log]


zdx3578 commented Nov 22, 2016

A.2 INFOGAN TRAINING
To train the InfoGAN network described in Tbl. 1 on the 2D shapes dataset (Fig. 6), we followed the training paradigm described in Chen et al. (2016) with the following modifications. For the mutual information regularised latent code, we used 5 continuous variables ci sampled uniformly from (−1, 1). We used 5 noise variables zi, as we found that using a reduced number of noise variables improved the quality of generated samples for this dataset. To help stabilise training, we used the instance noise trick described in Sønderby et al. (2016), adding Gaussian noise to the discriminator inputs (0.2 standard deviation on images scaled to [−1, 1]). We followed Radford et al. (2015) for the architecture of the convolutional layers, and used batch normalisation in all layers except the last in the generator and the first in the discriminator.

from
beta-VAE: LEARNING BASIC VISUAL CONCEPTS WITH A CONSTRAINED VARIATIONAL FRAMEWORK
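
In this repo's notation, that setup would presumably translate to something like the following sketch (the instance-noise trick on the discriminator inputs is not shown and would need a separate trainer change):

    latent_spec = [
        (Uniform(5), False),               # 5 noise variables z_i
        (Uniform(1, fix_std=True), True),  # 5 continuous codes c_i ~ Unif(-1, 1)
        (Uniform(1, fix_std=True), True),
        (Uniform(1, fix_std=True), True),
        (Uniform(1, fix_std=True), True),
        (Uniform(1, fix_std=True), True),
    ]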

@simonzhang0158

I am wondering if there is any update about the CelebA dataset. I have the same problem as @zdx3578 when trying to train CelebA. @zdx3578, did you find a way to solve this? The paper you mentioned above is, I believe, the setup for the 2D shapes dataset.

Any help will be appreciated.
