Clean batch normalization should use batch mean and var for training #3
Yeah, that's how batch normalization is usually done, but in the Theano code published along with the paper I didn't see how training and testing are distinguished. Here the update is done if it's on the clean path. How different is the accuracy when this change is made?
The accuracy improves from 99.29% to 99.41% by using the batch mean and batch var during training. Those are just single runs, but the lower one is pretty far outside the error bounds of the paper's results. I don't think the update of the moving averages is the problem. I think the problem comes from always returning the normalization based on the running averages when the update code is called. These lines:
The pseudocode on page 5 of the paper also looks to me like the _batch_ mean and var are used to normalize in the decoding step. It is probably enough just to make sure
I have updated the code. I'm testing it, but it takes too long on my laptop. It would be great if you could confirm that the code now produces the results presented in the paper.
Even after making the changes to variable initialization, learning rate, and batch norm, the accuracy doesn't improve over 99.29%. @mikowals, did you make any other changes? Also, in the last line of the code you posted above, it's supposed to be
Looking at the code on master now, have you set

I have fixed the error pointed out above and am rerunning the code that previously got 99.41% accuracy to see if it was some sort of accident. I will report back.
I hadn't set the validation set size to 0, but even after making the correction I get almost the same results. I'll verify it again. I found a difference in the update of
The final accuracy was 99.33%. I got that result on 2 training runs. If I put my typo back (

I am lost as to why the clean, labelled path impacts training and why this implementation is not able to match the paper's results.
I wonder if the remaining difference is the implementation of Adam in TensorFlow vs. Blocks. Blocks has different defaults and also an extra decay term that is not available in TensorFlow. The accuracies don't really appear stable after 150 epochs, and for me they bounce in a range between 99.25 and 99.45 from about 50 epochs onwards.
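For reference, the core Adam update rule (Kingma & Ba) is the same everywhere; what differs between frameworks are the default hyperparameters and any extra decay applied on top, which is what the comment above speculates might explain the remaining gap. A plain NumPy sketch of the core rule, with TensorFlow-style defaults shown purely for illustration (the Blocks-specific decay term is not reproduced here):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a single parameter array.

    Defaults here mirror common TensorFlow values; Blocks ships
    different defaults plus an additional decay term, so two
    otherwise-identical trainings can diverge slightly.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

With `t = 1` the bias correction cancels the moment decay exactly, so the first step is roughly `lr * sign(grad)`.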
The reported error rate for the fully labelled setting is 0.608 ± 0.013, which means the 99.41% accuracy you obtained concurs with the results of the paper. When I run the code, the accuracy never goes above 99.29%. The first 99% appears after 70 epochs, and it bounces in a range between 99 and 99.29 after 100 epochs. I wonder what's different between our implementations.
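As a quick arithmetic check on how the reported error rate maps onto an accuracy band:

```python
# Paper reports error 0.608 ± 0.013 (%) for the fully labelled setting.
error, half_width = 0.608, 0.013
lo = 100 - (error + half_width)
hi = 100 - (error - half_width)
print(f"accuracy band: {lo:.3f}% to {hi:.3f}%")  # accuracy band: 99.379% to 99.405%
```

So 99.41% sits at the top of that band, while 99.29% falls well below it.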
So, without the typo, there is no significant difference in accuracy after distinguishing between training and testing? Actually, I didn't notice any separation between training and testing in the original implementation. I think I'll also try out the update method for
Apparently, using a placeholder for a conditional in a TensorFlow graph does not work with a simple
I had that doubt earlier, but when I tried it out it was working. Let me check again.
Yes, a simple
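The failure mode being discussed is general to any deferred-execution graph: a Python `if` runs once, while the graph is being *built*, so it cannot branch on a value that is only fed in later; the branch has to be recorded in the graph itself (which is what `tf.cond` does in TensorFlow). A toy illustration with a hypothetical `Placeholder` class standing in for the real thing:

```python
class Placeholder:
    """Stands in for a value supplied only when the graph runs,
    like a tf.placeholder. It has no value at construction time."""
    def __bool__(self):
        raise TypeError("placeholder has no value at graph-construction time")

def cond(pred_placeholder, true_fn, false_fn):
    """Toy graph-style conditional: record BOTH branches now and
    choose between them only when a value is finally fed in."""
    return lambda fed_value: true_fn() if fed_value else false_fn()

training = Placeholder()

# if training: ...  # a plain Python `if` would raise here: no value yet

branch = cond(training, lambda: "batch stats", lambda: "running averages")
print(branch(True))   # prints "batch stats"
print(branch(False))  # prints "running averages"
```

The real `tf.cond` works on tensors rather than Python callables returning strings, but the principle is the same: the conditional must be part of the graph, not of the Python control flow that builds it.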
I think you are using `noise_std > 0` to separate both the clean vs. corrupted paths and training vs. evaluation. This causes a problem because during evaluation the batch norm mean and var should always be based on the training example averages, while during training batch norm is meant to introduce regularization via noise by using the batch mean and var.

I changed the code so that `update_batch_norm` only ran during training on the clean path and always normalized with the mean and var of the batch. Like this:

I passed a boolean placeholder to the encoder to separate training loops from evaluation loops. Then inside the encoder I used `batch_norm` to normalize by the running averages outside of training steps.

This may still not be completely right, since I was making all examples labeled examples. With this and the variable initialization fix, I trained with 60k labeled examples down to 0.59% error.
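The scheme described above — batch statistics during training (with a side update of the running averages), accumulated running averages at evaluation — can be sketched in NumPy. This is a toy, not the repository's TensorFlow code; the class name, decay constant, and epsilon are all illustrative:

```python
import numpy as np

class BatchNorm:
    """Toy batch norm that keeps training and evaluation distinct."""

    def __init__(self, size, decay=0.99, eps=1e-10):
        self.running_mean = np.zeros(size)
        self.running_var = np.ones(size)
        self.decay, self.eps = decay, eps

    def __call__(self, x, training):
        if training:
            # Training: normalize with THIS batch's statistics ...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ... and fold them into the averages used later at eval time.
            self.running_mean = self.decay * self.running_mean + (1 - self.decay) * mean
            self.running_var = self.decay * self.running_var + (1 - self.decay) * var
        else:
            # Evaluation: use the accumulated training-set averages only.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)
```

The bug under discussion amounts to taking the `else` branch (running averages) even while training, which removes the batch-noise regularization that batch norm is supposed to provide.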