
Training param tuning


Training parameters in prototxt

There are two data augmentation methods, for different applications:

https://github.com/eric612/MobileNet-YOLO/issues/29

For example, I would choose adaptive aspect ratio for fisheye videos, which have pixel-level geometric distortion.

Keep aspect ratio

  • Set the preprocessing resize mode to "FIT_LARGE_SIZE_AND_PAD" (see the sketch after this list)
  • Remove all expand param blocks
  • Use "FIT_LARGE_SIZE_AND_PAD" resize at inference as well
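
A minimal sketch of the training transform_param for this mode, assuming SSD-style resize_param fields; the 416x416 size and pad value are illustrative:

    transform_param {
      mirror: true
      resize_param {
        prob: 1.0
        resize_mode: FIT_LARGE_SIZE_AND_PAD
        height: 416
        width: 416
        pad_mode: CONSTANT
        pad_value: 127.5
        interp_mode: LINEAR
      }
      # no expand_param block here: expansion would change the aspect ratio
    }

Use the same resize_mode in the test/deploy transform_param so training and inference see the same geometry.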

Adaptive aspect ratio

This type may break the k-means anchor rule; it affected accuracy by about 1% in my tests.

  • Set the preprocessing resize mode to "WARP" (see the transform_param sketch after this list)

  • Set the expand param max_expand_ratio to {VOC: 4.0, COCO: 1.5, ...}

  • Use "WARP" resize at inference as well

  • For advanced tuning, modify the jitter code: uncomment

    caffe_rng_uniform(1, 1.0f - jitter, 1.0f, &img_h)

    and comment out

    img_h = img_w;
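
A minimal sketch of the corresponding training transform_param, again assuming SSD-style fields and illustrative sizes:

    transform_param {
      mirror: true
      resize_param {
        prob: 1.0
        resize_mode: WARP
        height: 416
        width: 416
        interp_mode: LINEAR
      }
      expand_param {
        prob: 0.5
        max_expand_ratio: 4.0   # VOC; use about 1.5 for COCO
      }
    }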

Warm-up training

If the solver type is set to "SGD", you may need to set the learning rate policy like this:
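
Vanilla Caffe SGD has no built-in warm-up field, so a minimal sketch is a short run at a small learning rate, then fine-tuning from that snapshot at the full rate; the file names and values below are illustrative assumptions:

    # solver_warmup.prototxt: short warm-up run at a low learning rate
    net: "train.prototxt"     # assumption: path to your training net
    type: "SGD"
    base_lr: 0.00001          # small warm-up learning rate (illustrative)
    lr_policy: "fixed"
    momentum: 0.9
    weight_decay: 0.0005
    max_iter: 1000            # warm up for ~1000 iterations
    snapshot: 1000
    snapshot_prefix: "models/warmup"
    solver_mode: GPU

Then switch to the main solver (larger base_lr, e.g. lr_policy: "multistep" with stepvalue entries) and resume from the warm-up snapshot via the --weights flag of the caffe train tool.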

Pre-trained weights and batch size

total_batch_size = iter_size * batch_size

Depending on which pre-trained weights you use:

  • Classification model (like ImageNet)

    set total_batch_size to at least 64

  • Detection model (like MS-COCO)

    set total_batch_size to at least 16 when training on PASCAL-VOC

    set total_batch_size to at least 32, 64 recommended, when training on MS-COCO
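
For example, batch_size comes from the data layer and iter_size from the solver, so the illustrative values below give total_batch_size = 8 * 8 = 64:

    # train.prototxt, data layer
    data_param {
      source: "examples/VOC0712/VOC0712_trainval_lmdb"   # assumption: your LMDB path
      batch_size: 8          # images per forward pass, limited by GPU memory
      backend: LMDB
    }

    # solver.prototxt
    iter_size: 8             # accumulate gradients over 8 forward passes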

Loss does not converge

You can try the following training techniques:

  • Set a larger batch size
  • Set lr_mult: 0.1 or 0.2 (and decay_mult likewise) on the first 1 ~ x layers (x is 10 for MobileNet), depending on your backbone network (see the sketch after this list)
  • Change the solver type
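
A minimal sketch of damping an early backbone layer, assuming a MobileNet-style convolution without a bias term; the layer name and filler are placeholders that depend on your backbone:

    layer {
      name: "conv1"                  # assumption: first backbone layer
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param {
        lr_mult: 0.1                 # learn this layer 10x slower than base_lr
        decay_mult: 0.1              # scale its weight decay the same way
      }
      convolution_param {
        num_output: 32
        kernel_size: 3
        stride: 2
        pad: 1
        bias_term: false             # bias handled by the following BatchNorm/Scale
        weight_filler { type: "msra" }
      }
    }

Repeat for the first x layers; setting lr_mult: 0 would freeze them entirely.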