This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Winograd Layer in Skim Caffe #15

Open
imniksha opened this issue Mar 20, 2018 · 17 comments

Comments

@imniksha

Please use the caffe-users list for usage, installation, or modeling questions, or other requests for help.
Do not post such requests to Issues. Doing so interferes with the development of Caffe.

Please read the guidelines for contributing before submitting this issue.

Issue summary

Hello, I am new to Caffe and deep learning in general, and am hoping to find some answers here :)

I have installed SkimCaffe on my Ubuntu VM and am able to run classification models using LeNet. Now, I want to switch the convolution layers to Winograd convolution layers and do a comparative study between the two types of convolution.

I have tried to add the layer as shown below; however, this has not been successful. The Winograd layers just zero all entries in the output and give wrong classifications (see the screenshot below). I believe I must be doing something wrong here, and I would greatly appreciate it if someone could guide me to the solution.

Basically, I want to add a Winograd layer (Winograd convolution) to LeNet, using winograd_layer.cpp.

Thank you for the help! Also, please let me know where I could ask this question if this is not the right platform for it :)

LENET:
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"
  }
  data_param {
    source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"
  }
  data_param {
    source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/val_db"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param {
    scale: 0.0125000001863
  }
}
layer {
  name: "win1"
  type: "Winograd"
  bottom: "scaled"
  top: "win1"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "win1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "win2"
  type: "Winograd"
  bottom: "pool1"
  top: "win2"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "win2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

[screenshot]

@jspark1105
Contributor

Sorry about the late reply. You're not doing anything wrong. Training directly in the Winograd domain was challenging, as described in our paper (https://arxiv.org/pdf/1702.08597.pdf; see Section 6.1 for the comment on a 200x smaller learning rate). I also didn't have enough time to put together all the information necessary to reproduce the paper's results before I left Intel. Anyway, please try it with a much smaller learning rate.
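
For concreteness, a sketch of that change against the stock Caffe LeNet solver values (the exact factor needs tuning; base_lr: 0.01 is the usual starting point for direct convolution):

# lenet_solver.prototxt (fragment): when training directly in the Winograd
# domain, drop the step size by roughly 200x relative to the stock value.
base_lr: 0.00005   # stock LeNet uses base_lr: 0.01
momentum: 0.9
lr_policy: "inv"
gamma: 0.0001
power: 0.75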

@jspark1105
Contributor

BTW, if I understand your goal, I may be able to help you better. Do you want to see how much sparsity you can get in the Winograd domain for the model of your interest?

@imniksha
Author

Hi, thank you so much for your reply!
Alright, I will try it with a small learning rate and let you know the results.

My goal is as follows:
Do a comparative study between direct convolution and Winograd convolution for classification problems in CNNs. Yes, I do want to look at how much sparsity Winograd convolution gives. Specifically, I want to compare the number of operations, model size, accuracy, and training time for the two types of convolution (if this is possible).
I want to train the same dataset with both convolutions (while keeping all other parameters the same), and see how the results compare.

Any suggestion to do above is welcome!

I am a student and feel really glad to get a reply and appreciate your help :)
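
For the sparsity side of that comparison, the weights of a trained snapshot can be inspected directly through pycaffe. A minimal sketch, assuming pycaffe is built; the file names are placeholders for your own net definition and snapshot:

import caffe
import numpy as np

# Placeholder paths: point these at your own prototxt and trained weights.
net = caffe.Net('lenet_train_test.prototxt', 'lenet_iter_10000.caffemodel', caffe.TEST)

# Count near-zero weights per layer; a small threshold approximates the
# weights a sparse inference kernel could skip.
for name, params in net.params.items():
    w = params[0].data  # weight blob (params[1] holds the bias)
    zeros = np.sum(np.abs(w) < 1e-4)
    print('%s: %d weights, %.1f%% near zero' % (name, w.size, 100.0 * zeros / w.size))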

@imniksha
Author

Just a side note: I see that your paper already has results for this kind of comparative study, but I would like to get some results of my own and learn to study them.

@jspark1105
Contributor

Our work on sparse Winograd was by no means complete, especially on the training side (we needed a very low learning rate, and so on), so any improvements upon it would be really interesting. BTW, we didn't spend much time on speeding up training (we mostly focused on speeding up inference once you have sparsity in the Winograd domain), so training in the Winograd domain will be slow, especially if you're comparing against cuDNN, which is extensively optimized.

@imniksha
Author

I tried to set the learning rate to what you mentioned in your paper (see below); however, I am still unsuccessful in getting results for Winograd convolutions.
What else should I change to at least get some results, instead of zeros for all elements after the Winograd layer?

[screenshots]

@imniksha
Author

Hello, could you please tell me how to get output from the Winograd layer (see my comment above)? What parameters should I set differently to get some output (I will try to optimize later)? For right now, I just want to make sure I am able to get some readable output. Thank you!
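
One way to narrow this down is to run a single forward pass in pycaffe and print statistics for each Winograd layer's output blob, to see whether the zeros appear at win1 or only later. A sketch with placeholder file names:

import caffe
import numpy as np

# Placeholder paths: the train/test net and any snapshot from the failing run.
net = caffe.Net('lenet_train_test.prototxt', 'snapshot_iter_100.caffemodel', caffe.TEST)
net.forward()  # pulls one batch through the TEST-phase Data layer

for name in ('win1', 'win2'):
    b = net.blobs[name].data
    print('%s: min %g, max %g, mean %g, %.1f%% zeros'
          % (name, b.min(), b.max(), b.mean(), 100.0 * np.sum(b == 0) / b.size))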

@jspark1105
Contributor

Sorry about the late reply. Can you tell me the exact command you used, and share or point me to all the necessary files (like /home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db) so that I can reproduce this?

@imniksha
Author

Good morning! Thanks for the reply.
I am using DIGITS as the UI to train and build my network. Sorry about the LeNet code provided in the first comment; please ignore that and use the version below instead.
/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db is something DIGITS inserts into the net when I run the train command (internal DIGITS files); I am not providing it as input.

Below is the net that I provide to the system:

LENET:
name: "LeNet"
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    stage: "train"
  }
  data_param {
    batch_size: 64
  }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    stage: "val"
  }
  data_param {
    batch_size: 32
  }
}
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param {
    scale: 0.0125000001863
  }
}
layer {
  name: "win1"
  type: "Winograd"
  bottom: "scaled"
  top: "win1"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "win1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "win2"
  type: "Winograd"
  bottom: "pool1"
  top: "win2"
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "win2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    stage: "val"
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  exclude {
    stage: "deploy"
  }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "ip2"
  top: "softmax"
  include {
    stage: "deploy"
  }
}

Below are the learning rate parameters I have tested with:
[screenshots]

@jspark1105
Contributor

I'm not familiar with DIGITS and don't have access to it. Is there a way to reproduce your result without DIGITS?

@imniksha
Author

Hmm, I will look into another way of reproducing my results and get back to you. DIGITS is basically a UI set up for training networks with Caffe; it provides easy-to-use software for new Caffe users.
In the meantime, do you have any documentation/steps on how you got the Winograd experimental results presented in your paper? I would appreciate it if you could guide me through them, and it would possibly help me solve my issue. I will try to get results the same way you did, and then try with DIGITS.
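
One DIGITS-free route is the stock Caffe MNIST workflow: fetch and convert the data with Caffe's data/mnist/get_mnist.sh and examples/mnist/create_mnist.sh scripts, then either run build/tools/caffe train --solver=... or drive the solver from pycaffe. A minimal pycaffe sketch, with a placeholder solver path:

import caffe

caffe.set_mode_cpu()  # or caffe.set_mode_gpu() if built with GPU support

# Placeholder path: a solver prototxt pointing at your Winograd net.
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
solver.solve()  # runs the full max_iter schedule, snapshotting as configured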

@jspark1105
Contributor

I know what DIGITS is; it's just that I haven't used it before and don't have access to it. I just need to know what input data is used for training and validation. I'm sorry that I don't have much documentation on the Winograd experiments: I didn't have much time to wrap up before I left Intel, and the experiments (especially the training part) were not entirely successful.

@imniksha
Author

I am using the MNIST handwritten digits dataset for training and validation. Below is the link to it:
http://yann.lecun.com/exdb/mnist/
Is that what you need? I am sorry if I misunderstood your question.

@jspark1105
Contributor

OK, I'll take a look this weekend. Sorry about the delay again.

@imniksha
Author

Sure, thank you!

@jspark1105
Contributor

I was not able to get good accuracy with your prototxt even when I changed Winograd back to Convolution (because I really don't know how to get this -> mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"). So I just took the LeNet MNIST example in the Caffe main branch and changed Convolution to Winograd (see the prototxt below).
I'm able to train to 90+% accuracy. Note that I reduced base_lr a lot.
I'm sorry that I'm not able to help much, and unfortunately I won't have much time to help in the future either.

lenet_solver.prototxt

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
#base_lr: 0.01
base_lr: 0.000001
momentum: 0.9
#weight_decay: 0.0005
weight_decay: 0.00005

# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
#snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
#solver_mode: CPU
snapshot_prefix: "examples/mnist/mlp_500_300"

lenet_train_test.prototxt

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Winograd"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Winograd"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
    #decay_mult: 1.0
    #kernel_shape_decay_mult: 0.0
    #breadth_decay_mult: 0.0
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
    #decay_mult: 1.0
    #kernel_shape_decay_mult: 0.0
    #breadth_decay_mult: 0.0
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

@imniksha
Author

imniksha commented Apr 2, 2018

Hello, thank you for your time! I used your comment above to try different parameters on my setup, and am able to get results with Winograd! Looks like the issue is solved :)

Appreciate all your help :)

[screenshot]
