Winograd Layer in Skim Caffe #15
Sorry about the late reply. You're not doing anything wrong. Training directly in the Winograd domain was challenging, as described in our paper (https://arxiv.org/pdf/1702.08597.pdf; see Section 6.1 for the comment on the 200x smaller learning rate). I also didn't have enough time to put together all the information needed to reproduce the paper's results before I left Intel. Anyway, please try it with a much smaller learning rate.
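For concreteness, here is a minimal sketch of what a 200x smaller learning rate could look like against the stock examples/mnist/lenet_solver.prototxt (which uses base_lr: 0.01). The 200x factor is the figure from Section 6.1; every other field is just the stock example's value, not something verified for Winograd-domain training:

net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
# stock LeNet uses base_lr: 0.01; 200x smaller is 5e-5
base_lr: 0.00005
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: GPU   # use CPU if no GPU is available (e.g. in a VM)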
BTW, if I understand your goal, I may be able to help you better. Do you want to see how much sparsity you can get in the Winograd domain for a model of your interest?
Hi, thank you so much for your reply! My goal is as follows: Any suggestions for doing the above are welcome! I am a student and feel really glad to get a reply; I appreciate your help :)
Just a side note: I see that your paper already has results for this kind of comparative study, but I would like to get some results of my own and learn how to study them.
Our work on sparse Winograd was by no means complete, especially on the training side (we needed a very low learning rate and so on), so any improvement upon that would be really interesting. BTW, we didn't spend much time speeding up training (we mostly focused on speeding up inference once you have sparsity in the Winograd domain), so training in the Winograd domain will be slow, especially if you compare against cuDNN, which is extensively optimized.
Hello, could you please tell me how to get data for the Winograd layer (see the comment above)? What parameters should I be setting differently to get some output? (I will try to optimize later.) For right now, I just want to make sure I am able to get some readable output. Thank you!
Sorry about the late reply. Can you tell me the exact command you used, and share or point to all the necessary files (like /home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db) so that I can reproduce it?
Good morning! Thanks for the reply. Below is the net that I provide to the system:
LENET:
I'm not familiar with DIGITS and don't have access to it. Is there a way to reproduce your results without DIGITS?
Hmm, I will look into another way of reproducing my results and get back to you. DIGITS is basically a UI set up for training networks with Caffe; it just provides easy-to-use software for new Caffe users.
I know what DIGITS is; I just haven't used it before and don't have access to it. I just need to know what input data is used for training and validation. I'm sorry that I don't have much documentation on the Winograd experiments; I didn't have much time to wrap things up before I left Intel, and the experiments (especially the training part) were not entirely successful.
I am using the MNIST handwriting dataset for training and validation. Below is the link to it:
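(The link itself is not preserved above. For reference, the stock Caffe MNIST example converts that dataset into LMDBs with examples/mnist/create_mnist.sh and reads them with a Data layer like the one below; the paths are the stock example's, not necessarily the ones used in this thread.)

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625   # 1/256, scales 8-bit pixel values to roughly [0, 1)
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}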
OK, I'll take a look this weekend. Sorry about the delay again.
Sure, thank you!
I was not able to get good accuracy with your prototxt, even when I changed Winograd back to Convolution (because I really don't know how to get this -> mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"). So I just tried the LeNet MNIST example in the Caffe main branch and simply changed Convolution to Winograd (see the prototxt below).
lenet_solver.prototxt
lenet_train_test.prototxt
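(Neither file is reproduced above. As a rough illustration of the change being described, and not the exact files used, swapping conv1 in the stock examples/mnist/lenet_train_test.prototxt looks like this, with every other field left as in the stock example:)

layer {
  name: "conv1"
  type: "Winograd"   # changed from "Convolution"; the rest is the stock definition
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}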
Issue summary
Hello, I am new to Caffe and deep learning in general, and am hoping to find some answers here :)
I have installed SkimCaffe on my Ubuntu VM and am able to run classification models using LeNet. Now I want to switch the convolution layer to a Winograd convolution layer and perform a comparative study between the two types of convolution.
I have tried to add it as shown below, but this has not been successful: the Winograd layer just zeros all entries in the matrix and gives wrong classifications (see below). I believe I must be doing something wrong here, and I would greatly appreciate it if someone could guide me to the solution.
Basically, I want to add a Winograd layer (Winograd convolution) into LeNet, using winograd_layer.cpp.
Thank you for the help! Also, please let me know where I could ask this question if this is not the right platform for it :)
LENET:
layer {
name: "train-data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"
}
data_param {
source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db"
batch_size: 64
backend: LMDB
}
}
layer {
name: "val-data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"
}
data_param {
source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/val_db"
batch_size: 32
backend: LMDB
}
}
layer {
name: "scale"
type: "Power"
bottom: "data"
top: "scaled"
power_param {
scale: 0.0125000001863
}
}
layer {
name: "win1"
type: "Winograd"
bottom: "scaled"
top: "win1"
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "win1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "win2"
type: "Winograd"
bottom: "pool1"
top: "win2"
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "win2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
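One difference from the stock LeNet example worth flagging in the prototxt above: the two Winograd layers specify no weight_filler or bias_filler, whereas the stock convolution layers (and the InnerProduct layers above) initialize with xavier and constant fillers. Whether the Winograd layer consumes these fields exactly as a Convolution layer does is an assumption, not something established in this thread, but written that way the win1 convolution_param would read:

  convolution_param {
    num_output: 20        # 50 for win2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"      # as in the stock LeNet conv layers; omitted above
    }
    bias_filler {
      type: "constant"
    }
  }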