
Where is the model.fit function? #15

Open
JayPhate opened this issue Jan 31, 2017 · 3 comments
Comments

JayPhate commented Jan 31, 2017

I want to use the Encoder-Decoder model on some other data, and I am trying to understand this code, but I couldn't find the call to the fit method in train.ipynb. After padding the descriptions and headings, how do I use these vectors to train the model? What are the dimensions of X and Y in model.fit? I expect the dimension of X to be #descriptions × 50 and that of Y to be #headings × 50, where #descriptions equals #headings.
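For concreteness, the padding step can be sketched with a NumPy-only stand-in for `keras.preprocessing.sequence.pad_sequences` (the toy index sequences below are made up, not data from the notebook):

```python
import numpy as np

maxlend, maxlenh = 50, 25  # padded lengths for descriptions and headings

def pad(seqs, maxlen):
    # Left-pad (or truncate) each index sequence to maxlen with zeros,
    # mimicking keras.preprocessing.sequence.pad_sequences.
    out = np.zeros((len(seqs), maxlen), dtype=np.int32)
    for i, s in enumerate(seqs):
        s = s[-maxlen:]                 # keep at most the last maxlen tokens
        out[i, maxlen - len(s):] = s
    return out

descriptions = [[5, 8, 13], [2, 7]]     # made-up word indices
headings = [[3, 4], [9]]
xTrain = pad(descriptions, maxlend)
yTrain = pad(headings, maxlenh)
print(xTrain.shape, yTrain.shape)       # (2, 50) (2, 25)
```

Each row is then a fixed-length vector of word indices, which matches the (N, 50) and (N, 25) shapes reported below.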

Below is the command I used to fit the model:

```python
model_fit = model.fit(nxTrain, nyTrain, nb_epoch=1, batch_size=64, verbose=2)
```

The shapes of the X and Y arrays passed to model.fit:

```python
>>> xTrain.shape
(17853, 50)
>>> yTrain.shape
(17853, 25)
```

But I got the error below:

```
Exception: Error when checking model target: expected activation_1 to have 3 dimensions, but got array with shape (17853, 25)
```

Please check the model summary:

```python
print(model.summary())
```


```
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
embedding_1 (Embedding)          (None, 50, 100)       4000000     embedding_input_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 50, 512)       1255424     embedding_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 50, 512)       0           lstm_1[0][0]
____________________________________________________________________________________________________
lstm_2 (LSTM)                    (None, 50, 512)       2099200     dropout_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 50, 512)       0           lstm_2[0][0]
____________________________________________________________________________________________________
lstm_3 (LSTM)                    (None, 50, 512)       2099200     dropout_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 50, 512)       0           lstm_3[0][0]
____________________________________________________________________________________________________
simplecontext_1 (SimpleContext)  (None, 25, 944)       0           dropout_3[0][0]
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribut (None, 25, 40000)     37800000    simplecontext_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 25, 40000)     0           timedistributed_1[0][0]
====================================================================================================
Total params: 47253824
____________________________________________________________________________________________________
None
```

I used the same model as explained in train.ipynb. I don't understand what's wrong here.
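The mismatch comes from the last layer: activation_1 produces (None, 25, 40000), so with categorical_crossentropy Keras expects 3-D one-hot targets, not a 2-D matrix of word indices. A minimal sketch of the expansion, using toy sizes and my own NumPy construction rather than the notebook's np_utils.to_categorical call:

```python
import numpy as np

vocab_size = 40000                      # output size of activation_1
yTrain = np.array([[3, 7, 0],           # toy integer targets (made-up indices)
                   [1, 2, 9]])

# Expand indices to one-hot vectors: (batch, timesteps) -> (batch, timesteps, vocab)
Y = np.zeros((yTrain.shape[0], yTrain.shape[1], vocab_size), dtype=np.float32)
batch_idx = np.arange(yTrain.shape[0])[:, None]   # shape (2, 1)
time_idx = np.arange(yTrain.shape[1])[None, :]    # shape (1, 3)
Y[batch_idx, time_idx, yTrain] = 1.0              # one 1.0 per (sample, timestep)
print(Y.shape)          # (2, 3, 40000)
```

With the full data this expansion becomes (17853, 25, 40000), which is why it is normally done per batch rather than up front.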

udibr (Owner) commented Jan 31, 2017

  1. nyTrain should be renamed to yTrain, because it is the matrix itself, not its size (same for nxTrain).
  2. The notebook uses the inefficient loss categorical_crossentropy, which requires the word labels to be expanded to the size of the vocabulary; look for the usage of np_utils.to_categorical in the notebook. If you want, you can switch to sparse_categorical_crossentropy, which does not require the huge amount of memory needed to keep yTrain in expanded one-hot form. However, I think that in that case you need to add an extra dimension of size 1 to yTrain, for example with np.expand_dims(yTrain, -1).
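A rough back-of-the-envelope comparison of the two target encodings, using the sizes reported in this thread (an illustrative sketch, not code from the notebook):

```python
import numpy as np

n, maxlenh, vocab = 17853, 25, 40000    # sizes from this thread

# Option A: one-hot float32 targets for categorical_crossentropy
onehot_gb = n * maxlenh * vocab * 4 / 1e9
print(round(onehot_gb, 1))              # ~71.4 GB if materialized all at once

# Option B: sparse integer targets for sparse_categorical_crossentropy
y = np.zeros((n, maxlenh), dtype=np.int32)   # placeholder for the index matrix
y_sparse = np.expand_dims(y, -1)             # adds a trailing axis of size 1
print(y_sparse.shape)                   # (17853, 25, 1)
```

This is why the notebook expands targets per batch, and why the sparse loss plus expand_dims is the memory-friendly alternative.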

HTH, Udi

udibr (Owner) commented Jan 31, 2017

You are welcome to make the changes and send a PR.

JayPhate (Author) commented Feb 1, 2017

@udibr I am very new to Keras and neural networks. Why do we need np_utils.to_categorical? I already converted all the vocabulary words to indices, so I am training on indices, not words.
I am trying to build a many-to-many sequence labeling model, the fifth model from the left in the image below.

[image: seq_labeling_modules]

My issue is very similar to https://github.com/fchollet/keras/issues/2654, but from that thread I didn't understand the input and output shapes of the data.

Also, you suggested adding an extra dimension of size 1 to yTrain. Can you elaborate on why? The shape of yTrain is currently (17853, 25); what will its shape be after adding the extra dimension?
