Suggestions to tackle this MemoryError #6
It looks like you ran out of RAM on your Amazon instance. You can open a terminal and use htop to watch this happening while the code is running.
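If watching htop is inconvenient, a rough sketch of logging RAM usage from inside the training script is shown below (it assumes psutil is installed; none of this is part of this repo):

```python
import os
import psutil  # assumed extra dependency, not part of this repo
from keras.callbacks import Callback

class MemoryLogger(Callback):
    """Print the resident set size of the training process every 100 batches."""
    def on_batch_end(self, batch, logs=None):
        if batch % 100 == 0:
            rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024.0 ** 2
            print('batch %d: %.0f MB RSS' % (batch, rss_mb))
```

Passing an instance of this in the callbacks list of fit_generator would show whether host RAM keeps growing between epochs.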
Hello @udibr, you are right, I ran out of RAM on my AMI (15GB). I reduced the dataset to half size (750K) but I got exactly the same error, at the same point: it always fails at the same line, at the end of the 2nd iteration (epoch). For now I am going to run it again with 150K samples and see what happens. Thanks 👍 👍 EDIT: with 150K samples I am getting the same error. I have also tried setting
Try training with nflips=0, it may help.
Ok. You mean? Thanks
That's ok, or change the value assigned to nflips in cell 9 of train.ipynb.
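For reference, the change amounts to a single assignment (a sketch; the surrounding cell contents are whatever train.ipynb already has):

```python
# cell 9 of train.ipynb
nflips = 0  # set to 0 per the suggestion above, replacing whatever value the cell currently assigns
```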
Thanks, I tried it but unfortunately it crashed in the second iteration again. However, this time it did not crash at the end of it: TRAIN. I am going to try it with the TensorFlow backend. Do you think it may be a good idea?
I set
Thanks a lot for your help @udibr :)
I'm using Theano, so there could be bugs related to TF which I did not notice.
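As an aside, for anyone wanting to try the switch: in Keras 1.x the backend can be overridden per run with the KERAS_BACKEND environment variable, as long as it is set before Keras is imported (a minimal sketch, not specific to this repo):

```python
import os
os.environ['KERAS_BACKEND'] = 'tensorflow'  # must be set before the first import of keras

import keras  # should print "Using TensorFlow backend." if the override took effect
```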
Hello, I am using ami-125b2c72 (g2.2xlarge) with a spot price, as you suggested in another issue (thanks a lot). After struggling a bit with the CUDA drivers I finally got to run some epochs, and I am able to save and load all the training files from S3. I now have 1441135 examples. I trained one epoch, saved the weights, stopped the instance, re-ran the script, loaded the weights, trained one more epoch, and then it crashed. I got the output below. I wonder if you, @udibr, could give me some ideas or intuitions about what my problem is. One of my questions is: is the memory error about regular RAM or GPU memory? (Maybe I could use another AMI.) I also got the warning about an epoch comprising more than `samples_per_epoch` samples, but I am not sure if I should do anything about it.

```
ubuntu@ip-xxxxxx:~/auris$ python train2.py
Using Theano backend.
Using gpu device 0: GRID K520 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 4007)
READING WORD EMBEDDING
('/home/ubuntu/auris//en3_vocabulary-embedding.pkl', ' already downloaded')
('/home/ubuntu/auris//en3_vocabulary-embedding.data.pkl', ' already downloaded')
number of examples 1441135 1441135
dimension of embedding space for words 100
vocabulary size 40000 the last 10 words can be used as place holders for unknown/oov words
total number of different words 1481094 1481094
number of words outside vocabulary which we can substitue using glove similarity 208580
number of words that will be regarded as unknonw(unk)/out-of-vocabulary(oov) 1232514
H: Vuwani schools damage prompts call for new law to punish vandals
D: Department officials on Tuesday briefed MPs on the recovery plans for the protest-ravaged^ Vuwani area in Limpopo . Earlier in May Vuwani residents protested against the creation of a new municipality which will include Malamulele^ residents . The violent protests led to the torching of several schools and other public amenities in the area which disrupted classes .
H: Kathmandu Post- Mitsubishi Motors admits cheating fuel tests since 1991
D: Mitsubishi 's eK^ Wagon^ was one of the models affected Reuters Apr 26 , 2016- Mitsubishi Motors has admitted to falsifying some fuel consumption tests since 1991 . The admission follows last week 's revelation that it had falsified fuel economy data for more than 600,000 vehicles sold in Japan . 'For the domestic market , we have been using that method since 1991 , ' said vice-president Ryugo^ Nakao^ at a press conference in Tokyo on Tuesday .
MODEL
0 cls=Embedding name=embedding_1
40000x100
1 cls=LSTM name=lstm_1
100x512 512x512 512 100x512 512x512 512 100x512 512x512 512 100x512 512x512 512
2 cls=Dropout name=dropout_1
3 cls=LSTM name=lstm_2
512x512 512x512 512 512x512 512x512 512 512x512 512x512 512 512x512 512x512 512
4 cls=Dropout name=dropout_2
5 cls=LSTM name=lstm_3
512x512 512x512 512 512x512 512x512 512 512x512 512x512 512 512x512 512x512 512
6 cls=Dropout name=dropout_3
7 cls=SimpleContext name=simplecontext_1
8 cls=TimeDistributed name=timedistributed_1
944x40000 40000
9 cls=Activation name=activation_1
LOAD
downloading train.hdf5 to /home/ubuntu/auris/train.hdf5:
downloaded /home/ubuntu/auris/train.hdf5
Weights downloaded
TEST
....
....
H: ~ Kate Beckinsale and teenage daughter text naked pictures of Michael Sheen to each other to <0>^ themselves up ' _ _ _ _ _
D: the Port Talbot actor to each other . The underworld star , who lives in the States with 17-year-old Lily , made the odd revelation
TRAIN
Iteration 0
Epoch 1/1
29952/30000 [============================>.] - ETA: 1s - loss: 7.7543/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.py:1403: UserWarning: Epoch comprised more than 'samples_per_epoch' samples, which might affect learning results. Set 'samples_per_epoch' correctly to avoid this warning.
30016/30000 [==============================] - 746s - loss: 7.7548 - val_loss: 7.8132
('Uploaded ', '/home/ubuntu/auris/train.history.pkl', ' succesfully')
('Uploaded ', '/home/ubuntu/auris/train.hdf5', ' succesfully')
HEAD: A Python^ Bit^ A Man 's P
DESC: The man fought to remove
HEADS:
34.5337700502 Syrian , Attaporn^ Attaporn^ at , to
43.0280796466 Former wife.She^ : for hour.Eventually^ wife.She^ to in wife.She^ wife.She^
Iteration 1
Epoch 1/1
29952/30000 [============================>.] - ETA: 1s - loss: 7.7520Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/ubuntu/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 404, in data_generator_task
generator_output = next(generator)
File "train2.py", line 498, in gen
yield conv_seq_labels(xds, xhs, nflips=nflips, model=model, debug=debug)
File "train2.py", line 459, in conv_seq_labels
y = np.zeros((batch_size, maxlenh, vocab_size))
MemoryError
Traceback (most recent call last):
File "train2.py", line 538, in
nb_epoch=1, validation_data=valgen, nb_val_samples=nb_val_samples
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/models.py", line 656, in fit_generator
max_q_size=max_q_size)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1412, in fit_generator
max_q_size=max_q_size)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1474, in evaluate_generator
'or (x, y). Found: ' + str(generator_output))
Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
```
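For reference, the failing call is np.zeros(...), which allocates ordinary NumPy memory, so the MemoryError is about host RAM, not GPU memory. A back-of-the-envelope estimate of that single allocation (vocab_size = 40000 is printed in the log; batch_size = 64 matches the 29952 → 30016 step in the progress bar; maxlenh = 25 is an assumption):

```python
import numpy as np

batch_size = 64      # matches the progress-bar step size in the log above
maxlenh = 25         # assumed maximum headline length
vocab_size = 40000   # printed in the log above

bytes_per_batch = batch_size * maxlenh * vocab_size * np.dtype(np.float64).itemsize
print('one y batch (float64): %.2f GB' % (bytes_per_batch / 1024.0 ** 3))
# fit_generator also keeps a queue of pre-generated batches (max_q_size, 10 by default),
# so several such arrays can be alive at once:
print('with a queue of 10 batches: %.2f GB' % (10 * bytes_per_batch / 1024.0 ** 3))
```

With these numbers one float64 target array is roughly 0.5 GB, and with the default generator queue the process can hold several gigabytes of them on top of the embedding matrix and the model, which would be enough to exhaust a machine with ~15 GB of RAM. Building y with dtype=np.float32, or lowering batch_size / max_q_size, should roughly halve or shrink that footprint. The samples_per_epoch warning is unrelated to the crash; it only means the batch size does not divide 30000 evenly, so the last batch pushes the epoch to 30016 samples.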
And as always, thanks for giving us the opportunity to use state-of-the-art machine learning techniques in our own projects :)