I am trying to reproduce the competition results by following the instructions in the README.
I downloaded and unzipped the files from the Kaggle competition into the data/ folder.
I ran the command python make_features.py data/vars --add_days=63, which created the following pickle files: 2017-08-15_2017-09-11.pkl, all.pkl, train_2.pkl, and the directory vars/ in the data/ folder.
I then ran the trainer with python trainer.py --name s32 --hparam_set=s32 --n_models=3 --name s32 --no_eval --no_forward_split --asgd_decay=0.99 --max_steps=11500 --save_from_step=10500 and received the following error:
UnknownError (see above for traceback): CUDNN_STATUS_EXECUTION_FAILED in tensorflow/stream_executor/cuda/cuda_dnn.cc(944): 'cudnnSetDropoutDescriptor( handle.get(), cudnn.handle(), dropout, state_memory.opaque(), state_memory.size(), seed)'
I am using a p3.2xlarge AWS instance with the Deep Learning AMI, Python 3.6.5, and tensorflow-gpu==1.12.0.
If I downgrade to tensorflow-gpu 1.10, I still get the same error.
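In case it helps anyone compare setups, here is a minimal sketch of how the environment details above could be collected in one place. The helper names package_version and collect_env are hypothetical and not part of this repository, and the package names checked are only an assumption about what is installed. It reads installed package metadata and does not import TensorFlow or touch the GPU.

```python
import platform
import sys


def package_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters, such as the Python 3.6 mentioned above.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def collect_env(packages=("tensorflow-gpu", "tensorflow")):
    """Gather interpreter, OS, and package versions into a dict."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        info[name] = package_version(name)  # None means "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```

Running the script prints one line per item. Comparing that output before and after a downgrade is a quick way to confirm which TensorFlow build is actually active in the environment.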
How can I resolve this?
Full output from train command