An exception occurs when I train my own model. Could someone please take a look and help me figure out how to fix it?
Environment:
Python 3.10.11
ddparser 1.0.8
LAC 2.1.2
paddlepaddle 2.4.2
paddlepaddle-gpu 2.4.2.post117
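Data: train.txt and dev.txt were both cut out of the official test.txt. train.txt contains 10 arbitrarily chosen sentences, dev.txt contains 8, and test.txt contains 2. (train.txt is guaranteed to contain at least one symbol that appears at least twice.)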
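For reference, a split like the one described above can be sketched as follows. This is a minimal sketch under assumptions not stated in the issue: the files are CoNLL-style (one tab-separated token per line, FORM in the second column, sentences separated by blank lines), the slices are taken consecutively rather than at random, and a "symbol" is any token with no alphanumeric characters. All function names are hypothetical and are not part of DDParser.

```python
from collections import Counter
from pathlib import Path


def read_sentences(text: str) -> list[list[str]]:
    """Split CoNLL-style text into sentences (lists of token lines) at blank lines."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def split_counts(sentences, n_train=10, n_dev=8, n_test=2):
    """Take consecutive, non-overlapping slices: first n_train, next n_dev, next n_test."""
    train = sentences[:n_train]
    dev = sentences[n_train:n_train + n_dev]
    test = sentences[n_train + n_dev:n_train + n_dev + n_test]
    return train, dev, test


def is_symbol(form: str) -> bool:
    """Assumption: a token counts as a symbol if none of its characters is alphanumeric."""
    return bool(form) and not any(ch.isalnum() for ch in form)


def symbol_counts(sentences, form_col=1) -> Counter:
    """Count symbol tokens, reading FORM from tab-separated column form_col (0-based)."""
    counts = Counter()
    for sent in sentences:
        for line in sent:
            cols = line.split("\t")
            if len(cols) > form_col and is_symbol(cols[form_col]):
                counts[cols[form_col]] += 1
    return counts


def write_sentences(path: Path, sentences) -> None:
    """Write sentences back out: one token per line, a blank line after each sentence."""
    path.write_text("".join("\n".join(s) + "\n\n" for s in sentences), encoding="utf-8")


def make_split(src: Path, out_dir: Path) -> None:
    """Hypothetical helper: split src into train/dev/test files under out_dir."""
    sentences = read_sentences(src.read_text(encoding="utf-8"))
    train, dev, test = split_counts(sentences)
    # Mirror the stated check: some symbol must occur at least twice in the training split.
    if not any(c >= 2 for c in symbol_counts(train).values()):
        raise ValueError("no symbol occurs at least twice in the training split")
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in (("train.txt", train), ("dev.txt", dev), ("test.txt", test)):
        write_sentences(out_dir / name, part)
```

Running `read_sentences` on each resulting file and comparing the counts with the `train: ... sentences` / `dev: ... sentences` lines in the training log is a quick way to confirm that each file holds the intended number of sentences.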
Launch command (before launching, I modified run_train.sh to add the --punct argument):
sh run_train.sh
Output:
(paddle_env) sh run_train.sh
+ python -u run.py --mode=train --use_cuda --feat=none --preprocess --model_files=model_files/baidu --train_data_path=data/baidu/train.txt --valid_data_path=data/baidu/dev.txt --test_data_path=data/baidu/test.txt --encoding_model=ernie-lstm --buckets=15 --punct
/home/haipi/.conda/envs/paddle_env/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/home/haipi/.conda/envs/paddle_env/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
[2023-06-19 18:39:44,412] [ INFO] config.py:214 - Preprocess the data
[2023-06-19 18:39:44,412] [ INFO] tokenizing_ernie.py:92 - get pretrain dir from https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz
[2023-06-19 18:39:44,422] [ INFO] config.py:273 - dumping fileds to disk.
[2023-06-19 18:39:44,436] [ INFO] run.py:480 - Override the default configs
--------------------------+--------------------------
Param | Value
--------------------------+--------------------------
n_embed | 300
embed_dropout | 0.33
n_mlp_arc | 500
n_mlp_rel | 100
mlp_dropout | 0.33
n_feat_embed | 60
n_char_embed | 50
n_lstm_feat_embed | 100
n_lstm_hidden | 300
n_tran_hidden | 300
n_lstm_layers | 3
lstm_dropout | 0.33
n_tran_feat_embed | 120
n_tran_feat_head | 12
n_tran_feat_layer | 2
n_tran_word_head | 12
n_tran_word_layer | 3
warmup_proportion | 0.1
weight_decay | 0.01
lstm_by_wp_embed_size | 200
lstm_lr | 0.002
ernie_lr | 5e-05
mu | 0.9
nu | 0.9
epsilon | 1e-12
decay | 0.75
decay_steps | 5000
epochs | 50000
patience | 30
min_freq | 2
fix_len | 20
clip | 1.0
mode | train
config_path | config.ini
model_files | model_files/baidu
train_data_path | data/baidu/train.txt
valid_data_path | data/baidu/dev.txt
test_data_path | data/baidu/test.txt
infer_data_path | None
batch_size | 1000
log_path | ./log/log
log_level | INFO
infer_result_path | infer_result
use_cuda | True
preprocess | True
use_data_parallel | False
seed | 1
threads | 16
tree | False
prob | False
feat | none
encoding_model | ernie-lstm
buckets | 15
punct | True
None | False
nranks | 1
local_rank | 0
fields_path | model_files/baidu/fields
model_path | model_files/baidu/model
ernie_vocabs_size | 17964
n_words | 17964
n_feats | None
n_rels | 12
pad_index | 0
unk_index | 17963
bos_index | 1
eos_index | 2
feat_pad_index | None
--------------------------+--------------------------
[2023-06-19 18:39:44,437] [ INFO] run.py:481 -
(word): ErnieField(pad=[PAD], unk=[UNK], bos=[CLS], eos=[SEP])
None
(head): Field(bos=<bos>, eos=<eos>, use_vocab=False)
(deprel): Field(bos=<bos>, eos=<eos>)
[2023-06-19 18:39:44,437] [ INFO] run.py:482 - Set the max num of threads to 16
[2023-06-19 18:39:44,437] [ INFO] run.py:483 - Set the seed for generating random numbers to 1
[2023-06-19 18:39:44,437] [ INFO] run.py:484 - Run the subcommand in mode train
[2023-06-19 18:39:44,437] [ INFO] run.py:71 - loading data.
[2023-06-19 18:39:44,437] [ INFO] run.py:75 - init dataset.
[2023-06-19 18:39:44,440] [ INFO] run.py:79 - set the data loaders.
[2023-06-19 18:39:44,440] [ INFO] run.py:84 - train: 18 sentences, 7 batches, 7 buckets
[2023-06-19 18:39:44,440] [ INFO] run.py:86 - dev: 7 sentences, 4 batches, 4 buckets
[2023-06-19 18:39:44,440] [ INFO] run.py:88 - test: 1 sentences, 1 batches, 1 buckets
[2023-06-19 18:39:44,440] [ INFO] run.py:91 - Create the model
W0619 18:39:44.442440 248516 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.7
W0619 18:39:44.448432 248516 gpu_resources.cc:91] device: 0, cuDNN Version: 8.5.
[2023-06-19 18:39:45,551] [ INFO] run.py:134 - start training.
[2023-06-19 18:39:45,551] [ INFO] run.py:139 - Epoch 1 / 50000:
/home/haipi/.conda/envs/paddle_env/lib/python3.10/site-packages/paddle/fluid/dygraph/math_op_patch.py:275: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.int32, the right dtype will convert to paddle.int64
  warnings.warn(
/home/haipi/.conda/envs/paddle_env/lib/python3.10/site-packages/paddle/fluid/framework.py:4002: DeprecationWarning: Op `cumsum` is executed through `append_op` under the dynamic mode, the corresponding API implementation needs to be upgraded to using `_C_ops` method.
  warnings.warn(
/home/haipi/.conda/envs/paddle_env/lib/python3.10/site-packages/paddle/fluid/dygraph/math_op_patch.py:275: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.bool, the right dtype will convert to paddle.int64
  warnings.warn(
Could not load library libcudnn_adv_train.so.8. Error: /home/haipi/.conda/envs/paddle_env/bin/../lib/libcudnn_ops_train.so.8: symbol _ZN5cudnn3ops26JoinInternalPriorityStreamEP12cudnnContexti, version libcudnn_ops_infer.so.8 not defined in file libcudnn_ops_infer.so.8 with link time reference

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   rnn_dygraph_function(paddle::experimental::Tensor const&, std::vector<paddle::experimental::Tensor, std::allocator<paddle::experimental::Tensor> > const&, std::vector<paddle::experimental::Tensor, std::allocator<paddle::experimental::Tensor> > const&, paddle::experimental::Tensor const&, paddle::experimental::Tensor*, unsigned long, paddle::framework::AttributeMap const&)
1   paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameTensorMap const&, paddle::imperative::NameTensorMap const&, paddle::framework::AttributeMap&, phi::Place const&, paddle::framework::AttributeMap*, bool, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
2   void paddle::imperative::Tracer::TraceOpImpl<egr::EagerVariable>(std::string const&, paddle::imperative::details::NameVarMapTrait<egr::EagerVariable>::Type const&, paddle::imperative::details::NameVarMapTrait<egr::EagerVariable>::Type const&, paddle::framework::AttributeMap&, phi::Place const&, bool, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&, paddle::framework::AttributeMap*, bool)
3   paddle::imperative::PreparedOp::Run(paddle::imperative::NameTensorMap const&, paddle::imperative::NameTensorMap const&, paddle::framework::AttributeMap const&, paddle::framework::AttributeMap const&)
4   phi::KernelImpl<void (*)(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*), &(void phi::RnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*))>::Compute(phi::KernelContext*)
5   void phi::KernelImpl<void (*)(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*), &(void phi::RnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*))>::KernelCallHelper<paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*, phi::TypeTag<int> >::Compute<1, 3, 0, 0, phi::GPUContext const, phi::DenseTensor const, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> >, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > >(phi::KernelContext*, phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> >&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> >&)
6   void phi::RnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, std::vector<phi::DenseTensor const*, std::allocator<phi::DenseTensor const*> > const&, paddle::optional<phi::DenseTensor> const&, float, bool, int, int, int, std::string const&, int, bool, phi::DenseTensor*, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::DenseTensor*)
7   cudnnRNNForwardTrainingEx

----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1687171186 (unix time) try "date -d @1687171186" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x3ea0003cac4) received by PID 248516 (TID 0x7f6d72607740) from PID 248516 ***]

run_train.sh: line 19: 248516 Aborted python -u run.py --mode=train --use_cuda --feat=none --preprocess --model_files=model_files/baidu --train_data_path=data/baidu/train.txt --valid_data_path=data/baidu/dev.txt --test_data_path=data/baidu/test.txt --encoding_model=ernie-lstm --buckets=15 --punct
(paddle_env)
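The abort follows directly from the loader failing to resolve a symbol between `libcudnn_ops_train.so.8` and `libcudnn_ops_infer.so.8` under `/home/haipi/.conda/envs/paddle_env/bin/../lib`. One way to inspect that directory is sketched below, assuming the cuDNN libraries live in the active conda environment's `lib` directory as the path in the error suggests. The script lists every `libcudnn*.so*` entry, follows symlinks, and reports the version suffix of the file each entry points to, so that sub-libraries from different cuDNN releases would stand out. The function names are hypothetical helpers, not part of Paddle or DDParser.

```python
import re
import sys
from pathlib import Path

# Version suffix of a real shared object, e.g. "libcudnn_ops_infer.so.8.5.0" -> "8.5.0".
_VERSION_RE = re.compile(r"\.so\.(\d+(?:\.\d+)*)$")


def cudnn_versions(lib_dir: Path) -> dict[str, str]:
    """Map each libcudnn*.so* entry in lib_dir to the version suffix of the file it resolves to."""
    versions = {}
    for entry in sorted(lib_dir.glob("libcudnn*.so*")):
        target = entry.resolve()  # follow symlinks such as libcudnn_ops_infer.so.8 -> ...so.8.x.y
        match = _VERSION_RE.search(target.name)
        versions[entry.name] = match.group(1) if match else "unknown"
    return versions


def distinct_versions(versions: dict[str, str]) -> set[str]:
    """Collect full versions (major.minor[.patch]), ignoring major-only suffixes like '8'."""
    return {v for v in versions.values() if "." in v}


if __name__ == "__main__":
    # Assumption: sys.prefix/lib is the ".../paddle_env/bin/../lib" directory named in the error.
    lib_dir = Path(sys.prefix) / "lib"
    found = cudnn_versions(lib_dir)
    for name, version in found.items():
        print(f"{name:40s} -> {version}")
    if len(distinct_versions(found)) > 1:
        print("More than one cuDNN version found:", sorted(distinct_versions(found)))
```

Run it with the environment's own interpreter (`python` inside `paddle_env`) so that `sys.prefix` points at that environment.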