Random 'Segmentation fault (core dumped)' error when training for long spancat #13026
Comments
A segmentation fault shouldn't be happening under any circumstances. Could you post the output of `pip list`? Furthermore, I'd appreciate it if you could try the following for me:
Thanks for the reply. Noticed one thing: attached the `debug data` output FYR, along with the `pip list` output:
Thanks for the info - we'll investigate.
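As an aside to the `pip list` exchange above: the same environment details can be collected in one short script. This is just a convenience sketch using spaCy's documented `info()` helper and the standard library; nothing in it comes from the thread itself.

```python
# Sketch: gather the environment details requested above in one place.
# spacy.info() is spaCy's built-in environment report; importlib.metadata
# (stdlib, Python 3.8+) lists installed packages much like `pip list`.
import spacy
from importlib.metadata import distributions

print(spacy.info())  # spaCy version, platform, installed pipelines

for dist in sorted(distributions(), key=lambda d: d.metadata["Name"].lower()):
    print(f"{dist.metadata['Name']}=={dist.version}")
```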
To anyone facing this issue: I've used NER instead of SpanCat and had no issues. Regards.
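For anyone wanting to try that workaround, here is a minimal sketch of converting SpanCat-style annotations into NER entities. The `"sc"` spans key is spancat's default and the file paths are copied from the report below; both are assumptions about the actual project.

```python
# Sketch of the NER workaround: copy span annotations into doc.ents so the
# data can train an "ner" component instead of "spancat". doc.ents must be
# non-overlapping, so filter_spans() keeps only the longest span from each
# overlapping group - some annotations may be dropped in the process.
import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc_bin = DocBin().from_disk("./spacy_models_v3/train_data.spacy")

ner_bin = DocBin()
for doc in doc_bin.get_docs(nlp.vocab):
    doc.ents = filter_spans(list(doc.spans.get("sc", [])))
    ner_bin.add(doc)

ner_bin.to_disk("./spacy_models_v3/train_data_ner.spacy")
```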
Hi, can you share the training/dev data and the custom code you were using to train the SpanCat model? We'd need that to reproduce the crash and debug the issue.
Hi, unfortunately I'm not able to share the training data or custom code. Regards.
That's understandable. The issue is likely a bug in the SpanCat component's code, but we still need to reproduce the crash consistently in order to identify the cause and fix it. If you run into this issue again with data you can share, please let us know.
Hi,
I am getting 'Segmentation fault (core dumped)' when trying to train a SpanCat model with long spans. I know this error could be related to OOM issues, but that does not seem to be the case here: I tried reducing [nlp] batch_size and [training.batcher.size] as shown in the attached config file, and used a VM with very large RAM to make sure we are not running out of memory.
During training the VM memory usage never goes above 40%, and even when reducing the [components.spancat.suggester] min_size and max_size the memory usage does not exceed 20%, yet training still exits with 'Segmentation fault (core dumped)'.
Note: when training with low [components.spancat.suggester] values, the training completes, but with all zeroes for F, P, and R.
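An aside on that last note, not from the original report: with spaCy's default ngram suggester, a gold span longer than max_size can never be proposed as a candidate, so precision, recall, and F stay at zero no matter how long you train. A quick diagnostic sketch, assuming the default "sc" spans key and the paths from the command below:

```python
# Diagnostic sketch: check whether the gold span lengths fall inside the
# [components.spancat.suggester] min_size/max_size range. Spans outside that
# range are unreachable for the ngram suggester, which would explain the
# all-zero scores. The "sc" spans key is spaCy's default for spancat.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc_bin = DocBin().from_disk("./spacy_models_v3/train_data.spacy")

lengths = [
    len(span)
    for doc in doc_bin.get_docs(nlp.vocab)
    for span in doc.spans.get("sc", [])
]
print(f"gold span lengths: min={min(lengths)}, max={max(lengths)}")
```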
This is the command I am using for training:
python -m spacy train config_spn.cfg --output ./output_v3_lg_1.3 --paths.train ./spacy_models_v3/train_data.spacy --paths.dev ./spacy_models_v3/test_data.spacy --code functions.py -V
This is the training output:
[2023-09-28 09:25:08,461] [DEBUG] Config overrides from CLI: ['paths.train', 'paths.dev']
ℹ Saving to output directory: output_v3_lg_1.3
ℹ Using CPU
=========================== Initializing pipeline ===========================
[2023-09-28 09:25:08,610] [INFO] Set up nlp object from config
[2023-09-28 09:25:08,618] [DEBUG] Loading corpus from path: spacy_models_v3/test_data.spacy
[2023-09-28 09:25:08,618] [DEBUG] Loading corpus from path: spacy_models_v3/train_data.spacy
[2023-09-28 09:25:08,619] [INFO] Pipeline: ['tok2vec', 'spancat']
[2023-09-28 09:25:08,621] [INFO] Created vocabulary
[2023-09-28 09:25:09,450] [INFO] Added vectors: en_core_web_lg
[2023-09-28 09:25:09,450] [INFO] Finished initializing nlp object
[2023-09-28 09:25:16,150] [INFO] Initialized pipeline components: ['tok2vec', 'spancat']
✔ Initialized pipeline
============================= Training pipeline =============================
[2023-09-28 09:25:16,158] [DEBUG] Loading corpus from path: spacy_models_v3/test_data.spacy
[2023-09-28 09:25:16,159] [DEBUG] Loading corpus from path: spacy_models_v3/train_data.spacy
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---  ------  ------------  ------------  ----------  ----------  ----------  ------
  0       0      98109.47      19535.08        0.00        0.00        4.58    0.00
  0     200        528.73        781.51        0.00        0.00        3.75    0.00
Segmentation fault (core dumped)
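A general debugging suggestion, not from the thread: Python's stdlib faulthandler prints the interpreter's traceback when a C extension triggers SIGSEGV, which helps narrow down where the crash happens. The simplest route is setting PYTHONFAULTHANDLER=1 in the environment (or running with `python -X faulthandler`); below is a sketch of the equivalent via spaCy's documented Python entry point for training.

```python
# Sketch: enable the stdlib faulthandler so a SIGSEGV inside a C extension
# dumps a Python-level traceback before the process dies.
import faulthandler
faulthandler.enable()

# Stands in for `--code functions.py`: importing the module runs the
# registrations it contains (assumes functions.py is on the path).
import functions

from spacy.cli.train import train  # documented Python entry point for `spacy train`

train(
    "config_spn.cfg",
    output_path="./output_v3_lg_1.3",
    overrides={
        "paths.train": "./spacy_models_v3/train_data.spacy",
        "paths.dev": "./spacy_models_v3/test_data.spacy",
    },
)
```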
Environment:
Operating System: Ubuntu 20.04.6 LTS
Python Version Used: 3.8.10
spaCy Version Used: 3.6.0
config_spn.cfg.txt
Thanks in advance!