Trying to train "ixa-ehu/ixambert-base-cased" model #48

Open
jmurua14 opened this issue Aug 26, 2022 · 1 comment
@jmurua14

Hi!

You have done a great job!! I have been training two different models: the one mentioned in the title ("ixa-ehu/ixambert-base-cased") and multilingual BERT (multibert_cased). With the multilingual model I didn't have any problems during training; however, when I try to train the other model I get a mismatch between the vocabulary size and the shape of the output.

In the config file of the "ixa-ehu/ixambert-base-cased" model the vocabulary size is as follows:
08/18/2022 09:41:28 - INFO - awesome_align.configuration_utils - Model config BertConfig {
  "architectures": null,
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": null,
  "do_sample": false,
  "eos_token_ids": null,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": null,
  "repetition_penalty": 1.0,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 119099
}

When I begin the training I get this error:
Iteration: 0%| | 0/40000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/mnt/datuak/virtualenvs/transformers/bin/awesome-train", line 8, in
sys.exit(main())
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/awesome_align/run_train.py", line 848, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/awesome_align/run_train.py", line 370, in train
loss = model(inputs_src=inputs_src, labels_src=labels_src)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/awesome_align/modeling.py", line 660, in forward
masked_lm_loss = loss_fct(prediction_scores_src.view(-1, self.config.vocab_size), labels_src.view(-1))
RuntimeError: shape '[-1, 119101]' is invalid for input of size 5716752

As you can see, the vocab_size has increased by 2, from 119099 to 119101. This seems to be due to the CLS and SEP tokens; however, I don't know why I get this error. I have tried to manually decrease the vocab_size in the code, but that leads to other errors when I compute the alignments.
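For what it's worth, 5716752 is exactly 48 × 119099, so the prediction scores apparently still cover the original 119099-entry vocabulary while config.vocab_size has been bumped to 119101. Below is a minimal sanity-check sketch I put together; it uses the plain Hugging Face transformers AutoTokenizer/AutoModelForMaskedLM classes rather than awesome_align's bundled ones, so it only diagnoses the checkpoint itself (whether resizing is the right fix for awesome-align's training objectives is exactly my question):

# Sketch: compare the tokenizer size with the model's embedding rows.
# Assumes the standard `transformers` package, NOT awesome_align's forked classes.
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "ixa-ehu/ixambert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

print("config.vocab_size:", model.config.vocab_size)
print("len(tokenizer):   ", len(tokenizer))
print("embedding rows:   ", model.get_input_embeddings().weight.shape[0])

# If extra special tokens were added to the tokenizer, resizing keeps the
# embedding matrix (and the MLM output head) in sync with len(tokenizer).
if len(tokenizer) != model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer))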

Here is the awesome-train command I used for training:
CUDA_VISIBLE_DEVICES=1 awesome-train \
    --output_dir=$OUTPUT_DIR \
    --model_name_or_path=ixa-ehu/ixambert-base-cased \
    --extraction 'softmax' \
    --do_train \
    --train_mlm \
    --train_tlm \
    --train_tlm_full \
    --train_so \
    --train_psi \
    --train_co \
    --train_data_file=$TRAIN_FILE \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --save_steps 10000 \
    --max_steps 40000

Could you please help me solve this issue?

Thanks!

@zdou0830 (Collaborator)

Hi, right now the repo only supports mBERT and XLM-R. You can check this commit to see how to incorporate a new model.
