[INFO|configuration_utils.py:792] 2024-02-12 10:24:01,102 >> Model config T5Config {
"_name_or_path": "t5-11b",
"architectures": [
"T5WithLMHeadModel"
],
"classifier_dropout": 0.0,
"d_ff": 65536,
"d_kv": 128,
"d_model": 1024,
"decoder_start_token_id": 0,
"dense_act_fn": "relu",
"dropout_rate": 0.1,
"eos_token_id": 1,
"feed_forward_proj": "relu",
"initializer_factor": 1.0,
"is_encoder_decoder": true,
"is_gated_act": false,
"layer_norm_epsilon": 1e-06,
"model_type": "t5",
"n_positions": 512,
"num_decoder_layers": 24,
"num_heads": 128,
"num_layers": 24,
"output_past": true,
"pad_token_id": 0,
"relative_attention_max_distance": 128,
"relative_attention_num_buckets": 32,
"task_specific_params": {
"summarization": {
"early_stopping": true,
"length_penalty": 2.0,
"max_length": 200,
"min_length": 30,
"no_repeat_ngram_size": 3,
"num_beams": 4,
"prefix": "summarize: "
},
"translation_en_to_de": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to German: "
},
"translation_en_to_fr": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to French: "
},
"translation_en_to_ro": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to Romanian: "
}
},
"transformers_version": "4.38.0.dev0",
"use_cache": true,
"vocab_size": 32128
}
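For reference, the same config printed above can be loaded and inspected directly. A minimal sketch, assuming transformers is installed and the t5-11b checkpoint is reachable on the Hub:

```python
# Load the same T5Config that is printed above and check a few fields.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-11b")
print(config.model_type)   # "t5"
print(config.d_model)      # 1024
print(config.num_layers)   # 24 (num_decoder_layers is likewise 24)
print(config.vocab_size)   # 32128
```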
/content/transformers/src/transformers/models/t5/tokenization_t5_fast.py:160: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with truncation is True.
Be aware that you SHOULD NOT rely on t5-11b automatically truncating your input to 512 when padding/encoding.
If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with model_max_length or pass max_length when encoding/padding.
To avoid this warning, please instantiate this tokenizer with model_max_length set to your preferred value.
warnings.warn(
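As the warning itself suggests, the ambiguity around the 512 default can be removed either by setting model_max_length when instantiating the tokenizer or by passing max_length explicitly at encode time. A minimal sketch with illustrative values (1024 is not taken from this run; 128 matches the --max_source_length used in the failing command further down):

```python
# Option 1: make the tokenizer's maximum length explicit at instantiation
# (1024 here is an illustrative value, not something this run used).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-11b", model_max_length=1024)

# Option 2: keep the default tokenizer and be explicit per call; 128 matches
# the --max_source_length passed to run_translation.py in the command below.
batch = tokenizer(
    ["translate English to Romanian: Hello, world!"],
    max_length=128,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([1, 128])
```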
[INFO|modeling_utils.py:3259] 2024-02-12 10:24:01,230 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--t5-11b/snapshots/90f37703b3334dfe9d2b009bfcbfbf1ac9d28ea3/pytorch_model.bin
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
[INFO|modeling_utils.py:3365] 2024-02-12 10:27:15,114 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
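The failing command further down passes --deepspeed .../ds_config_zero3_tuned.json; its actual contents are not shown in this log. Purely as an illustration of the shape the HF Trainer's DeepSpeed integration accepts (with "auto" values filled in from the TrainingArguments), a ZeRO stage-3 config looks roughly like the sketch below; zero.init() is activated automatically once stage 3 is detected:

```python
# Illustrative ZeRO stage-3 config only -- NOT the ds_config_zero3_tuned.json
# actually used in this run. It could be json.dump()-ed to a file and passed
# via --deepspeed, or handed to TrainingArguments(deepspeed=...) as a dict.
ds_config_zero3 = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
```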
[INFO|configuration_utils.py:840] 2024-02-12 10:27:15,119 >> Generate config GenerationConfig {
"decoder_start_token_id": 0,
"eos_token_id": 1,
"pad_token_id": 0
}
[INFO|modeling_utils.py:3992] 2024-02-12 10:28:22,087 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.
[INFO|modeling_utils.py:4000] 2024-02-12 10:28:22,087 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at t5-11b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3546] 2024-02-12 10:28:22,509 >> Generation config file not found, using a generation config created from the model config.
[INFO|modeling_utils.py:1875] 2024-02-12 10:28:25,134 >> You are resizing the embedding layer without providing a pad_to_multiple_of parameter. This means that the new embedding dimension will be 32100. This might induce some performance reduction as Tensor Cores will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
Loading cached processed dataset at /root/.cache/huggingface/datasets/wmt16/ro-en/1.0.0/27ea1f6483dca29955adc6a9e7d8a3556fbb1aea/cache-c04e84c93b455333.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/wmt16/ro-en/1.0.0/27ea1f6483dca29955adc6a9e7d8a3556fbb1aea/cache-4e2ffa6ebb051a0a.arrow
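Regarding the resize warning a few lines up: resize_token_embeddings accepts a pad_to_multiple_of argument, which pads the vocabulary dimension up to a Tensor Core friendly size instead of the bare 32100. A minimal sketch (the multiple of 64 is an illustrative choice, and a smaller checkpoint behaves the same way if loading t5-11b locally is impractical):

```python
# Resize the embedding matrix to len(tokenizer) = 32100 rounded up to a
# multiple of 64, i.e. 32128, rather than the odd 32100 from the warning.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-11b")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-11b")
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
print(model.get_input_embeddings().weight.shape[0])  # 32128
```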
2024-02-12 10:28:29.087022: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-12 10:28:29.087145: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-12 10:28:29.205338: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-12 10:28:31.618944: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|trainer.py:586] 2024-02-12 10:28:32,744 >> Using auto half precision backend
CalledProcessError Traceback (most recent call last)
in <cell line: 1>()
----> 1 get_ipython().run_cell_magic('bash', '', '\ncd transformers; export BS=1; rm -rf output_dir; \\nPYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \\n--deepspeed /content/drive/MyDrive/data/deepspeed/config/ds_config_zero3_tuned.json \\n--model_name_or_path t5-11b \\n--output_dir /content/drive/MyDrive/models/t5-11b/ --adam_eps 1e-06 --evaluation_strategy=steps \\n--do_train --do_eval --label_smoothing 0.1 --learning_rate 3e-5 \\n--max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir \\n--per_device_train_batch_size $BS --per_device_eval_batch_size $BS --predict_with_generate --sortish_sampler \\n--val_max_target_length 128 --warmup_steps 5 --max_train_samples 500 --max_eval_samples 50 \\n--dataset_name wmt16 --dataset_config ro-en --source_lang en --target_lang ro \\n--source_lang en --target_lang ro \\n--gradient_accumulation_steps 8 \\n--fp16\n\n\n')
4 frames
/usr/local/lib/python3.10/dist-packages/google/colab/_shell.py in run_cell_magic(self, magic_name, line, cell)
332 if line and not cell:
333 cell = ' '
--> 334 return super().run_cell_magic(magic_name, line, cell)
335
336
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2471 with self.builtin_trap:
2472 args = (magic_arg_s, cell)
-> 2473 result = fn(*args, **kwargs)
2474 return result
2475
/usr/local/lib/python3.10/dist-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
140 else:
141 line = script
--> 142 return self.shebang(line, cell)
143
144 # write a basic docstring:
in shebang(self, line, cell)
/usr/local/lib/python3.10/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/usr/local/lib/python3.10/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):
CalledProcessError: Command 'b'\ncd transformers; export BS=1; rm -rf output_dir; \\nPYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \\n--deepspeed /content/drive/MyDrive/data/deepspeed/config/ds_config_zero3_tuned.json \\n--model_name_or_path t5-11b \\n--output_dir /content/drive/MyDrive/models/t5-11b/ --adam_eps 1e-06 --evaluation_strategy=steps \\n--do_train --do_eval --label_smoothing 0.1 --learning_rate 3e-5 \\n--max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir \\n--per_device_train_batch_size $BS --per_device_eval_batch_size $BS --predict_with_generate --sortish_sampler \\n--val_max_target_length 128 --warmup_steps 5 --max_train_samples 500 --max_eval_samples 50 \\n--dataset_name wmt16 --dataset_config ro-en --source_lang en --target_lang ro \\n--source_lang en --target_lang ro \\n--gradient_accumulation_steps 8 \\n--fp16\n\n\n'' returned non-zero exit status 247.