
Using the original DeepSpeed JSON config with bf16, I get the error: RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Half #3197

PMPBinZhang opened this issue Oct 25, 2024 · 1 comment

System Info

- `Accelerate` version: 1.0.0
- Platform: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /home/user/anaconda3/envs/accelerate_multi/bin/accelerate
- Python version: 3.11.0
- Numpy version: 1.23.5
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 62.55 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

  1. The accelerate config is as follows:
    compute_environment: LOCAL_MACHINE
    debug: true
    deepspeed_config:
      deepspeed_config_file: /home/user/work/screenplays_sft/ds_zero3_cpu_offload.config
      zero3_init_flag: true
      deepspeed_multinode_launcher: standard
    main_process_ip: 192.168.252.20
    main_process_port: 25253
    distributed_type: DEEPSPEED
    downcast_bf16: true
    enable_cpu_affinity: false
    machine_rank: 0
    main_training_function: main
    num_machines: 2
    num_processes: 2
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
  2. ds_zero3_cpu_offload.config is as follows:
    {
      "bf16": {
        "enabled": "auto"
      },
      "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": false,
        "allgather_bucket_size": 5e8,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true,
        "zero_quantized_weights": true,
        "zero_quantized_gradients": true,
        "offload_optimizer": {
          "device": "cpu",
          "pin_memory": true
        },
        "offload_param": {
          "device": "cpu",
          "pin_memory": true
        }
      },
      "gradient_clipping": "auto",
      "train_batch_size": "auto",
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto",
      "comms_logger": {
        "enabled": true,
        "verbose": true,
        "prof_all": true,
        "debug": false
      }
    }
  3. The launch command is as follows:
    accelerate launch --config_file multi_nodes_single_gpu_deepspeed_zero3_cfg_file.yaml sft_trainer.py --log_level info --bf16 True
  4. The 4-bit model loading code is as follows:
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_storage=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
    )
  5. If I instead configure DeepSpeed directly in the accelerate config (without a separate JSON file), then I can use bf16; that config is as follows:
    compute_environment: LOCAL_MACHINE
    debug: true
    deepspeed_config:
      deepspeed_multinode_launcher: standard
      offload_optimizer_device: cpu
      offload_param_device: cpu
      zero3_init_flag: true
      zero3_save_16bit_model: false
      zero_stage: 3
    distributed_type: DEEPSPEED
    downcast_bf16: 'no'
    enable_cpu_affinity: false
    machine_rank: 0
    main_process_ip: 192.168.252.20
    main_process_port: 25253
    main_training_function: main
    num_machines: 2
    num_processes: 2
    mixed_precision: bf16
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: true
  6. If I add mixed_precision: bf16 to the config file from step 1, like this:
    compute_environment: LOCAL_MACHINE
    debug: true
    deepspeed_config:
      deepspeed_config_file: /home/user/work/screenplays_sft/ds_zero3_cpu_offload.config
      zero3_init_flag: true
      deepspeed_multinode_launcher: standard
    main_process_ip: 192.168.252.20
    main_process_port: 25253
    distributed_type: DEEPSPEED
    downcast_bf16: true
    mixed_precision: bf16
    enable_cpu_affinity: false
    machine_rank: 0
    main_training_function: main
    num_machines: 2
    num_processes: 2
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
    then I get the error: "ValueError: When using deepspeed_config_file, the following accelerate config variables will be ignored: ['gradient_accumulation_steps', 'gradient_clipping', 'zero_stage', 'offload_optimizer_device', 'offload_param_device', 'offload_param_nvme_path', 'offload_optimizer_nvme_path', 'zero3_save_16bit_model', 'mixed_precision']."

Could you please tell me the reason? Thank you very much.
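
For reference, a standalone sketch (not taken from my training script, just an illustration of the dtype mismatch itself) that shows the same kind of failure when a bf16 tensor meets an fp16 tensor in a matmul, which is what seems to happen when DeepSpeed falls back to fp16 while the model weights are bf16:

    import torch

    # Mixing bfloat16 and float16 operands in a matmul raises a dtype-mismatch
    # RuntimeError (the exact wording varies by device and PyTorch version).
    a = torch.randn(2, 4, dtype=torch.bfloat16)
    b = torch.randn(4, 3, dtype=torch.float16)
    try:
        torch.mm(a, b)
    except RuntimeError as err:
        print(err)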

Expected behavior

Being able to train in bf16 when using the original DeepSpeed JSON config file.
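
A possible workaround (an assumption on my part, not verified): since mixed_precision from the accelerate config is ignored once deepspeed_config_file is set, the bf16 section of ds_zero3_cpu_offload.config could be hard-coded instead of left on "auto", for example

    "bf16": {
        "enabled": true
    }

with the rest of the JSON unchanged, so that DeepSpeed does not silently fall back to fp16.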


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
