fix(pt): invalid type_map when multitask training (#4031)
It seems that in 3.0.0b3, running multitask training or a finetune task raises a `RuntimeError` reporting an inconsistent type map. The error log is shown below. However, the `type_map` in multitask training should be a shared dict. In the source code, `type_map` is looked up [here](https://github.com/deepmodeling/deepmd-kit/blob/0e0fc1a63e478d3e56285b520b34a9c58488d659/deepmd/pt/entrypoints/main.py#L300). In multitask training no `type_map` is found at that location, so the lookup yields an empty `type_map`. After applying the modification in this PR, everything seems to work.

```
Traceback (most recent call last):
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
    train(FLAGS)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 311, in train
    train_data = get_data(
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 802, in get_data
    data = DeepmdDataSystem(
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 184, in __init__
    self.type_map = self._check_type_map_consistency(type_map_list)
  File "/public/home/ypliucat/.conda/envs/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/utils/data_system.py", line 616, in _check_type_map_consistency
    raise RuntimeError(f"inconsistent type map: {ret!s} {ii!s}")
RuntimeError: inconsistent type map: ['Ag', 'Cu'] ['Ag', 'Ni']
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Enhanced the training process to ensure consistent handling of model type configurations, improving clarity and availability based on multi-task settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Futaki Haduki <[email protected]>
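
For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch. It is not the code from this PR or from deepmd-kit: `check_type_map_consistency` is a simplified stand-in for the check named in the traceback, `resolve_type_map` is a hypothetical helper, and the config layout (a `model_dict` key holding one entry per task) is an assumption made for illustration only.

```python
from typing import Dict, List, Optional


def check_type_map_consistency(type_map_list: List[Optional[List[str]]]) -> List[str]:
    """Simplified stand-in for the consistency check seen in the traceback.

    Every type map must agree element-wise with the longest one seen so far.
    """
    ret: List[str] = []
    for ii in type_map_list:
        if ii is None:
            continue
        for known, new in zip(ret, ii):
            if known != new:
                raise RuntimeError(f"inconsistent type map: {ret!s} {ii!s}")
        if len(ii) > len(ret):
            ret = ii
    return ret


def resolve_type_map(model_config: Dict, model_key: Optional[str] = None) -> Optional[List[str]]:
    """Hypothetical helper: pick the type_map for one task.

    Assumed layout: single-task configs keep "type_map" at the top level,
    multitask configs keep it under model_config["model_dict"][model_key].
    """
    if model_key is None:
        return model_config.get("type_map")
    return model_config["model_dict"][model_key].get("type_map")


# Assumed multitask layout: both tasks carry the same (shared) type_map.
multitask_config = {
    "model_dict": {
        "task_a": {"type_map": ["Ag", "Cu", "Ni"]},
        "task_b": {"type_map": ["Ag", "Cu", "Ni"]},
    }
}

# A top-level lookup finds nothing in a multitask config ...
top_level = resolve_type_map(multitask_config)
# ... while a per-task lookup returns the shared type_map.
per_task = resolve_type_map(multitask_config, "task_a")

# With no explicit type_map, differing per-system maps can fail the check.
try:
    check_type_map_consistency([["Ag", "Cu"], ["Ag", "Ni"]])
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

In this sketch the top-level lookup returns `None` for the multitask layout, which is the kind of missing `type_map` described above, while the per-task lookup returns the shared list.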