
Refactor property (#37)
* change property.npy to any name

* Init branch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change | to Union

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change sub_var_name default to []

* Solve pre-commit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* solve GitHub code scanning

* fix UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete useless file

* Solve some UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Solve precommit

* solve pre-commit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Solve dptest UT, dpatomicmodel UT, code scanning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete param  and

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Solve UT fail caused by task_dim and property_name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix UT

* Fix UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix permutation error

* Add property bias UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* recover rcond doc

* recover blank

* Change code according to coderabbitai

* solve pre-commit

* Fix UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change apply_bias doc

* update the version compatibility

* feat (tf/pt): add atomic weights to tensor loss (deepmodeling#4466)

Interfaces are of particular interest in many studies. However, the training-set configurations that represent an interface normally also include large parts of the bulk material. As a result, the final model tends to favor the bulk information, while the interfacial information is learned less well. Simply raising the proportion of interfaces in the configurations is difficult, since the electronic structure of the interface may only be reasonable with a certain thickness of bulk material. Therefore, I wonder whether it is possible to define weights for atomic quantities in loss functions. This would let us put higher weights on the atomic information in the regions of interest and probably make the model "more focused" on those regions.
In this PR, I add the keyword `enable_atomic_weight` to the loss function of the tensor model (a rough sketch of the idea follows below). In principle, it could be generalised to any atomic quantity, e.g., atomic forces. I would like to hear the developers' comments/suggestions on this feature. I can add support for other loss functions and finish unit tests once we agree on it.
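As a rough illustration only — not the PR's actual implementation — a per-atom weight can scale the local part of a tensor loss like this (NumPy, with hypothetical shapes):

```python
import numpy as np

def weighted_local_tensor_loss(pred, label, atomic_weight=None):
    """Per-atom squared error on a local tensor, optionally weighted per atom.

    pred, label: (nframes, natoms, ndim); atomic_weight: (nframes, natoms) or None.
    """
    per_atom = np.sum((pred - label) ** 2, axis=-1)  # (nframes, natoms)
    if atomic_weight is not None:
        # e.g. weight interfacial atoms higher than bulk atoms
        per_atom = per_atom * atomic_weight
    return per_atom.mean()
```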

Best. 




<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced an optional parameter for atomic weights in loss calculations, enhancing flexibility in the `TensorLoss` class.
  - Added a suite of unit tests for the `TensorLoss` functionality, ensuring consistency between TensorFlow and PyTorch implementations.

- **Bug Fixes**
  - Updated logic for local loss calculations to ensure correct application of atomic weights based on user input.

- **Documentation**
  - Improved clarity of documentation for several function arguments, including the addition of a new argument related to atomic weights.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

* delete sub_var_name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* recover to property key

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix conflict

* Fix UT

* Add document of property fitting

* Delete checkpoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add get_property_name to DeepEvalBackend

* pd: fix learning rate setting when resume (deepmodeling#4480)

"When resuming training, there is no need to add `self.start_step` to
the step count because Paddle uses `lr_sche.last_epoch` as the input for
`step`, which already records the `start_step` steps."

The learning rate is correct after the fix:


![22AD6874B74E437E9B133D75ABCC02FE](https://github.com/user-attachments/assets/1ad0ce71-6e1c-4de5-87dc-0daca1f6f038)
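A toy sketch of the scheduler bookkeeping behind this fix (it assumes Paddle's `LambdaDecay` API as used in `deepmd/pd/train/training.py`; the warm-up lambda and numbers are illustrative):

```python
import paddle

warmup_steps = 100
sched = paddle.optimizer.lr.LambdaDecay(
    learning_rate=1e-3,
    lr_lambda=lambda step: min(step / warmup_steps, 1.0),  # toy warm-up
)
for _ in range(50):
    sched.step()  # advances sched.last_epoch by one per call

# last_epoch is saved/restored with the optimizer state, so after a resume
# the lambda already sees the advanced step count; adding start_step on top
# would count those steps twice (hence the fix, and the `last_epoch -= 1`
# adjustment on restart in the diff below).
```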



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
  - Enhanced training process with improved optimizer configuration and learning rate adjustments.
  - Refined logging of training and validation results for clarity.
  - Improved model saving logic to preserve the latest state during interruptions.
  - Enhanced tensorboard logging for detailed tracking of training metrics.

- **Bug Fixes**
  - Corrected lambda function for learning rate scheduler to reference warmup steps accurately.

- **Chores**
  - Streamlined data loading and handling for efficient training across different tasks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

* docs: update deepmd-gnn URL (deepmodeling#4482)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
  - Updated guidelines for creating and integrating new models in the DeePMD-kit framework.
  - Added new sections on descriptors, fitting networks, and model requirements.
  - Enhanced unit testing section with instructions for regression tests.
  - Updated URL for the DeePMD-GNN plugin to reflect new repository location.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jinzhe Zeng <[email protected]>

* docs: update DPA-2 citation (deepmodeling#4483)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Updated references in the bibliography for the DPA-2 model to include a new article entry for 2024.
  - Added a new reference for an attention-based descriptor.

- **Bug Fixes**
  - Corrected reference links in documentation to point to updated DOI links instead of arXiv.

- **Documentation**
  - Revised entries in the credits and model documentation to reflect the latest citations and details.
  - Enhanced clarity and detail in fine-tuning documentation for TensorFlow and PyTorch implementations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jinzhe Zeng <[email protected]>

* docs: fix a minor typo on the title of `install-from-c-library.md` (deepmodeling#4484)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
  - Updated formatting of the installation guide for the pre-compiled C library.
  - Icons for TensorFlow and JAX are now displayed together in the header.
  - Retained all installation instructions and compatibility notes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jinzhe Zeng <[email protected]>

* fix: print dlerror if dlopen fails (deepmodeling#4485)

xref: deepmodeling/deepmd-gnn#44

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
  - Enhanced error messages for library loading failures on non-Windows platforms.
  - Updated thread management environment variable checks for improved compatibility.
  - Added support for mixed types in tensor input handling, allowing for more flexible configurations.

- **Bug Fixes**
  - Improved error reporting for dynamic library loading issues.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* change doc to py

* Add out_bias out_std doc

* change bias method to compute_stats_do_not_distinguish_types

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change var_name to property_name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change logic of extensive bias

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for newly added parameter

* change doc for compute_stats_do_not_distinguish_types

* try to fix dptest

* change all property to property_name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix UT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Delete key 'property' completely

* Fix UT

* Fix dptest UT

* pd: fix oom error (deepmodeling#4493)

Paddle uses `MemoryError` rather than the `RuntimeError` used in PyTorch; now I can test DPA-1 and DPA-2 on a 16 GB V100...

![image](https://github.com/user-attachments/assets/42ead773-bf26-4195-8f67-404b151371de)
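A condensed sketch of the check this PR introduces (mirroring the `deepmd/pd/utils/auto_batch_size.py` diff below; the message substring is what Paddle emits on OOM):

```python
import paddle

def is_oom_error(e: Exception) -> bool:
    # Paddle raises MemoryError("ResourceExhaustedError: ...") on GPU OOM,
    # unlike PyTorch's RuntimeError("CUDA out of memory. ...").
    if isinstance(e, MemoryError) and "ResourceExhaustedError" in e.args[0]:
        paddle.device.cuda.empty_cache()  # release cached blocks before retrying
        return True
    return False
```

The auto-batch-size machinery can then halve the batch and retry whenever this returns `True`.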

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved detection of out-of-memory (OOM) errors to enhance application stability.
  - Ensured cached memory is cleared upon OOM errors, preventing potential memory leaks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

* pd: add missing `dp.eval()` in pd backend (deepmodeling#4488)

Switch to eval mode when evaluating the model; otherwise `self.training` stays `True`, a backward graph is built during inference, and that causes OOM.
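A minimal sketch of the inference pattern (a stand-in module, not the DP wrapper itself):

```python
import paddle

net = paddle.nn.Linear(8, 1)       # stand-in for the loaded model
net.eval()                         # sets self.training = False
with paddle.no_grad():             # additionally skip autograd tracking
    out = net(paddle.randn([4, 8]))
# Without eval()/no_grad(), each forward pass keeps autograd state alive
# during evaluation, which steadily exhausts GPU memory.
```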

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Enhanced model evaluation state management to ensure correct behavior during evaluation.

- **Bug Fixes**
  - Improved type consistency in the `normalize_coord` function for better computational accuracy.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

* [pre-commit.ci] pre-commit autoupdate (deepmodeling#4497)

<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.4](astral-sh/ruff-pre-commit@v0.8.3...v0.8.4)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Delete attribute

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Solve comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Solve error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete property_name in serialize

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Chenqqian Zhang <[email protected]>
Co-authored-by: Jia-Xin Zhu <[email protected]>
Co-authored-by: HydrogenSulfate <[email protected]>
Co-authored-by: Jinzhe Zeng <[email protected]>
7 people authored Dec 24, 2024
1 parent 76f28e9 commit dc1b1a3
Showing 63 changed files with 1,225 additions and 186 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -29,7 +29,7 @@ repos:
exclude: ^source/3rdparty
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.8.3
rev: v0.8.4
hooks:
- id: ruff
args: ["--fix"]
32 changes: 16 additions & 16 deletions CITATIONS.bib
@@ -128,26 +128,26 @@ @article{Zhang_NpjComputMater_2024_v10_p94
doi = {10.1038/s41524-024-01278-7},
}

@misc{Zhang_2023_DPA2,
@article{Zhang_npjComputMater_2024_v10_p293,
annote = {DPA-2},
author = {
Duo Zhang and Xinzijian Liu and Xiangyu Zhang and Chengqian Zhang and Chun
Cai and Hangrui Bi and Yiming Du and Xuejian Qin and Jiameng Huang and
Bowen Li and Yifan Shan and Jinzhe Zeng and Yuzhi Zhang and Siyuan Liu and
Yifan Li and Junhan Chang and Xinyan Wang and Shuo Zhou and Jianchuan Liu
and Xiaoshan Luo and Zhenyu Wang and Wanrun Jiang and Jing Wu and Yudi Yang
and Jiyuan Yang and Manyi Yang and Fu-Qiang Gong and Linshuang Zhang and
Mengchao Shi and Fu-Zhi Dai and Darrin M. York and Shi Liu and Tong Zhu and
Zhicheng Zhong and Jian Lv and Jun Cheng and Weile Jia and Mohan Chen and
Guolin Ke and Weinan E and Linfeng Zhang and Han Wang
Cai and Hangrui Bi and Yiming Du and Xuejian Qin and Anyang Peng and
Jiameng Huang and Bowen Li and Yifan Shan and Jinzhe Zeng and Yuzhi Zhang
and Siyuan Liu and Yifan Li and Junhan Chang and Xinyan Wang and Shuo Zhou
and Jianchuan Liu and Xiaoshan Luo and Zhenyu Wang and Wanrun Jiang and
Jing Wu and Yudi Yang and Jiyuan Yang and Manyi Yang and Fu-Qiang Gong and
Linshuang Zhang and Mengchao Shi and Fu-Zhi Dai and Darrin M. York and Shi
Liu and Tong Zhu and Zhicheng Zhong and Jian Lv and Jun Cheng and Weile Jia
and Mohan Chen and Guolin Ke and Weinan E and Linfeng Zhang and Han Wang
},
title = {
{DPA-2: Towards a universal large atomic model for molecular and material
simulation}
},
publisher = {arXiv},
year = 2023,
doi = {10.48550/arXiv.2312.15492},
title = {{DPA-2: a large atomic model as a multi-task learner}},
journal = {npj Comput. Mater},
year = 2024,
volume = 10,
number = 1,
pages = 293,
doi = {10.1038/s41524-024-01493-2},
}

@article{Zhang_PhysPlasmas_2020_v27_p122704,
4 changes: 4 additions & 0 deletions deepmd/dpmodel/atomic_model/__init__.py
@@ -42,6 +42,9 @@
from .polar_atomic_model import (
DPPolarAtomicModel,
)
from .property_atomic_model import (
DPPropertyAtomicModel,
)

__all__ = [
"BaseAtomicModel",
@@ -50,6 +53,7 @@
"DPDipoleAtomicModel",
"DPEnergyAtomicModel",
"DPPolarAtomicModel",
"DPPropertyAtomicModel",
"DPZBLLinearEnergyAtomicModel",
"LinearEnergyAtomicModel",
"PairTabAtomicModel",
24 changes: 24 additions & 0 deletions deepmd/dpmodel/atomic_model/property_atomic_model.py
@@ -1,4 +1,6 @@
# SPDX-License-Identifier: LGPL-3.0-or-later
import numpy as np

from deepmd.dpmodel.fitting.property_fitting import (
PropertyFittingNet,
)
@@ -15,3 +17,25 @@ def __init__(self, descriptor, fitting, type_map, **kwargs):
"fitting must be an instance of PropertyFittingNet for DPPropertyAtomicModel"
)
super().__init__(descriptor, fitting, type_map, **kwargs)

def apply_out_stat(
self,
ret: dict[str, np.ndarray],
atype: np.ndarray,
):
"""Apply the stat to each atomic output.
In property fitting, each output will be multiplied by the label std and then shifted by the label average value.
Parameters
----------
ret
The returned dict by the forward_atomic method
atype
The atom types. nf x nloc. It is unused in property fitting.
"""
out_bias, out_std = self._fetch_out_stat(self.bias_keys)
for kk in self.bias_keys:
ret[kk] = ret[kk] * out_std[kk][0] + out_bias[kk][0]
return ret
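A numeric illustration of this de-normalization (values made up):

```python
import numpy as np

out_std, out_bias = 0.5, 2.0        # per-property std / mean from data statistics
raw = np.array([[-1.0, 0.0, 1.0]])  # raw network outputs for one frame
print(raw * out_std + out_bias)     # [[1.5 2.  2.5]]
```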
7 changes: 6 additions & 1 deletion deepmd/dpmodel/descriptor/dpa2.py
@@ -387,7 +387,7 @@ def __init__(
use_tebd_bias: bool = False,
type_map: Optional[list[str]] = None,
) -> None:
r"""The DPA-2 descriptor. see https://arxiv.org/abs/2312.15492.
r"""The DPA-2 descriptor[1]_.
Parameters
----------
@@ -434,6 +434,11 @@ def __init__(
sw: torch.Tensor
The switch function for decaying inverse distance.
References
----------
.. [1] Zhang, D., Liu, X., Zhang, X. et al. DPA-2: a
large atomic model as a multi-task learner. npj
Comput Mater 10, 293 (2024). https://doi.org/10.1038/s41524-024-01493-2
"""

def init_subclass_params(sub_data, sub_class):
18 changes: 9 additions & 9 deletions deepmd/dpmodel/fitting/property_fitting.py
@@ -41,10 +41,9 @@ class PropertyFittingNet(InvarFitting):
this list is of length :math:`N_l + 1`, specifying if the hidden layers and the output layer are trainable.
intensive
Whether the fitting property is intensive.
bias_method
The method of applying the bias to each atomic output, user can select 'normal' or 'no_bias'.
If 'normal' is used, the computed bias will be added to the atomic output.
If 'no_bias' is used, no bias will be added to the atomic output.
property_name:
The name of the fitting property, which should be consistent with the property name in the dataset.
If the data file is named `humo.npy`, this parameter should be "humo".
resnet_dt
Time-step `dt` in the resnet construction:
:math:`y = x + dt * \phi (Wx + b)`
@@ -74,7 +73,7 @@ def __init__(
rcond: Optional[float] = None,
trainable: Union[bool, list[bool]] = True,
intensive: bool = False,
bias_method: str = "normal",
property_name: str = "property",
resnet_dt: bool = True,
numb_fparam: int = 0,
numb_aparam: int = 0,
@@ -89,9 +88,8 @@
) -> None:
self.task_dim = task_dim
self.intensive = intensive
self.bias_method = bias_method
super().__init__(
var_name="property",
var_name=property_name,
ntypes=ntypes,
dim_descrpt=dim_descrpt,
dim_out=task_dim,
@@ -113,9 +111,9 @@
@classmethod
def deserialize(cls, data: dict) -> "PropertyFittingNet":
data = data.copy()
check_version_compatibility(data.pop("@version"), 3, 1)
check_version_compatibility(data.pop("@version"), 4, 1)
data.pop("dim_out")
data.pop("var_name")
data["property_name"] = data.pop("var_name")
data.pop("tot_ener_zero")
data.pop("layer_name")
data.pop("use_aparam_as_mask", None)
@@ -131,6 +129,8 @@ def serialize(self) -> dict:
**InvarFitting.serialize(self),
"type": "property",
"task_dim": self.task_dim,
"intensive": self.intensive,
}
dd["@version"] = 4

return dd
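A hedged construction sketch for the renamed key (keyword values are illustrative; only the documented parameters above are taken from the diff):

```python
from deepmd.dpmodel.fitting.property_fitting import PropertyFittingNet

fitting = PropertyFittingNet(
    ntypes=2,
    dim_descrpt=64,
    task_dim=1,
    property_name="humo",  # must match the dataset file name (humo.npy)
    intensive=True,
)
# The fitting's var_name is now "humo" rather than the hard-coded "property".
```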
6 changes: 3 additions & 3 deletions deepmd/dpmodel/model/property_model.py
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: LGPL-3.0-or-later
from deepmd.dpmodel.atomic_model.dp_atomic_model import (
DPAtomicModel,
from deepmd.dpmodel.atomic_model import (
DPPropertyAtomicModel,
)
from deepmd.dpmodel.model.base_model import (
BaseModel,
@@ -13,7 +13,7 @@
make_model,
)

DPPropertyModel_ = make_model(DPAtomicModel)
DPPropertyModel_ = make_model(DPPropertyAtomicModel)


@BaseModel.register("property")
20 changes: 14 additions & 6 deletions deepmd/entrypoints/test.py
@@ -779,9 +779,17 @@ def test_property(
tuple[list[np.ndarray], list[int]]
arrays with results and their shapes
"""
data.add("property", dp.task_dim, atomic=False, must=True, high_prec=True)
var_name = dp.get_var_name()
assert isinstance(var_name, str)
data.add(var_name, dp.task_dim, atomic=False, must=True, high_prec=True)
if has_atom_property:
data.add("atom_property", dp.task_dim, atomic=True, must=False, high_prec=True)
data.add(
f"atom_{var_name}",
dp.task_dim,
atomic=True,
must=False,
high_prec=True,
)

if dp.get_dim_fparam() > 0:
data.add(
Expand Down Expand Up @@ -832,12 +840,12 @@ def test_property(
aproperty = ret[1]
aproperty = aproperty.reshape([numb_test, natoms * dp.task_dim])

diff_property = property - test_data["property"][:numb_test]
diff_property = property - test_data[var_name][:numb_test]
mae_property = mae(diff_property)
rmse_property = rmse(diff_property)

if has_atom_property:
diff_aproperty = aproperty - test_data["atom_property"][:numb_test]
diff_aproperty = aproperty - test_data[f"atom_{var_name}"][:numb_test]
mae_aproperty = mae(diff_aproperty)
rmse_aproperty = rmse(diff_aproperty)

@@ -854,7 +862,7 @@
detail_path = Path(detail_file)

for ii in range(numb_test):
test_out = test_data["property"][ii].reshape(-1, 1)
test_out = test_data[var_name][ii].reshape(-1, 1)
pred_out = property[ii].reshape(-1, 1)

frame_output = np.hstack((test_out, pred_out))
@@ -868,7 +876,7 @@

if has_atom_property:
for ii in range(numb_test):
test_out = test_data["atom_property"][ii].reshape(-1, 1)
test_out = test_data[f"atom_{var_name}"][ii].reshape(-1, 1)
pred_out = aproperty[ii].reshape(-1, 1)

frame_output = np.hstack((test_out, pred_out))
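The renamed data keys imply the following on-disk layout for a property named "humo" (paths and shapes illustrative):

```python
import os

import numpy as np

nframes, natoms, task_dim = 10, 32, 1
os.makedirs("set.000", exist_ok=True)
# frame-level property, shape (nframes, task_dim)
np.save("set.000/humo.npy", np.zeros([nframes, task_dim]))
# optional per-atom property, shape (nframes, natoms * task_dim)
np.save("set.000/atom_humo.npy", np.zeros([nframes, natoms * task_dim]))
```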
6 changes: 4 additions & 2 deletions deepmd/infer/deep_eval.py
@@ -70,8 +70,6 @@ class DeepEvalBackend(ABC):
"dipole_derv_c_redu": "virial",
"dos": "atom_dos",
"dos_redu": "dos",
"property": "atom_property",
"property_redu": "property",
"mask_mag": "mask_mag",
"mask": "mask",
# old models in v1
@@ -276,6 +274,10 @@ def get_has_spin(self) -> bool:
"""Check if the model has spin atom types."""
return False

def get_var_name(self) -> str:
"""Get the name of the fitting property."""
raise NotImplementedError

@abstractmethod
def get_ntypes_spin(self) -> int:
"""Get the number of spin atom types of this model. Only used in old implement."""
44 changes: 33 additions & 11 deletions deepmd/infer/deep_property.py
@@ -37,25 +37,41 @@ class DeepProperty(DeepEval):
Keyword arguments.
"""

@property
def output_def(self) -> ModelOutputDef:
"""Get the output definition of this model."""
return ModelOutputDef(
"""
Get the output definition of this model.
In property fitting, the output definition is not known until the model is loaded,
so it must be rewritten after loading; see change_output_def for details.
"""
pass

def change_output_def(self) -> None:
"""
Change the output definition of this model.
In property fitting, the output definition becomes known once the model is loaded,
so this method rewrites the output definition and related information.
"""
self.output_def = ModelOutputDef(
FittingOutputDef(
[
OutputVariableDef(
"property",
shape=[-1],
self.get_var_name(),
shape=[self.get_task_dim()],
reducible=True,
atomic=True,
intensive=self.get_intensive(),
),
]
)
)

def change_output_def(self) -> None:
self.output_def["property"].shape = self.task_dim
self.output_def["property"].intensive = self.get_intensive()
self.deep_eval.output_def = self.output_def
self.deep_eval._OUTDEF_DP2BACKEND[self.get_var_name()] = (
f"atom_{self.get_var_name()}"
)
self.deep_eval._OUTDEF_DP2BACKEND[f"{self.get_var_name()}_redu"] = (
self.get_var_name()
)

@property
def task_dim(self) -> int:
@@ -120,10 +136,12 @@ def eval(
aparam=aparam,
**kwargs,
)
atomic_property = results["property"].reshape(
atomic_property = results[self.get_var_name()].reshape(
nframes, natoms, self.get_task_dim()
)
property = results["property_redu"].reshape(nframes, self.get_task_dim())
property = results[f"{self.get_var_name()}_redu"].reshape(
nframes, self.get_task_dim()
)

if atomic:
return (
@@ -141,5 +159,9 @@ def get_intensive(self) -> bool:
"""Get whether the property is intensive."""
return self.deep_eval.get_intensive()

def get_var_name(self) -> str:
"""Get the name of the fitting property."""
return self.deep_eval.get_var_name()


__all__ = ["DeepProperty"]
1 change: 1 addition & 0 deletions deepmd/pd/infer/deep_eval.py
@@ -113,6 +113,7 @@ def __init__(
else:
# self.dp = paddle.jit.load(self.model_path.split(".json")[0])
raise ValueError(f"Unknown model file format: {self.model_path}!")
self.dp.eval()
self.rcut = self.dp.model["Default"].get_rcut()
self.type_map = self.dp.model["Default"].get_type_map()
if isinstance(auto_batch_size, bool):
5 changes: 2 additions & 3 deletions deepmd/pd/train/training.py
@@ -588,15 +588,14 @@ def warm_up_linear(step, warmup_steps):
if self.opt_type == "Adam":
self.scheduler = paddle.optimizer.lr.LambdaDecay(
learning_rate=self.lr_exp.start_lr,
lr_lambda=lambda step: warm_up_linear(
step + self.start_step, self.warmup_steps
),
lr_lambda=lambda step: warm_up_linear(step, self.warmup_steps),
)
self.optimizer = paddle.optimizer.Adam(
learning_rate=self.scheduler, parameters=self.wrapper.parameters()
)
if optimizer_state_dict is not None and self.restart_training:
self.optimizer.set_state_dict(optimizer_state_dict)
self.scheduler.last_epoch -= 1
else:
raise ValueError(f"Not supported optimizer type '{self.opt_type}'")

8 changes: 2 additions & 6 deletions deepmd/pd/utils/auto_batch_size.py
@@ -49,12 +49,8 @@ def is_oom_error(self, e: Exception) -> bool:
# several sources think CUSOLVER_STATUS_INTERNAL_ERROR is another out-of-memory error,
# such as https://github.com/JuliaGPU/CUDA.jl/issues/1924
# (the meaningless error message should be considered as a bug in cusolver)
if isinstance(e, RuntimeError) and (
"CUDA out of memory." in e.args[0]
or "CUDA driver error: out of memory" in e.args[0]
or "cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR" in e.args[0]
):
if isinstance(e, MemoryError) and ("ResourceExhaustedError" in e.args[0]):
# Release all unoccupied cached memory
# paddle.device.cuda.empty_cache()
paddle.device.cuda.empty_cache()
return True
return False
2 changes: 1 addition & 1 deletion deepmd/pd/utils/region.py
@@ -108,5 +108,5 @@ def normalize_coord(
"""
icoord = phys2inter(coord, cell)
icoord = paddle.remainder(icoord, paddle.full([], 1.0))
icoord = paddle.remainder(icoord, paddle.full([], 1.0, dtype=icoord.dtype))
return inter2phys(icoord, cell)
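A minimal reproduction of the dtype pitfall fixed here (it assumes Paddle's elementwise ops require matching operand dtypes):

```python
import paddle

icoord = paddle.rand([4, 3], dtype="float64")   # double-precision coords
one = paddle.full([], 1.0, dtype=icoord.dtype)  # match the input dtype
wrapped = paddle.remainder(icoord, one)         # fractional coords in [0, 1)
```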