From ff9b75ed57317feb2d30a792b34fe8b35298e97f Mon Sep 17 00:00:00 2001
From: James Misaka
Date: Thu, 26 Dec 2024 15:44:30 +0800
Subject: [PATCH] Fix: Modify docs of DPA models (#4510)

Modify the docs of the DPA models, especially the links to the DPA-1 paper.

## Summary by CodeRabbit

- **Documentation**
  - Updated the DPA-2 model documentation for improved clarity and accessibility.
  - Changed references in the "se_atten" descriptor documentation to link to the formal publication on nature.com.
  - Revised citations in the fine-tuning documentation to point to the published DPA-1 paper on nature.com, enhancing the credibility of sources.

---
 doc/model/dpa2.md           | 2 +-
 doc/model/train-se-atten.md | 4 ++--
 doc/train/finetuning.md     | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
index 300876bf05..a041547a14 100644
--- a/doc/model/dpa2.md
+++ b/doc/model/dpa2.md
@@ -4,7 +4,7 @@
 **Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
 :::
 
-The DPA-2 model implementation. See https://doi.org/10.1038/s41524-024-01493-2 for more details.
+The DPA-2 model implementation. See the [DPA-2 paper](https://doi.org/10.1038/s41524-024-01493-2) for more details.
 
 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels.
 
diff --git a/doc/model/train-se-atten.md b/doc/model/train-se-atten.md
index 92a56395f6..504b214737 100644
--- a/doc/model/train-se-atten.md
+++ b/doc/model/train-se-atten.md
@@ -8,7 +8,7 @@
 
 Here we propose DPA-1, a Deep Potential model with a novel attention mechanism, which is highly effective for representing the conformation and chemical spaces of atomic systems and learning the PES.
 
-See [this paper](https://arxiv.org/abs/2208.08236) for more information. DPA-1 is implemented as a new descriptor `"se_atten"` for model training, which can be used after simply editing the input.json.
+See [this paper](https://www.nature.com/articles/s41524-024-01278-7) for more information. DPA-1 is implemented as a new descriptor `"se_atten"` for model training, which can be used after simply editing the input.json.
 
 ## Theory
 
@@ -71,7 +71,7 @@ Then layer normalization is added in a residual way to finally obtain the self-a
 Next, we will list the detailed settings in input.json and the data format, especially for large systems with dozens of elements.
 An example of DPA-1 input can be found in `examples/water/se_atten/input.json`.
 The notation of `se_atten` is short for the smooth edition of Deep Potential with an attention mechanism.
-This descriptor was described in detail in [the DPA-1 paper](https://arxiv.org/abs/2208.08236) and the images above.
+This descriptor was described in detail in [the DPA-1 paper](https://www.nature.com/articles/s41524-024-01278-7) and the images above.
 
 In this example, we will train a DPA-1 model for a water system. A complete training input script of this example can be found in the directory:
 
diff --git a/doc/train/finetuning.md b/doc/train/finetuning.md
index 04d86cfc98..beb6012003 100644
--- a/doc/train/finetuning.md
+++ b/doc/train/finetuning.md
@@ -9,7 +9,7 @@ to vastly reduce the training cost, while it's not trivial in potential models.
 Compositions and configurations of data samples or even computational parameters in upstream software (such as VASP) may be different between the pre-trained and target datasets, leading to energy shifts or other diversities of training data.
 
-Recently the emerging of methods such as [DPA-1](https://arxiv.org/abs/2208.08236) has brought us to a new stage where we can
+Recently, the emergence of methods such as [DPA-1](https://www.nature.com/articles/s41524-024-01278-7) has brought us to a new stage where we can
 perform similar pretraining-finetuning approaches.
 They can hopefully learn the common knowledge in the pre-trained dataset (especially the `force` information)
 and thus reduce the computational cost in downstream training tasks.
 
@@ -19,7 +19,7 @@
 
 If you have a pre-trained model `pretrained.pb` (here we support models using [`se_atten`](../model/train-se-atten.md)
 descriptor and [`ener`](../model/train-energy.md) fitting net) on a large dataset (for example, [OC2M](https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md) in
-DPA-1 [paper](https://arxiv.org/abs/2208.08236)), a finetuning strategy can be performed by simply running:
+the DPA-1 [paper](https://www.nature.com/articles/s41524-024-01278-7)), a finetuning strategy can be performed by simply running:
 
 ```bash
 $ dp train input.json --finetune pretrained.pb