diff --git a/README.md b/README.md index d5cc3cb..9ddd1a7 100755 --- a/README.md +++ b/README.md @@ -44,7 +44,6 @@ TencentPretrain has the following features: * argparse * packaging * regex -* For the mixed precision training you will need apex from NVIDIA * For the pre-trained model conversion (related with TensorFlow) you will need TensorFlow * For the tokenization with sentencepiece model you will need [SentencePiece](https://github.com/google/sentencepiece) * For developing a stacking model you will need LightGBM and [BayesianOptimization](https://github.com/fmfn/BayesianOptimization) @@ -135,7 +134,7 @@ The above content provides basic ways of using TencentPretrain to pre-process, p
## Pre-training data -This section provides links to a range of :arrow_right: [__pre-training data__](https://github.com/Tencent/TencentPretrain/wiki/Pretraining-data) :arrow_left: . +This section provides links to a range of :arrow_right: [__pre-training data__](https://github.com/Tencent/TencentPretrain/wiki/Pretraining-data) :arrow_left: . TencentPretrain can load this pre-training data directly.
@@ -145,7 +144,7 @@ This section provides links to a range of :arrow_right: [__downstream datasets__
## Modelzoo -With the help of TencentPretrain, we pre-trained models of different properties (e.g. models based on different modalities, encoders, and targets). Detailed introduction of pre-trained models and their download links can be found in :arrow_right: [__modelzoo__](https://github.com/Tencent/TencentPretrain/wiki/Modelzoo) :arrow_left: . All pre-trained models can be loaded by TencentPretrain directly. More pre-trained models will be released in the future. +With the help of TencentPretrain, we have pre-trained models with different properties (e.g. models based on different modalities, encoders, and targets). A detailed introduction to the pre-trained models and their download links can be found in :arrow_right: [__modelzoo__](https://github.com/Tencent/TencentPretrain/wiki/Modelzoo) :arrow_left: . All pre-trained models can be loaded by TencentPretrain directly.
@@ -183,7 +182,7 @@ TencentPretrain/ ``` -The code is well-organized. Users can use and extend upon it with little efforts. +The code is organized by components (e.g. embeddings, encoders). Users can use and extend it with little effort. Comprehensive examples of using TencentPretrain can be found in :arrow_right: [__instructions__](https://github.com/Tencent/TencentPretrain/wiki/Instructions) :arrow_left: , which help users quickly implement pre-training models such as BERT, GPT-2, ELMo, T5, CLIP and fine-tune pre-trained models on a range of downstream tasks. diff --git a/README_ZH.md b/README_ZH.md index 5fff608..d47ad6e 100755 --- a/README_ZH.md +++ b/README_ZH.md @@ -41,7 +41,6 @@ TencentPretrain有如下几方面优势: * argparse * packaging * regex -* 如果使用混合精度,需要安装英伟达的apex * 如果涉及到TensorFlow模型的转换,需要安装TensorFlow * 如果在tokenizer中使用sentencepiece模型,需要安装[SentencePiece](https://github.com/google/sentencepiece) * 如果使用模型集成stacking,需要安装LightGBM和[BayesianOptimization](https://github.com/fmfn/BayesianOptimization) @@ -132,7 +131,7 @@ python3 inference/run_classifier_infer.py --load_model_path models/finetuned_mod
## 预训练数据 -我们提供了链接,指向一系列开源的 :arrow_right: [__预训练数据__](https://github.com/Tencent/TencentPretrain/wiki/预训练数据) :arrow_left: 。 +我们提供了链接,指向一系列开源的 :arrow_right: [__预训练数据__](https://github.com/Tencent/TencentPretrain/wiki/预训练数据) :arrow_left: 。TencentPretrain可以直接加载这些预训练数据。
@@ -142,7 +141,7 @@ python3 inference/run_classifier_infer.py --load_model_path models/finetuned_mod
## 预训练模型仓库 -借助TencentPretrain,我们训练不同性质的预训练模型(例如基于不同模态、编码器、目标任务)。用户可以在 :arrow_right: [__预训练模型仓库__](https://github.com/Tencent/TencentPretrain/wiki/预训练模型仓库) :arrow_left: 中找到各种性质的预训练模型以及它们对应的描述和下载链接。所有预训练模型都可以由TencentPretrain直接加载。将来我们会发布更多的预训练模型。 +借助TencentPretrain,我们训练不同性质的预训练模型(例如基于不同模态、编码器、目标任务)。用户可以在 :arrow_right: [__预训练模型仓库__](https://github.com/Tencent/TencentPretrain/wiki/预训练模型仓库) :arrow_left: 中找到各种性质的预训练模型以及它们对应的描述和下载链接。所有预训练模型都可以由TencentPretrain直接加载。