Name		Name	Last commit message	Last commit date
parent directory ..
configs		configs
README.md		README.md
got_ocr2_0_infer.py		got_ocr2_0_infer.py
run_train.sh		run_train.sh
train_GOT.py		train_GOT.py

README.md

GOT-OCR2.0

1. 模型介绍

GOT-OCR2.0是由 StepFun 和中国科学院大学推出的专用于通用 OCR 任务的多模态大模型，参数量 0.6B，是一款极具突破性的通用OCR多模态模型，旨在解决传统OCR系统（OCR-1.0）和当前大规模视觉语言模型（LVLMs）在OCR任务中的局限性。

本仓库支持的模型权重:

Model
stepfun-ai/GOT-OCR2_0

注意：与huggingface权重同名，但权重为paddle框架的Tensor，使用xxx.from_pretrained("stepfun-ai/GOT-OCR2_0")即可自动下载该权重文件夹到缓存目录。

2. 环境要求

python >= 3.10
paddlepaddle-gpu 要求3.0.0b2或版本develop

# develop版安装示例
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

paddlenlp == 3.0.0b2

注：(默认开启flash_attn)使用flash_attn 要求A100/A800显卡或者H20显卡。V100请用float16推理。

3 推理预测

3.1. plain texts OCR:

python paddlemix/examples/GOT_OCR_2_0/got_ocr2_0_infer.py \
  --model_name_or_path stepfun-ai/GOT-OCR2_0 \
  --image_file paddlemix/demo_images/hospital.jpeg \
  --ocr_type ocr \

3.2. format texts OCR:

python paddlemix/examples/GOT_OCR_2_0/got_ocr2_0_infer.py \
  --model_name_or_path stepfun-ai/GOT-OCR2_0 \
  --image_file paddlemix/demo_images/hospital.jpeg \
  --ocr_type format \

3.3. multi_crop plain texts OCR:

python paddlemix/examples/GOT_OCR_2_0/got_ocr2_0_infer.py \
  --model_name_or_path stepfun-ai/GOT-OCR2_0 \
  --image_file paddlemix/demo_images/hospital.jpeg \
  --ocr_type ocr \
  --multi_crop \

4 训练

与官方github代码库一样，目前仅支持基于GOT权重的post-training(stage-2/stage-3)，其中stage2是全参数微调，stage3是冻结vision encoder后微调，默认训练方式是stage2全参数微调，训练显存约10GB每卡。

数据集下载

PaddleMIX团队提供了一个改版的SynthDoG-EN数据集，统一修改了其原先的question为<image>\nOCR:，下载链接为：

wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground/synthdog_en.tar # 2.4G

synthdog_en.tar包括了图片images文件夹和标注json文件，需下载解压或软链接在PaddleMIX/目录下。

数据集格式

同官方例子，其中question统一为<image>\nOCR:，answer是其OCR结果。

训练命令

sh paddlemix/examples/GOT_OCR_2_0/run_train.sh

注意：默认训练方式是stage2全参数微调，训练显存约10GB每卡。也可通过设置--freeze_vision_tower True冻结vision encoder后微调。

参考文献

@article{wei2024general,
  title={General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model},
  author={Wei, Haoran and Liu, Chenglong and Chen, Jinyue and Wang, Jia and Kong, Lingyu and Xu, Yanming and Ge, Zheng and Zhao, Liang and Sun, Jianjian and Peng, Yuang and others},
  journal={arXiv preprint arXiv:2409.01704},
  year={2024}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GOT_OCR_2_0

GOT_OCR_2_0

README.md

GOT-OCR2.0

1. 模型介绍

2. 环境要求

3 推理预测

3.1. plain texts OCR:

3.2. format texts OCR:

3.3. multi_crop plain texts OCR:

4 训练

数据集下载

数据集格式

训练命令

参考文献

Files

GOT_OCR_2_0

Directory actions

More options

Directory actions

More options

Latest commit

History

GOT_OCR_2_0

Folders and files

parent directory

README.md

GOT-OCR2.0

1. 模型介绍

2. 环境要求

3 推理预测

3.1. plain texts OCR:

3.2. format texts OCR:

3.3. multi_crop plain texts OCR:

4 训练

数据集下载

数据集格式

训练命令

参考文献