[Feature] Update octools CLI #1496

Open · wants to merge 7 commits into base: main
2 changes: 1 addition & 1 deletion .github/workflows/daily-run-test.yml
@@ -97,7 +97,7 @@ jobs:
conda activate ${{env.CONDA_ENV}}
conda info --envs
export from_tf=TRUE
python tools/list_configs.py internlm2_5 mmlu
python opencompass/tools/list_configs.py internlm2_5 mmlu
python run.py --models hf_internlm2_5_7b --datasets race_ppl --work-dir /cpfs01/user/qa-llm-cicd/report/${{ github.run_id }}/cmd1 --reuse
rm regression_result_daily -f && ln -s /cpfs01/user/qa-llm-cicd/report/${{ github.run_id }}/cmd1/*/summary regression_result_daily
python -m pytest -m case1 -s -v --color=yes .github/scripts/oc_score_assert.py
2 changes: 1 addition & 1 deletion .github/workflows/pr-run-test.yml
@@ -7,7 +7,7 @@ on:
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
- 'opencompass/tools/**'

workflow_dispatch:
schedule:
2 changes: 1 addition & 1 deletion .github/workflows/pr-stage-check.yml
@@ -7,7 +7,7 @@ on:
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
- 'opencompass/tools/**'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
10 changes: 5 additions & 5 deletions .pre-commit-config-zh-cn.yaml
@@ -103,7 +103,7 @@ repos:
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
@@ -112,7 +112,7 @@
hooks:
- id: update-dataset-suffix-pacakge
name: dataset suffix updater(package)
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
@@ -124,7 +124,7 @@
hooks:
- id: compare-configs-datasets
name: compare configs datasets
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -135,7 +135,7 @@
hooks:
- id: compare-configs-models
name: compare configs models
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -148,7 +148,7 @@
hooks:
- id: compare-configs-summarizers
name: compare configs summarizers
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -105,7 +105,7 @@ repos:
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
@@ -114,7 +114,7 @@
hooks:
- id: update-dataset-suffix-pacakge
name: dataset suffix updater(package)
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
@@ -126,7 +126,7 @@
hooks:
- id: compare-configs-datasets
name: compare configs datasets
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -137,7 +137,7 @@
hooks:
- id: compare-configs-models
name: compare configs models
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -150,7 +150,7 @@
hooks:
- id: compare-configs-summarizers
name: compare configs summarizers
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
8 changes: 6 additions & 2 deletions README.md
@@ -207,6 +207,7 @@ After ensuring that OpenCompass is installed correctly according to the above st

```bash
# CLI

opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

# Python scripts
@@ -246,9 +247,12 @@ After ensuring that OpenCompass is installed correctly according to the above st

```bash
# List all configurations
python tools/list_configs.py
opencompass-tool list-configs
# Equivalent to: python opencompass/tools/list_configs.py

# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
opencompass-tool list-configs llama mmlu
# Equivalent to: python opencompass/tools/list_configs.py llama mmlu
```

If the model is not on the list but supported by Huggingface AutoModel class, you can also evaluate it with OpenCompass. You are welcome to contribute to the maintenance of the OpenCompass supported model and dataset lists.
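For context on the new `opencompass-tool` entry point that replaces direct `python tools/...` invocations in the README above, a console script of this kind typically just maps subcommands onto the modules under `opencompass/tools/`. The sketch below is a hypothetical illustration of that dispatch pattern, not the actual OpenCompass implementation; the subcommand names and module paths are assumptions.

```python
# Hypothetical sketch of a console-script dispatcher such as `opencompass-tool`.
# The subcommand-to-module mapping below is illustrative, not the real one.
import argparse
import runpy
import sys

# Map CLI subcommands to the tool modules they would execute.
SUBCOMMANDS = {
    "list-configs": "opencompass.tools.list_configs",
    "prompt-viewer": "opencompass.tools.prompt_viewer",
}


def parse_args(argv):
    """Split argv into the chosen subcommand and the arguments to forward."""
    parser = argparse.ArgumentParser(prog="opencompass-tool")
    parser.add_argument("subcommand", choices=sorted(SUBCOMMANDS))
    # parse_known_args keeps unrecognized arguments for the tool itself.
    return parser.parse_known_args(argv)


def main(argv=None):
    args, rest = parse_args(argv)
    # Re-run the selected tool module as if it were invoked directly,
    # forwarding the remaining arguments to it.
    sys.argv = [args.subcommand, *rest]
    runpy.run_module(SUBCOMMANDS[args.subcommand], run_name="__main__")
```

With an entry point like this registered under `console_scripts`, `opencompass-tool list-configs llama mmlu` would behave like `python opencompass/tools/list_configs.py llama mmlu`.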
11 changes: 7 additions & 4 deletions README_zh-CN.md
@@ -234,15 +234,18 @@ humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ce
opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen -a lmdeploy
```

OpenCompass 预定义了许多模型和数据集的配置,你可以通过 [工具](./docs/zh_cn/tools.md#ListConfigs) 列出所有可用的模型和数据集配置。

- ### 支持的模型

OpenCompass 预定义了许多模型和数据集的配置,你可以通过 OpenCompass 提供的[工具](./docs/zh_cn/tools.md#ListConfigs) 列出所有可用的模型和数据集配置。

```bash
# 列出所有配置
python tools/list_configs.py
opencompass-tool list-configs
# Equivalent to: python opencompass/tools/list_configs.py

# 列出所有跟 llama 及 mmlu 相关的配置
python tools/list_configs.py llama mmlu
opencompass-tool list-configs llama mmlu
# Equivalent to: python opencompass/tools/list_configs.py llama mmlu
```

如果模型不在列表中但支持 Huggingface AutoModel 类,您仍然可以使用 OpenCompass 对其进行评估。欢迎您贡献维护 OpenCompass 支持的模型和数据集列表。
6 changes: 3 additions & 3 deletions configs/datasets/CHARM/README.md
@@ -142,13 +142,13 @@ outputs
cd ${path_to_CHARM_repo}

# generate Table5, Table6, Table9 and Table10 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}

# generate Figure3 and Figure9 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}

# generate Table7, Table12, Table13 and Figure11 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```

## 🖊️ Citation
6 changes: 3 additions & 3 deletions configs/datasets/CHARM/README_ZH.md
@@ -140,13 +140,13 @@ outputs
cd ${path_to_CHARM_repo}

# 生成论文中的Table5, Table6, Table9 and Table10,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}

# 生成论文中的Figure3 and Figure9,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}

# 生成论文中的Table7, Table12, Table13 and Figure11,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```

## 🖊️ 引用
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/code_eval_service.md
@@ -120,7 +120,7 @@ In OpenCompass's tools folder, there is a script called `collect_code_preds.py`
It is the same with `-r` option in `run.py`. More details can be referred through the [documentation](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html#launching-evaluation).

```shell
python tools/collect_code_preds.py [config] [-r latest]
python opencompass/tools/collect_code_preds.py [config] [-r latest]
```

The collected results will be organized as following under the `-r` folder:
8 changes: 4 additions & 4 deletions docs/en/get_started/quick_start.md
@@ -67,19 +67,19 @@ Users can combine the models and datasets they want to test using `--models` and
python run.py --models hf_opt_125m hf_opt_350m --datasets siqa_gen winograd_ppl
```

The models and datasets are pre-stored in the form of configuration files in `configs/models` and `configs/datasets`. Users can view or filter the currently available model and dataset configurations using `tools/list_configs.py`.
The models and datasets are pre-stored in the form of configuration files in `configs/models` and `configs/datasets`. Users can view or filter the currently available model and dataset configurations using `opencompass/tools/list_configs.py`.

```bash
# List all configurations
python tools/list_configs.py
python opencompass/tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
python opencompass/tools/list_configs.py llama mmlu
```

:::{dropdown} More about `list_configs`
:animate: fade-in-slide-down

Running `python tools/list_configs.py llama mmlu` gives the output like:
Running `python opencompass/tools/list_configs.py llama mmlu` gives the output like:

```text
+-----------------+-----------------------------------+
2 changes: 1 addition & 1 deletion docs/en/notes/news.md
@@ -27,6 +27,6 @@
- **\[2023.08.10\]** OpenCompass is compatible with [LMDeploy](https://github.com/InternLM/lmdeploy). Now you can follow this [instruction](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_turbomind.html#) to evaluate the accelerated models provided by **Turbomind**.
- **\[2023.08.10\]** We have supported [Qwen-7B](https://github.com/QwenLM/Qwen-7B) and [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B) ! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.08.09\]** Several new datasets(**CMMLU, TydiQA, SQuAD2.0, DROP**) are updated on our [leaderboard](https://opencompass.org.cn/leaderboard-llm)! More datasets are welcomed to join OpenCompass.
- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.07\]** We have added a [script](opencompass/tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass.
2 changes: 1 addition & 1 deletion docs/en/prompt/meta_template.md
@@ -260,4 +260,4 @@ In this regard, OpenCompass has preset three `api_role` values for API models: `

## Debugging

If you need to debug the prompt, it is recommended to use the `tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.
If you need to debug the prompt, it is recommended to use the `opencompass/tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.
14 changes: 7 additions & 7 deletions docs/en/tools.md
@@ -7,7 +7,7 @@ This tool allows you to directly view the generated prompt without starting the
Running method:

```bash
python tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
python opencompass/tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
```

- `-n`: Do not enter interactive mode, select the first model (if any) and dataset by default.
@@ -21,7 +21,7 @@ Based on existing evaluation results, this tool produces inference error samples
Running method:

```bash
python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
python opencompass/tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.
@@ -55,7 +55,7 @@ This tool can quickly test whether the functionality of the API model is normal.
Running method:

```bash
python tools/test_api_model.py [CONFIG_PATH] -n
python opencompass/tools/test_api_model.py [CONFIG_PATH] -n
```

## Prediction Merger
@@ -65,7 +65,7 @@ This tool can merge partitioned predictions.
Running method:

```bash
python tools/prediction_merger.py CONFIG_PATH [-w WORK_DIR]
python opencompass/tools/prediction_merger.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.
@@ -77,15 +77,15 @@ This tool can list or search all available model and dataset configurations. It
Usage:

```bash
python tools/list_configs.py [PATTERN1] [PATTERN2] [...]
python opencompass/tools/list_configs.py [PATTERN1] [PATTERN2] [...]
```

If executed without any parameters, it will list all model and dataset configurations in the `configs/models` and `configs/datasets` directories by default.

Users can also pass any number of parameters. The script will list all configurations related to the provided strings, supporting fuzzy search and the use of the * wildcard. For example, the following command will list all configurations related to `mmlu` and `llama`:

```bash
python tools/list_configs.py mmlu llama
python opencompass/tools/list_configs.py mmlu llama
```

Its output could be:
@@ -129,5 +129,5 @@ This tool can quickly modify the suffixes of configuration files located under t
How to run:

```bash
python tools/update_dataset_suffix.py
python opencompass/tools/update_dataset_suffix.py
```
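The fuzzy matching that `list_configs` is documented to support above (substring search plus the `*` wildcard) can be sketched with the standard library's `fnmatch`. This is an illustrative approximation under assumed behavior, not the actual `list_configs.py` implementation, and the sample config names are made up.

```python
# Sketch of fuzzy config filtering: each pattern matches as a substring,
# and `*` wildcards are honored. Illustrative only; not OpenCompass code.
from fnmatch import fnmatchcase


def filter_configs(names, patterns):
    """Keep names matching ANY pattern; `*` wildcards are supported."""
    if not patterns:
        return list(names)  # no patterns: list everything
    # Wrap each pattern in `*...*` so a bare string acts as a substring search.
    globs = [f"*{p}*" for p in patterns]
    return [n for n in names if any(fnmatchcase(n, g) for g in globs)]


# Hypothetical config names for demonstration.
configs = ["hf_llama2_7b", "hf_internlm2_5_7b", "mmlu_gen", "gsm8k_gen"]
print(filter_configs(configs, ["llama", "mmlu"]))  # → ['hf_llama2_7b', 'mmlu_gen']
```

A wildcard pattern such as `internlm*7b` would then match `hf_internlm2_5_7b`, mirroring the documented behavior of `python opencompass/tools/list_configs.py mmlu llama`.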
4 changes: 2 additions & 2 deletions docs/zh_cn/advanced_guides/code_eval_service.md
@@ -118,7 +118,7 @@ humanevalx_datasets = [
OpenCompass 在 `tools` 中提供了 `collect_code_preds.py` 脚本对推理结果进行后处理并收集,我们只需要提供启动任务时的配置文件,以及指定复用对应任务的工作目录,其配置与 `run.py` 中的 `-r` 一致,细节可参考[文档](https://opencompass.readthedocs.io/zh-cn/latest/get_started/quick_start.html#id4)。

```shell
python tools/collect_code_preds.py [config] [-r latest]
python opencompass/tools/collect_code_preds.py [config] [-r latest]
```

收集到的结果将会按照以下的目录结构保存到 `-r` 对应的工作目录中:
@@ -201,7 +201,7 @@ curl -X POST -F 'file=@./internlm-chat-7b-hf-v11/ds1000_Numpy.json' -F 'debug=er
### 修改后处理

1. 本地评测中,可以按照支持新数据集教程中的后处理部分来修改后处理方法;
2. 异地评测中,可以修改 `tools/collect_code_preds.py` 中的后处理部分;
2. 异地评测中,可以修改 `opencompass/tools/collect_code_preds.py` 中的后处理部分;
3. 代码评测服务中,存在部分后处理也可以进行修改,详情参考下一部分教程;

### 代码评测服务 Debug
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started/faq.md
@@ -43,7 +43,7 @@ OpenCompass 默认使用 num_worker_partitioner。OpenCompass 的评测从本质

根据上述过程,有如下推论:

- 在已有输出文件夹的基础上断点继续时,不可以更换 partitioner 的切片方式,或者不可以修改 `--max-num-workers` 的入参。(除非使用了 `tools/prediction_merger.py` 工具)
- 在已有输出文件夹的基础上断点继续时,不可以更换 partitioner 的切片方式,或者不可以修改 `--max-num-workers` 的入参。(除非使用了 `opencompass/tools/prediction_merger.py` 工具)
- 如果数据集有了任何修改,不要断点继续,或者根据需要将原有的输出文件进行删除后,全部重跑。

### OpenCompass 如何分配 GPU?
8 changes: 4 additions & 4 deletions docs/zh_cn/get_started/quick_start.md
@@ -90,19 +90,19 @@ python run.py \
--debug
```

模型和数据集的配置文件预存于 `configs/models` 和 `configs/datasets` 中。用户可以使用 `tools/list_configs.py` 查看或过滤当前可用的模型和数据集配置。
模型和数据集的配置文件预存于 `configs/models` 和 `configs/datasets` 中。用户可以使用 `opencompass/tools/list_configs.py` 查看或过滤当前可用的模型和数据集配置。

```bash
# 列出所有配置
python tools/list_configs.py
python opencompass/tools/list_configs.py
# 列出与llama和mmlu相关的所有配置
python tools/list_configs.py llama mmlu
python opencompass/tools/list_configs.py llama mmlu
```

:::{dropdown} 关于 `list_configs`
:animate: fade-in-slide-down

运行 `python tools/list_configs.py llama mmlu` 将产生如下输出:
运行 `python opencompass/tools/list_configs.py llama mmlu` 将产生如下输出:

```text
+-----------------+-----------------------------------+
2 changes: 1 addition & 1 deletion docs/zh_cn/notes/news.md
@@ -27,6 +27,6 @@
- **\[2023.08.10\]** OpenCompass 现已适配 [LMDeploy](https://github.com/InternLM/lmdeploy). 请参考 [评测指南](https://opencompass.readthedocs.io/zh_CN/latest/advanced_guides/evaluation_turbomind.html) 对 **Turbomind** 加速后的模型进行评估.
- **\[2023.08.10\]** [Qwen-7B](https://github.com/QwenLM/Qwen-7B) 和 [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B)的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.08.09\]** 更新更多评测数据集(**CMMLU, TydiQA, SQuAD2.0, DROP**) ,请登录 [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm) 查看更多结果! 欢迎添加你的评测数据集到OpenCompass.
- **\[2023.08.07\]** 新增了 [MMBench 评测脚本](tools/eval_mmbench.py) 以支持用户自行获取 [MMBench](https://opencompass.org.cn/MMBench)-dev 的测试结果.
- **\[2023.08.07\]** 新增了 [MMBench 评测脚本](opencompass/tools/eval_mmbench.py) 以支持用户自行获取 [MMBench](https://opencompass.org.cn/MMBench)-dev 的测试结果.
- **\[2023.08.05\]** [GPT-4](https://openai.com/gpt-4) 的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.27\]** 新增了 [CMMLU](https://github.com/haonan-li/CMMLU)! 欢迎更多的数据集加入 OpenCompass.
2 changes: 1 addition & 1 deletion docs/zh_cn/prompt/meta_template.md
@@ -260,4 +260,4 @@ meta_template=dict(

## 调试

如果需要调试 prompt,建议在准备好配置文件后,使用 `tools/prompt_viewer.py` 脚本预览模型实际接收到的 prompt。阅读[这里](../tools.md#prompt-viewer)了解更多。
如果需要调试 prompt,建议在准备好配置文件后,使用 `opencompass/tools/prompt_viewer.py` 脚本预览模型实际接收到的 prompt。阅读[这里](../tools.md#prompt-viewer)了解更多。