[Feature] Update octools CLI #1496

Open · wants to merge 7 commits into base: main
2 changes: 1 addition & 1 deletion .github/workflows/daily-run-test.yml
@@ -97,7 +97,7 @@ jobs:
conda activate ${{env.CONDA_ENV}}
conda info --envs
export from_tf=TRUE
python tools/list_configs.py internlm2_5 mmlu
python opencompass/tools/list_configs.py internlm2_5 mmlu
python run.py --models hf_internlm2_5_7b --datasets race_ppl --work-dir /cpfs01/user/qa-llm-cicd/report/${{ github.run_id }}/cmd1 --reuse
rm regression_result_daily -f && ln -s /cpfs01/user/qa-llm-cicd/report/${{ github.run_id }}/cmd1/*/summary regression_result_daily
python -m pytest -m case1 -s -v --color=yes .github/scripts/oc_score_assert.py
2 changes: 1 addition & 1 deletion .github/workflows/pr-run-test.yml
@@ -7,7 +7,7 @@ on:
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
- 'opencompass/tools/**'

workflow_dispatch:
schedule:
2 changes: 1 addition & 1 deletion .github/workflows/pr-stage-check.yml
@@ -7,7 +7,7 @@ on:
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
- 'opencompass/tools/**'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
10 changes: 5 additions & 5 deletions .pre-commit-config-zh-cn.yaml
@@ -103,7 +103,7 @@ repos:
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
@@ -112,7 +112,7 @@
hooks:
- id: update-dataset-suffix-pacakge
name: dataset suffix updater(package)
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
@@ -124,7 +124,7 @@
hooks:
- id: compare-configs-datasets
name: compare configs datasets
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -135,7 +135,7 @@
hooks:
- id: compare-configs-models
name: compare configs models
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -148,7 +148,7 @@
hooks:
- id: compare-configs-summarizers
name: compare configs summarizers
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -105,7 +105,7 @@ repos:
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
@@ -114,7 +114,7 @@
hooks:
- id: update-dataset-suffix-pacakge
name: dataset suffix updater(package)
entry: ./tools/update_dataset_suffix.py
entry: ./opencompass/tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
@@ -126,7 +126,7 @@
hooks:
- id: compare-configs-datasets
name: compare configs datasets
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -137,7 +137,7 @@
hooks:
- id: compare-configs-models
name: compare configs models
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
@@ -150,7 +150,7 @@
hooks:
- id: compare-configs-summarizers
name: compare configs summarizers
entry: ./tools/compare_configs.py
entry: ./opencompass/tools/compare_configs.py
language: script
pass_filenames: false
# require_serial: true
8 changes: 6 additions & 2 deletions README.md
@@ -207,6 +207,7 @@ After ensuring that OpenCompass is installed correctly according to the above st

```bash
# CLI

opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

# Python scripts
@@ -246,9 +247,12 @@ After ensuring that OpenCompass is installed correctly according to the above st

```bash
# List all configurations
python tools/list_configs.py
opencompass-tool list-configs
# Equivalent to: python opencompass/tools/list_configs.py

# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
opencompass-tool list-configs llama mmlu
# Equivalent to: python opencompass/tools/list_configs.py llama mmlu
```

If the model is not on the list but supported by Huggingface AutoModel class, you can also evaluate it with OpenCompass. You are welcome to contribute to the maintenance of the OpenCompass supported model and dataset lists.
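For context on the new `opencompass-tool` entry point that replaces direct `python tools/...` invocations in the README above, a console script of this kind typically just maps subcommands onto the modules under `opencompass/tools/`. The sketch below is a hypothetical illustration of that dispatch pattern, not the actual OpenCompass implementation; the subcommand names and module paths are assumptions.

```python
# Hypothetical sketch of a console-script dispatcher such as `opencompass-tool`.
# The subcommand-to-module mapping below is illustrative, not the real one.
import argparse
import runpy
import sys

# Map CLI subcommands to the tool modules they would execute.
SUBCOMMANDS = {
    "list-configs": "opencompass.tools.list_configs",
    "prompt-viewer": "opencompass.tools.prompt_viewer",
}


def parse_args(argv):
    """Split argv into the chosen subcommand and the arguments to forward."""
    parser = argparse.ArgumentParser(prog="opencompass-tool")
    parser.add_argument("subcommand", choices=sorted(SUBCOMMANDS))
    # parse_known_args keeps unrecognized arguments for the tool itself.
    return parser.parse_known_args(argv)


def main(argv=None):
    args, rest = parse_args(argv)
    # Re-run the selected tool module as if it were invoked directly,
    # forwarding the remaining arguments to it.
    sys.argv = [args.subcommand, *rest]
    runpy.run_module(SUBCOMMANDS[args.subcommand], run_name="__main__")
```

With an entry point like this registered under `console_scripts`, `opencompass-tool list-configs llama mmlu` would behave like `python opencompass/tools/list_configs.py llama mmlu`.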
11 changes: 7 additions & 4 deletions README_zh-CN.md
@@ -234,15 +234,18 @@ humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ce
opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen -a lmdeploy
```

OpenCompass 预定义了许多模型和数据集的配置,你可以通过 [工具](./docs/zh_cn/tools.md#ListConfigs) 列出所有可用的模型和数据集配置。

- ### 支持的模型

OpenCompass 预定义了许多模型和数据集的配置,你可以通过 OpenCompass 提供的[工具](./docs/zh_cn/tools.md#ListConfigs) 列出所有可用的模型和数据集配置。

```bash
# 列出所有配置
python tools/list_configs.py
opencompass-tool list-configs
# Equivalent to: python opencompass/tools/list_configs.py

# 列出所有跟 llama 及 mmlu 相关的配置
python tools/list_configs.py llama mmlu
opencompass-tool list-configs llama mmlu
# Equivalent to: python opencompass/tools/list_configs.py llama mmlu
```

如果模型不在列表中但支持 Huggingface AutoModel 类,您仍然可以使用 OpenCompass 对其进行评估。欢迎您贡献维护 OpenCompass 支持的模型和数据集列表。
6 changes: 3 additions & 3 deletions configs/datasets/CHARM/README.md
@@ -142,13 +142,13 @@ outputs
cd ${path_to_CHARM_repo}

# generate Table5, Table6, Table9 and Table10 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}

# generate Figure3 and Figure9 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}

# generate Table7, Table12, Table13 and Figure11 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```

## 🖊️ Citation
6 changes: 3 additions & 3 deletions configs/datasets/CHARM/README_ZH.md
@@ -140,13 +140,13 @@ outputs
cd ${path_to_CHARM_repo}

# 生成论文中的Table5, Table6, Table9 and Table10,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}

# 生成论文中的Figure3 and Figure9,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}

# 生成论文中的Table7, Table12, Table13 and Figure11,详见https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
PYTHONPATH=. python opencompass/tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```

## 🖊️ 引用
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/code_eval_service.md
@@ -120,7 +120,7 @@ In OpenCompass's tools folder, there is a script called `collect_code_preds.py`
It is the same with `-r` option in `run.py`. More details can be referred through the [documentation](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html#launching-evaluation).

```shell
python tools/collect_code_preds.py [config] [-r latest]
python opencompass/tools/collect_code_preds.py [config] [-r latest]
```

The collected results will be organized as following under the `-r` folder:
8 changes: 4 additions & 4 deletions docs/en/get_started/quick_start.md
@@ -67,19 +67,19 @@ Users can combine the models and datasets they want to test using `--models` and
python run.py --models hf_opt_125m hf_opt_350m --datasets siqa_gen winograd_ppl
```

The models and datasets are pre-stored in the form of configuration files in `configs/models` and `configs/datasets`. Users can view or filter the currently available model and dataset configurations using `tools/list_configs.py`.
The models and datasets are pre-stored in the form of configuration files in `configs/models` and `configs/datasets`. Users can view or filter the currently available model and dataset configurations using `opencompass/tools/list_configs.py`.

```bash
# List all configurations
python tools/list_configs.py
python opencompass/tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
python opencompass/tools/list_configs.py llama mmlu
```

:::{dropdown} More about `list_configs`
:animate: fade-in-slide-down

Running `python tools/list_configs.py llama mmlu` gives the output like:
Running `python opencompass/tools/list_configs.py llama mmlu` gives the output like:

```text
+-----------------+-----------------------------------+
2 changes: 1 addition & 1 deletion docs/en/notes/news.md
@@ -27,6 +27,6 @@
- **\[2023.08.10\]** OpenCompass is compatible with [LMDeploy](https://github.com/InternLM/lmdeploy). Now you can follow this [instruction](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_turbomind.html#) to evaluate the accelerated models provided by **Turbomind**.
- **\[2023.08.10\]** We have supported [Qwen-7B](https://github.com/QwenLM/Qwen-7B) and [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B) ! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.08.09\]** Several new datasets(**CMMLU, TydiQA, SQuAD2.0, DROP**) are updated on our [leaderboard](https://opencompass.org.cn/leaderboard-llm)! More datasets are welcomed to join OpenCompass.
- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.07\]** We have added a [script](opencompass/tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass.
2 changes: 1 addition & 1 deletion docs/en/prompt/meta_template.md
@@ -260,4 +260,4 @@ In this regard, OpenCompass has preset three `api_role` values for API models: `

## Debugging

If you need to debug the prompt, it is recommended to use the `tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.
If you need to debug the prompt, it is recommended to use the `opencompass/tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.
14 changes: 7 additions & 7 deletions docs/en/tools.md
@@ -7,7 +7,7 @@ This tool allows you to directly view the generated prompt without starting the
Running method:

```bash
python tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
python opencompass/tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
```

- `-n`: Do not enter interactive mode, select the first model (if any) and dataset by default.
@@ -21,7 +21,7 @@ Based on existing evaluation results, this tool produces inference error samples
Running method:

```bash
python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
python opencompass/tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.
@@ -55,7 +55,7 @@ This tool can quickly test whether the functionality of the API model is normal.
Running method:

```bash
python tools/test_api_model.py [CONFIG_PATH] -n
python opencompass/tools/test_api_model.py [CONFIG_PATH] -n
```

## Prediction Merger
@@ -65,7 +65,7 @@ This tool can merge partitioned predictions.
Running method:

```bash
python tools/prediction_merger.py CONFIG_PATH [-w WORK_DIR]
python opencompass/tools/prediction_merger.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.
@@ -77,15 +77,15 @@ This tool can list or search all available model and dataset configurations. It
Usage:

```bash
python tools/list_configs.py [PATTERN1] [PATTERN2] [...]
python opencompass/tools/list_configs.py [PATTERN1] [PATTERN2] [...]
```

If executed without any parameters, it will list all model and dataset configurations in the `configs/models` and `configs/datasets` directories by default.

Users can also pass any number of parameters. The script will list all configurations related to the provided strings, supporting fuzzy search and the use of the * wildcard. For example, the following command will list all configurations related to `mmlu` and `llama`:

```bash
python tools/list_configs.py mmlu llama
python opencompass/tools/list_configs.py mmlu llama
```

Its output could be:
@@ -129,5 +129,5 @@ This tool can quickly modify the suffixes of configuration files located under t
How to run:

```bash
python tools/update_dataset_suffix.py
python opencompass/tools/update_dataset_suffix.py
```
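The fuzzy matching that `list_configs` is documented to support above (substring search plus the `*` wildcard) can be sketched with the standard library's `fnmatch`. This is an illustrative approximation under assumed behavior, not the actual `list_configs.py` implementation, and the sample config names are made up.

```python
# Sketch of fuzzy config filtering: each pattern matches as a substring,
# and `*` wildcards are honored. Illustrative only; not OpenCompass code.
from fnmatch import fnmatchcase


def filter_configs(names, patterns):
    """Keep names matching ANY pattern; `*` wildcards are supported."""
    if not patterns:
        return list(names)  # no patterns: list everything
    # Wrap each pattern in `*...*` so a bare string acts as a substring search.
    globs = [f"*{p}*" for p in patterns]
    return [n for n in names if any(fnmatchcase(n, g) for g in globs)]


# Hypothetical config names for demonstration.
configs = ["hf_llama2_7b", "hf_internlm2_5_7b", "mmlu_gen", "gsm8k_gen"]
print(filter_configs(configs, ["llama", "mmlu"]))  # → ['hf_llama2_7b', 'mmlu_gen']
```

A wildcard pattern such as `internlm*7b` would then match `hf_internlm2_5_7b`, mirroring the documented behavior of `python opencompass/tools/list_configs.py mmlu llama`.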
4 changes: 2 additions & 2 deletions docs/zh_cn/advanced_guides/code_eval_service.md
@@ -118,7 +118,7 @@ humanevalx_datasets = [
OpenCompass 在 `tools` 中提供了 `collect_code_preds.py` 脚本对推理结果进行后处理并收集,我们只需要提供启动任务时的配置文件,以及指定复用对应任务的工作目录,其配置与 `run.py` 中的 `-r` 一致,细节可参考[文档](https://opencompass.readthedocs.io/zh-cn/latest/get_started/quick_start.html#id4)。

```shell
python tools/collect_code_preds.py [config] [-r latest]
python opencompass/tools/collect_code_preds.py [config] [-r latest]
```

收集到的结果将会按照以下的目录结构保存到 `-r` 对应的工作目录中:
@@ -201,7 +201,7 @@ curl -X POST -F 'file=@./internlm-chat-7b-hf-v11/ds1000_Numpy.json' -F 'debug=er
### 修改后处理

1. 本地评测中,可以按照支持新数据集教程中的后处理部分来修改后处理方法;
2. 异地评测中,可以修改 `tools/collect_code_preds.py` 中的后处理部分;
2. 异地评测中,可以修改 `opencompass/tools/collect_code_preds.py` 中的后处理部分;
3. 代码评测服务中,存在部分后处理也可以进行修改,详情参考下一部分教程;

### 代码评测服务 Debug
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started/faq.md
@@ -43,7 +43,7 @@ OpenCompass 默认使用 num_worker_partitioner。OpenCompass 的评测从本质

根据上述过程,有如下推论:

- 在已有输出文件夹的基础上断点继续时,不可以更换 partitioner 的切片方式,或者不可以修改 `--max-num-workers` 的入参。(除非使用了 `tools/prediction_merger.py` 工具)
- 在已有输出文件夹的基础上断点继续时,不可以更换 partitioner 的切片方式,或者不可以修改 `--max-num-workers` 的入参。(除非使用了 `opencompass/tools/prediction_merger.py` 工具)
- 如果数据集有了任何修改,不要断点继续,或者根据需要将原有的输出文件进行删除后,全部重跑。

### OpenCompass 如何分配 GPU?
8 changes: 4 additions & 4 deletions docs/zh_cn/get_started/quick_start.md
@@ -90,19 +90,19 @@ python run.py \
--debug
```

模型和数据集的配置文件预存于 `configs/models` 和 `configs/datasets` 中。用户可以使用 `tools/list_configs.py` 查看或过滤当前可用的模型和数据集配置。
模型和数据集的配置文件预存于 `configs/models` 和 `configs/datasets` 中。用户可以使用 `opencompass/tools/list_configs.py` 查看或过滤当前可用的模型和数据集配置。

```bash
# 列出所有配置
python tools/list_configs.py
python opencompass/tools/list_configs.py
# 列出与llama和mmlu相关的所有配置
python tools/list_configs.py llama mmlu
python opencompass/tools/list_configs.py llama mmlu
```

:::{dropdown} 关于 `list_configs`
:animate: fade-in-slide-down

运行 `python tools/list_configs.py llama mmlu` 将产生如下输出:
运行 `python opencompass/tools/list_configs.py llama mmlu` 将产生如下输出:

```text
+-----------------+-----------------------------------+
2 changes: 1 addition & 1 deletion docs/zh_cn/notes/news.md
@@ -27,6 +27,6 @@
- **\[2023.08.10\]** OpenCompass 现已适配 [LMDeploy](https://github.com/InternLM/lmdeploy). 请参考 [评测指南](https://opencompass.readthedocs.io/zh_CN/latest/advanced_guides/evaluation_turbomind.html) 对 **Turbomind** 加速后的模型进行评估.
- **\[2023.08.10\]** [Qwen-7B](https://github.com/QwenLM/Qwen-7B) 和 [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B)的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.08.09\]** 更新更多评测数据集(**CMMLU, TydiQA, SQuAD2.0, DROP**) ,请登录 [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm) 查看更多结果! 欢迎添加你的评测数据集到OpenCompass.
- **\[2023.08.07\]** 新增了 [MMBench 评测脚本](tools/eval_mmbench.py) 以支持用户自行获取 [MMBench](https://opencompass.org.cn/MMBench)-dev 的测试结果.
- **\[2023.08.07\]** 新增了 [MMBench 评测脚本](opencompass/tools/eval_mmbench.py) 以支持用户自行获取 [MMBench](https://opencompass.org.cn/MMBench)-dev 的测试结果.
- **\[2023.08.05\]** [GPT-4](https://openai.com/gpt-4) 的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.27\]** 新增了 [CMMLU](https://github.com/haonan-li/CMMLU)! 欢迎更多的数据集加入 OpenCompass.
2 changes: 1 addition & 1 deletion docs/zh_cn/prompt/meta_template.md
@@ -260,4 +260,4 @@ meta_template=dict(

## 调试

如果需要调试 prompt,建议在准备好配置文件后,使用 `tools/prompt_viewer.py` 脚本预览模型实际接收到的 prompt。阅读[这里](../tools.md#prompt-viewer)了解更多。
如果需要调试 prompt,建议在准备好配置文件后,使用 `opencompass/tools/prompt_viewer.py` 脚本预览模型实际接收到的 prompt。阅读[这里](../tools.md#prompt-viewer)了解更多。