Skip to content

Latest commit

 

History

History
50 lines (37 loc) · 2.4 KB

Eval.md

File metadata and controls

50 lines (37 loc) · 2.4 KB

Scripts

Before preparing task-specific data, you MUST first download eval.zip. It contains custom annotations, scripts, and the prediction files with LLaVA v1.5. Extract to ./playground/data/eval. This also provides a general structure for all datasets. For more details,please refer to doc

TextVQA

  1. Download TextVQA_0.5.1_val.json and images and extract to ./playground/data/eval/textvqa.
  2. Single-GPU inference and evaluate.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh

MME

  1. Download the data following the official instructions here.
  2. Downloaded images to MME_Benchmark_release_version.
  3. put the official eval_tool and MME_Benchmark_release_version under ./playground/data/eval/MME.
  4. Single-GPU inference and evaluate.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh

MMBench

  1. Download mmbench_dev_20230712.tsv and put under ./playground/data/eval/mmbench.
  2. Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench.sh
  1. Submit the results to the evaluation server: ./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712.

MMBench-CN

  1. Download mmbench_dev_cn_20231003.tsv and put under ./playground/data/eval/mmbench.
  2. Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench_cn.sh
  1. Submit the results to the evaluation server: ./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003.

MM-Vet

  1. Extract mm-vet.zip to ./playground/data/eval/mmvet.
  2. Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmvet.sh
  1. Evaluate the predictions in ./playground/data/eval/mmvet/results using the official jupyter notebook.