Before preparing task-specific data, you MUST first download eval.zip. It contains custom annotations, scripts, and the prediction files with LLaVA v1.5. Extract it to ./playground/data/eval. This also provides a general structure for all datasets. For more details, please refer to the doc.
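Before moving on, it can help to confirm that the extracted tree has the dataset directories used later in this guide. The following is a minimal sketch, not part of the repository: `check_eval_layout` is a hypothetical helper, and the directory names are simply the ones that appear in the paths below. Whether eval.zip creates all of them is an assumption, so a "missing" line may only mean that dataset has not been prepared yet.

```shell
# Report which dataset directories named in this guide are missing.
# Prints one line per missing directory and returns non-zero if any is absent.
# $1 is the eval root; it defaults to the path used in this guide.
check_eval_layout() (
  root=${1:-./playground/data/eval}
  missing=0
  for name in textvqa MME mmbench mmvet; do
    if [ ! -d "$root/$name" ]; then
      echo "missing: $root/$name"
      missing=1
    fi
  done
  exit "$missing"
)
```

Run `check_eval_layout` from the repository root after extracting the archive. No output and a zero exit status mean all four directories exist.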
- Download TextVQA_0.5.1_val.json and the images, and extract them to ./playground/data/eval/textvqa.
- Single-GPU inference and evaluation.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
- Download the data following the official instructions here.
- Download the images to MME_Benchmark_release_version.
- Put the official eval_tool and MME_Benchmark_release_version under ./playground/data/eval/MME.
- Single-GPU inference and evaluation.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
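The placement step for the MME folders can be sketched as a small helper. This is an illustration only: `place_mme` is a hypothetical function, and it assumes both folders were downloaded into the same source directory, which the official instructions may not guarantee.

```shell
# Move the two MME folders from a download directory into the eval tree.
# Arguments (both hypothetical, for this sketch only):
#   $1  directory that currently holds eval_tool and MME_Benchmark_release_version
#   $2  destination, defaulting to the path used in this guide
place_mme() (
  src=$1
  dest=${2:-./playground/data/eval/MME}
  # Check both folders first so a missing one does not leave a half-finished move.
  for item in eval_tool MME_Benchmark_release_version; do
    if [ ! -d "$src/$item" ]; then
      echo "missing: $src/$item" >&2
      exit 1
    fi
  done
  mkdir -p "$dest" || exit 1
  for item in eval_tool MME_Benchmark_release_version; do
    # Skip anything already in place rather than nesting it one level deeper.
    if [ -d "$dest/$item" ]; then
      echo "already present: $dest/$item"
    else
      mv "$src/$item" "$dest/" || exit 1
    fi
  done
)
```

For example, `place_mme ~/Downloads` would move both folders from `~/Downloads` into ./playground/data/eval/MME, after which the mme.sh command above can be run.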
- Download mmbench_dev_20230712.tsv and put it under ./playground/data/eval/mmbench.
- Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench.sh
- Submit the results to the evaluation server: ./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712.
- Download mmbench_dev_cn_20231003.tsv and put it under ./playground/data/eval/mmbench.
- Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench_cn.sh
- Submit the results to the evaluation server: ./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003.
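Before submitting MMBench or MMBench-CN results, it may be worth checking that inference actually wrote something to the upload directory. A minimal sketch follows. `upload_ready` is a hypothetical helper, and it makes no assumption about the names or format of the result files; it only checks that the directory exists and is not empty.

```shell
# Fail if the upload directory is absent or empty; otherwise list its contents.
# $1 is the directory to check; it defaults to the MMBench dev path from this guide.
upload_ready() (
  dir=${1:-./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712}
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    exit 1
  fi
  # ls -A lists every entry except . and .. so empty output means nothing to submit.
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir" >&2
    exit 1
  fi
  ls -l "$dir"
)
```

Call `upload_ready` with no argument for the English dev split, or pass ./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003 for the Chinese one.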
- Extract mm-vet.zip to ./playground/data/eval/mmvet.
- Single-GPU inference.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmvet.sh
- Evaluate the predictions in ./playground/data/eval/mmvet/results using the official Jupyter notebook.
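The per-benchmark commands in this guide all follow the same pattern, so they can be run back to back once every dataset is in place. The sketch below is a convenience wrapper, not something the repository ships: `run_evals` and the `GPU` and `DRY_RUN` variables are hypothetical names. It defaults to printing the commands so they can be reviewed first.

```shell
# Print, or run when DRY_RUN=0, the evaluation command for each named benchmark.
# Each argument is a script name under scripts/v1_5/eval/ without the .sh suffix.
# GPU selects the device index (default 0); DRY_RUN defaults to 1 (print only).
run_evals() (
  gpu=${GPU:-0}
  for name in "$@"; do
    script="scripts/v1_5/eval/$name.sh"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "CUDA_VISIBLE_DEVICES=$gpu bash $script"
    else
      # Stop at the first failing benchmark instead of continuing silently.
      CUDA_VISIBLE_DEVICES=$gpu bash "$script" || exit 1
    fi
  done
)
```

For example, `run_evals textvqa mme mmbench mmbench_cn mmvet` prints the five commands from this guide, and setting `DRY_RUN=0` beforehand runs them in order. The MMBench submissions and the MM-Vet notebook evaluation remain manual steps afterwards.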