Visual instruction tuning towards large language and vision models with GPT-4 level capabilities, enhanced with the Qwen2 base model.
For more details on usage, refer to the original LLaVA repository. This custom repository specifically integrates the Qwen2 base model to leverage its advanced capabilities.
LLaVA Dataset + FinVis Dataset
git lfs install
git clone https://www.modelscope.cn/TobyYang7/llava-qwen2-1.5b-instruct-finvis.git
Download the MMMU dataset first and rename it to MMMU_eval\data. For more details, follow the official MMMU instructions.
bash eval.sh
LLaVA-Qwen2-1.5B Result
Subject | Samples | Accuracy |
---|---|---|
Overall-Art and Design | 120 | 0.35 |
Art | 30 | 0.3 |
Art_Theory | 30 | 0.467 |
Design | 30 | 0.467 |
Music | 30 | 0.167 |
Overall-Business | 150 | 0.22 |
Accounting | 30 | 0.267 |
Economics | 30 | 0.133 |
Finance | 30 | 0.2 |
Manage | 30 | 0.3 |
Marketing | 30 | 0.2 |
Overall-Science | 150 | 0.267 |
Biology | 30 | 0.167 |
Chemistry | 30 | 0.267 |
Geography | 30 | 0.233 |
Math | 30 | 0.333 |
Physics | 30 | 0.333 |
Overall-Health and Medicine | 150 | 0.267 |
Basic_Medical_Science | 30 | 0.233 |
Clinical_Medicine | 30 | 0.333 |
Diagnostics_and_Laboratory_Medicine | 30 | 0.167 |
Pharmacy | 30 | 0.267 |
Public_Health | 30 | 0.333 |
Overall-Humanities and Social Science | 120 | 0.458 |
History | 30 | 0.467 |
Literature | 30 | 0.7 |
Sociology | 30 | 0.4 |
Psychology | 30 | 0.267 |
Overall-Tech and Engineering | 210 | 0.3 |
Agriculture | 30 | 0.367 |
Architecture_and_Engineering | 30 | 0.3 |
Computer_Science | 30 | 0.1 |
Electronics | 30 | 0.2 |
Energy_and_Power | 30 | 0.4 |
Materials | 30 | 0.333 |
Mechanical_Engineering | 30 | 0.4 |
Overall | 900 | 0.303 |
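The overall row is consistent with a sample-weighted mean of the six category rows. The following is a minimal sketch that recomputes it from the values in the table above; the variable names are illustrative and not part of the repository.

```python
# (samples, accuracy) per category, copied from the "Overall-..." rows of the table.
categories = {
    "Art and Design": (120, 0.35),
    "Business": (150, 0.22),
    "Science": (150, 0.267),
    "Health and Medicine": (150, 0.267),
    "Humanities and Social Science": (120, 0.458),
    "Tech and Engineering": (210, 0.3),
}

total_samples = sum(n for n, _ in categories.values())

# Weight each category accuracy by its sample count, then normalize.
overall_acc = sum(n * acc for n, acc in categories.values()) / total_samples

print(total_samples, round(overall_acc, 3))  # prints: 900 0.303
```

Because the categories differ in size (120 to 210 samples), the weighted mean is the appropriate way to combine them; an unweighted mean of the six category accuracies would give a slightly different number.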
bash pretrain_qwen2.sh
The pretrained projector checkpoint is saved to checkpoints/Qwen2-1.5B-pretrain-FinVis/mm_projector.bin
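Before moving on to fine-tuning, it can be useful to confirm that the pretraining step actually wrote the projector file. The sketch below assumes it is run from the repository root so that the relative path above resolves; the helper name `projector_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Path taken from the README; assumed to be relative to the current working directory.
PROJECTOR_PATH = Path("checkpoints/Qwen2-1.5B-pretrain-FinVis/mm_projector.bin")


def projector_ready(path: Path = PROJECTOR_PATH) -> bool:
    """Return True if the projector checkpoint exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    if projector_ready():
        print(f"Found projector checkpoint: {PROJECTOR_PATH}")
    else:
        print(f"Missing projector checkpoint: {PROJECTOR_PATH}; run pretrain_qwen2.sh first")
```

The check only verifies that a non-empty file is present; it does not validate the contents of the checkpoint.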
bash ft_qwen2.sh
bash run_cli.sh
This repository builds upon the original LLaVA project, integrating the Qwen2 base model for improved performance.
If you are not using Linux, do NOT proceed; see the instructions for macOS and Windows instead.
- Clone this repository and navigate to the custom LLaVA folder
git clone https://github.com/TobyYang7/Llava_Qwen2.git
cd Llava_Qwen2
- Install the package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
- Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
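After installation, a quick sanity check can confirm that the expected modules are importable in the active conda environment. This is a sketch under assumptions: the module names `llava` (for the editable install) and `flash_attn` (for the flash-attn package) are assumed import names, and the helper `missing_modules` is hypothetical rather than part of the repository.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README creates the environment with python=3.10.
    print("Python", sys.version.split()[0])
    # Assumed import names; adjust if the packages install under other names.
    missing = missing_modules(["llava", "flash_attn"])
    print("Missing modules:", missing or "none")
```

Using `find_spec` locates a module without importing it, so the check stays fast and does not require a GPU to be visible.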