LLaVA-Qwen2: Enhanced with Qwen2 Base Model

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities, enhanced with the Qwen2 base model.

This repository integrates the Qwen2 base model as LLaVA's language backbone. For general usage details, refer to the original LLaVA repository.

Dataset for Pretraining and Finetuning

Pretraining and finetuning use the LLaVA dataset combined with the FinVis dataset.

Download

git lfs install
git clone https://www.modelscope.cn/TobyYang7/llava-qwen2-1.5b-instruct-finvis.git
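If Git LFS is not active when cloning, large files in an LFS-backed repository arrive as small text "pointer" stubs instead of real content. The sketch below, assuming this ModelScope repository stores its weights through Git LFS (as the `git lfs install` step suggests), lists any files that are still pointers. The helper names `is_lfs_pointer` and `list_lfs_pointers` are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: find files that are still Git LFS pointer stubs rather than downloaded content.
# A pointer stub is a tiny text file whose first line is the LFS spec header.

is_lfs_pointer() {
  # Succeeds if the file's first line is the Git LFS pointer header.
  local first_line=""
  IFS= read -r first_line < "$1" || true
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

list_lfs_pointers() {
  # Print every small regular file under DIR (default: .) that is still a pointer,
  # skipping the .git directory. Pointer stubs are well under 1024 bytes.
  local dir="${1:-.}"
  find "$dir" -path "$dir/.git" -prune -o -type f -size -1024c -print |
    while IFS= read -r f; do
      if is_lfs_pointer "$f"; then
        printf '%s\n' "$f"
      fi
    done
}
```

For example, `list_lfs_pointers llava-qwen2-1.5b-instruct-finvis` after cloning; if it prints anything, running `git lfs pull` inside the clone should fetch the real files.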

MMMU Eval

First download the MMMU dataset and place it at MMMU_eval/data. For details, follow the official MMMU instructions.
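A quick pre-flight check can catch a missing or empty data directory before the evaluation starts. The function `require_nonempty_dir` below is a hypothetical helper, not part of this repository; it only confirms that the directory exists and contains something, and does not validate MMMU's internal layout.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail early if the evaluation data directory is missing or empty.
require_nonempty_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "error: $dir not found; download MMMU and place it there first" >&2
    return 1
  fi
  # ls -A lists entries (including dotfiles) without . and ..; empty output means an empty dir
  if [ -z "$(ls -A "$dir")" ]; then
    echo "error: $dir is empty" >&2
    return 1
  fi
}
```

Usage: `require_nonempty_dir MMMU_eval/data && bash eval.sh`.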

bash eval.sh

LLaVA-Qwen2-1.5B Results
| Subject | Questions | Accuracy |
|---|---|---|
| Overall-Art and Design | 120 | 0.35 |
| Art | 30 | 0.3 |
| Art_Theory | 30 | 0.467 |
| Design | 30 | 0.467 |
| Music | 30 | 0.167 |
| Overall-Business | 150 | 0.22 |
| Accounting | 30 | 0.267 |
| Economics | 30 | 0.133 |
| Finance | 30 | 0.2 |
| Manage | 30 | 0.3 |
| Marketing | 30 | 0.2 |
| Overall-Science | 150 | 0.267 |
| Biology | 30 | 0.167 |
| Chemistry | 30 | 0.267 |
| Geography | 30 | 0.233 |
| Math | 30 | 0.333 |
| Physics | 30 | 0.333 |
| Overall-Health and Medicine | 150 | 0.267 |
| Basic_Medical_Science | 30 | 0.233 |
| Clinical_Medicine | 30 | 0.333 |
| Diagnostics_and_Laboratory_Medicine | 30 | 0.167 |
| Pharmacy | 30 | 0.267 |
| Public_Health | 30 | 0.333 |
| Overall-Humanities and Social Science | 120 | 0.458 |
| History | 30 | 0.467 |
| Literature | 30 | 0.7 |
| Sociology | 30 | 0.4 |
| Psychology | 30 | 0.267 |
| Overall-Tech and Engineering | 210 | 0.3 |
| Agriculture | 30 | 0.367 |
| Architecture_and_Engineering | 30 | 0.3 |
| Computer_Science | 30 | 0.1 |
| Electronics | 30 | 0.2 |
| Energy_and_Power | 30 | 0.4 |
| Materials | 30 | 0.333 |
| Mechanical_Engineering | 30 | 0.4 |
| Overall | 900 | 0.303 |
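The Overall rows in the table are consistent with question-weighted averages (total correct answers divided by total questions). The sketch below recomputes such an aggregate from whitespace-separated rows of subject, question count, and accuracy (for example `Art 30 0.3`); the function name `weighted_accuracy` is hypothetical, and since accuracies are rounded, the per-subject correct counts it derives are approximate.

```shell
#!/usr/bin/env bash
# Recompute an aggregate accuracy from per-subject rows on stdin.
# Each row: <subject> <num_questions> <accuracy>, e.g. "Art 30 0.3".
weighted_accuracy() {
  awk '
    $1 ~ /^Overall/ { next }                  # skip rows that are already aggregates
    NF >= 3 && $NF ~ /^[0-9.]+$/ {            # skip headers and non-numeric rows
      n += $(NF-1)                            # questions in this subject
      correct += $(NF-1) * $NF                # approximate number answered correctly
    }
    END { if (n > 0) printf "%.3f\n", correct / n }
  '
}
```

Piping in the four Art and Design subject rows prints `0.350`, and all thirty subject rows give `0.303`, matching the table. Weighting matters at the category level: an unweighted mean of the six category scores would be about 0.310, because the categories contain different numbers of questions.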

Pretrain Qwen2

bash pretrain_qwen2.sh

The pretrained projector checkpoint is saved to checkpoints/Qwen2-1.5B-pretrain-FinVis/mm_projector.bin

Finetune Qwen2

bash ft_qwen2.sh
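Finetuning presumably starts from the projector produced by pretraining. The guard below is a sketch that assumes ft_qwen2.sh reads the projector from the path given above (LLaVA's own finetune scripts pass such a path via `--pretrain_mm_mlp_adapter`); it fails fast with a clear message instead of erroring partway through model loading. `projector_ready` and `run_finetune` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: only start finetuning if the pretrained projector checkpoint exists and is non-empty.
# Assumption: ft_qwen2.sh loads the projector from this path; override PROJECTOR if yours differs.
PROJECTOR="${PROJECTOR:-checkpoints/Qwen2-1.5B-pretrain-FinVis/mm_projector.bin}"

projector_ready() {
  # -s: file exists and its size is greater than zero
  [ -s "$1" ]
}

run_finetune() {
  if projector_ready "$PROJECTOR"; then
    bash ft_qwen2.sh
  else
    echo "error: $PROJECTOR missing or empty; run 'bash pretrain_qwen2.sh' first" >&2
    return 1
  fi
}
```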

Interface

bash run_cli.sh

Installation

This repository builds upon the original LLaVA project, replacing its language model with the Qwen2 base model.

If you are not using Linux, do NOT proceed; see the instructions for macOS and Windows.

  1. Clone this repository and navigate to the Llava_Qwen2 folder

    git clone https://github.com/TobyYang7/Llava_Qwen2.git
    cd Llava_Qwen2
  2. Install the package

    conda create -n llava python=3.10 -y
    conda activate llava
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
  3. Install additional packages for training

    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation