We provide the processed data as follows.
| Datasets | Hugging Face | Baidu Disk |
| --- | --- | --- |
| Multimodal Pre-training | Link | - |
| Joint Instruction Tuning | Link | - |
| ScienceQA | Link | - |
We also provide the processed evaluation data as follows. The annotations are provided in eval/questions.
| Datasets | Hugging Face | Baidu Disk | Google Disk | Peking University Disk |
| --- | --- | --- | --- | --- |
| Image_Understanding | Link | - | - | - |
| Video_Understanding | Link | - | - | - |
| ScienceQA | Link | - | - | - |
| Activitynet_Zero_Shot_QA | Link | Link | - | - |
| MSRVTT_Zero_Shot_QA | Link | Link | Link | - |
| MSVD_Zero_Shot_QA | Link | Link | Link | Link |
| TGIF_Zero_Shot_QA | Link | Link | Link | Link |
| POPE | Link | - | - | - |
Modify the data path in config/dataset_config.py:
Pretrain = {
"chat_path": "${PATH}/CC3M-595K/chat.json",
"CC3M": "${PATH}/CC3M-595K",
}
VIT = {
"chat_path": "${PATH}/llava_instruct_150k.json",
"COCO2017": "${PATH}/COCO2017/train2017",
}
MIMIC_imageonly = {
"chat_path": "${PATH}/MIMIC-IT-imageonly.json",
"CDG": "${PATH}/CGD/images",
"LA": "${PATH}/LA/images",
"SD": "${PATH}/SD/images",
}
COCO_CAP = {
"chat_path": "${PATH}/COCO/coco_cap_chat.json",
"COCO2014": "${PATH}/COCO2014/train2014",
}
COCO_REG = {
"chat_path": "${PATH}/COCO/coco_reg_chat.json",
"COCO2014": "${PATH}/COCO2014/train2014",
}
COCO_REC = {
"chat_path": "${PATH}/COCO/coco_rec_chat.json",
"COCO2014": "${PATH}/COCO2014/train2014",
}
VIDEO = {
"chat_path": "${PATH}/video_chat.json",
"VIDEO": "${PATH}/Activity_Videos",
}
SQA = {
"chat_path": "${PATH}/llava_train_QCM-LEA.json",
"ScienceQA": "${PATH}/scienceqa/train",
}
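Every entry above uses a ${PATH} placeholder that must be replaced with your local data root. As a sanity check before training, a small sketch can expand the placeholder and flag missing files (the resolve_dataset_paths helper is hypothetical, not part of the repository):

```python
import os

def resolve_dataset_paths(config, root):
    """Hypothetical helper: expand the ${PATH} placeholder in a dataset
    config dict and warn about entries that do not exist on disk yet."""
    resolved = {}
    for key, template in config.items():
        path = template.replace("${PATH}", root)
        if not os.path.exists(path):
            print(f"warning: {key} -> {path} not found")
        resolved[key] = path
    return resolved

# Example with the Pretrain config from above and an assumed /data root.
Pretrain = {
    "chat_path": "${PATH}/CC3M-595K/chat.json",
    "CC3M": "${PATH}/CC3M-595K",
}
resolved = resolve_dataset_paths(Pretrain, "/data")
```

Running this once per config catches path typos before they surface as mid-training file-not-found errors.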
All conversation data is in JSON format, and each sample contains the following fields:
- "id": A unique identifier that distinguishes samples
- "image" or "video": The file name of the image or video
- "conversations": The conversation turns
[
{
"id": "COCO_CAP_0",
"image": "COCO_train2014_000000222016.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nDescribe the main events or objects in the image."
},
{
"from": "gpt",
"value": "a big red telephone booth that a man is standing in"
}
]
},
]
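Since the turns alternate between "human" and "gpt", a sample can be consumed by pairing consecutive turns. A minimal sketch, using the COCO_CAP record above (the iter_pairs helper is illustrative, not a function from the repository):

```python
# The record below copies the COCO_CAP example from the text.
records = [
    {
        "id": "COCO_CAP_0",
        "image": "COCO_train2014_000000222016.jpg",
        "conversations": [
            {"from": "human",
             "value": "<image>\nDescribe the main events or objects in the image."},
            {"from": "gpt",
             "value": "a big red telephone booth that a man is standing in"},
        ],
    },
]

def iter_pairs(records):
    """Illustrative helper: yield (question, answer) tuples from
    alternating human/gpt turns in each conversation."""
    for rec in records:
        convs = rec["conversations"]
        for human, gpt in zip(convs[::2], convs[1::2]):
            yield human["value"], gpt["value"]

pairs = list(iter_pairs(records))
```

Note the `<image>` token in the human turn: it marks where the visual features are spliced into the text sequence.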
To add a new dataset, first define its data paths in config/dataset_config.py:
New_data = {
"chat_path": "${PATH}/chat.json",
"new_data": "${PATH}/file",
}
Then, register it in config/__init__.py. You can also combine different datasets using the list format:
DataConfig = {
"New": [New_data],
}
To use the new dataset during training, you only need to change the "dataset_use" parameter in the command:
--dataset_use New