
Question regarding the training of the llama2 version #337

Answered by TsuTikgiau
JunZhan2000 asked this question in Q&A

Hello! The training data and training strategy are exactly the same as in the vicuna version. One difference is that the old vicuna version uses BLIP-2's Q-Former; in the llama2 version, we remove it, and the linear layer now directly maps the output of CLIP's vision encoder to the LLM's input.
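
For a concrete picture of that change, here is a minimal PyTorch sketch of the projection described above: a single linear layer mapping vision-encoder features straight into the LLM's input embedding space, with no Q-Former in between. The class name and the dimensions (1408 for an EVA-CLIP-style vision encoder, 4096 for the LLM hidden size) are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Sketch of the llama2-version design: one linear layer replaces
    the BLIP-2 Q-Former used in the vicuna version. Dimensions are
    illustrative, not the repo's actual config."""

    def __init__(self, vision_dim: int = 1408, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim) from the CLIP
        # vision encoder; the output has the LLM's hidden size, so it
        # can be concatenated with the text token embeddings.
        return self.proj(vision_feats)

# Usage with dummy features standing in for the vision encoder output:
feats = torch.randn(1, 257, 1408)           # e.g. 256 patches + CLS token
llm_inputs = VisionToLLMProjector()(feats)  # -> (1, 257, 4096)
```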

Answer selected by JunZhan2000