Question regarding the training of the llama2 version #337
-
Thank you for your work!
-
Hello! The training data and training strategy are exactly the same as in the vicuna version. One difference is that the old vicuna version used BLIP-2's Q-Former; in the llama2 version we removed it. The linear layer now directly maps the output of CLIP's vision encoder to the LLM's input.
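For readers unfamiliar with the design, here is a minimal sketch (not the repo's actual code) of what the reply describes: the Q-Former is gone, and a single linear layer projects the vision encoder's patch features directly into the LLM's input embedding space. The dimensions are illustrative assumptions (1408 for an EVA-CLIP ViT-g feature size, 4096 for a LLaMA-2-7B hidden size, 257 tokens = 256 patches + CLS).

```python
import torch
import torch.nn as nn

vision_dim = 1408   # assumed ViT-g feature dimension (illustrative)
llm_dim = 4096      # assumed LLaMA-2-7B hidden size (illustrative)

# The only trainable bridge in this sketch: one linear projection.
vision_to_llm = nn.Linear(vision_dim, llm_dim)

# Stand-in for the vision encoder's output for one image:
# (batch, num_patch_tokens, vision_dim)
image_features = torch.randn(1, 257, vision_dim)

# Each patch token becomes one "visual token" in the LLM's embedding space;
# these would be concatenated with text token embeddings before the LLM forward pass.
visual_tokens = vision_to_llm(image_features)   # (1, 257, llm_dim)
print(visual_tokens.shape)                      # torch.Size([1, 257, 4096])
```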
-
By the way, may I ask what enhancements you found in the llama2 version?