🐛 Describe the bug
I attempted to train LLaVA (base LLM = LLaMA 3) using the Liger kernel. The loss curve was similar to when I trained LLaVA without the Liger kernel. However, the model trained with the Liger kernel showed lower performance on MLLM benchmarks such as ChartQA. Since I used LLaMA 3, which is supported by Liger, I didn't expect any issues. Has anyone else tried training LLaVA with the Liger kernel?
Reproduce
import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
from llava.model import LlavaLlamaForCausalLM  # from the LLaVA repo

# Patch the Hugging Face LLaMA modeling code with Liger kernels
# before the model is instantiated.
print("Apply liger_kernel_to_llama")
apply_liger_kernel_to_llama()

model = LlavaLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
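If the regression reproduces, one way to narrow it down is to re-train with individual Liger kernels disabled and see which one changes the benchmark scores. The keyword arguments below (rope, rms_norm, swiglu, cross_entropy, fused_linear_cross_entropy) match recent liger-kernel releases, but treat the exact names as an assumption and check the signature of your installed version. This is a sketch, not a confirmed fix:

from liger_kernel.transformers import apply_liger_kernel_to_llama

# Sketch: bisect the regression by toggling one kernel at a time.
# Flag names assume a recent liger-kernel release; verify against
# the installed version's signature before relying on them.
apply_liger_kernel_to_llama(
    rope=True,
    rms_norm=True,
    swiglu=True,
    cross_entropy=False,
    fused_linear_cross_entropy=False,  # disable fused loss, fall back to the stock HF cross entropy
)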
Versions
transformers = 4.45.1
torch = 2.4.0
GPU: A100
I used XTuner to train LLaVA, and there was no decrease in performance. I find this feature very useful and highly recommend it!
The training time remains almost unchanged, and GPU memory usage is reduced by about 20%. If the sequence length is increased or a smaller model is used, memory usage can be reduced by up to 50%.
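For anyone trying to reproduce the memory numbers above, a minimal way to compare peak GPU memory with and without the Liger patch is to record torch.cuda.max_memory_allocated around a training step. The snippet below is only a sketch; the training-step placeholder is hypothetical and should be replaced with your actual LLaVA loop:

import torch

def report_peak_memory(tag: str) -> None:
    # Peak allocated memory since the last reset, in GiB.
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{tag}: peak GPU memory = {peak_gib:.2f} GiB")

# Sketch: run once with and once without apply_liger_kernel_to_llama()
# and compare the reported peaks.
torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward step of your LLaVA training loop here ...
report_peak_memory("after one step")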