This work is fantastic and has been very inspiring to me.
I have a question: LLaVA-Next-Qwen2 supports a context length of 128K, which roughly accommodates 888 frames. However, Table 4 in the paper only reports results for 32 frames (AnyRes). Have you run further experiments with LLaVA-Next-Qwen2+UniRes on more frames, such as 256? That might provide a fairer comparison.
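The ~888-frame figure presumably comes from dividing the context window by the per-frame visual-token cost; a minimal sketch, assuming ~144 visual tokens per frame (the per-frame count is my assumption, not stated above):

```python
# Rough frame-budget estimate for a 128K-token context window.
# TOKENS_PER_FRAME is an assumed value, not confirmed by the paper.
CONTEXT_LENGTH = 128_000   # 128K-token context
TOKENS_PER_FRAME = 144     # assumed visual tokens per frame

max_frames = CONTEXT_LENGTH // TOKENS_PER_FRAME
print(max_frames)  # → 888
```

Under this assumption, 256 frames would use only about 29% of the context budget, so a 256-frame run should fit comfortably.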