
Clarification on Fixed Visual Token Counts (192, 128, 64) in Table 1 #10

naajeehxe opened this issue Nov 1, 2024 · 2 comments

@naajeehxe
Hello, Thanks for your wonderful research.

I understand that the number of pruned tokens depends on the value of lambda multiplied by the rank, while the number of recycled tokens is influenced by the hyperparameter tau.

But Table 1 of the paper shows that the number of visual tokens is fixed at 192, 128, and 64.

Could you please clarify whether these token counts were hardcoded to select exactly 192, 128, or 64 visual tokens, or if there was another approach to maintaining a fixed token count for these experiments?

Thank you, Sincerely

@Gumpest
Owner

Gumpest commented Nov 10, 2024

I appreciate your interest in our work.

1. The retained tokens are controlled by changing the scaling factor in Formula 8.
2. The reported count is the equivalent number of tokens, i.e. a layer-weighted average of the retained counts. For example, T = ((L1 - L0) * T0 + (L2 - L1) * T1) / L2 (see the sketch below).
3. Therefore, we select exactly 192, 128, or 64 visual tokens (in this equivalent sense) to compare fairly with other methods.
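
A minimal sketch of the layer-weighted averaging in point 2, assuming a schedule where T0 tokens are retained from layer L0 to L1 and T1 tokens from L1 to L2 (function and variable names are illustrative, not the repository's API):

```python
def equivalent_token_count(stages, total_layers):
    """Layer-weighted average of retained visual tokens.

    stages: list of (start_layer, end_layer, tokens_retained) tuples,
            mirroring the (L0, L1, T0) and (L1, L2, T1) ranges in the formula.
    total_layers: L2, the total number of decoder layers.
    """
    weighted = sum((end - start) * tokens for start, end, tokens in stages)
    return weighted / total_layers

# Purely illustrative numbers (not the paper's actual schedule):
# keep 576 tokens for layers 0-2, then 96 tokens for layers 2-32.
print(equivalent_token_count([(0, 2, 576), (2, 32, 96)], total_layers=32))  # 126.0
```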

@YUECHE77

Hi,

I'm also very curious about the configuration in Table 1. Could you please give us an example?

For instance, when you retain exactly 128 tokens, what are your "--scale" and "--bias" values? Are they 9 and 6? Please correct me if I'm wrong.

Also, I'm kind of confused about the meaning of "scale" and "bias". Does "scale" stand for the lambda in Formula 8? And what is "bias" exactly?

And by the way, could you please share the configuration for FastV in Table 1? I'm trying to reproduce your results. What k (layer) and r (attention rank) were you using, for example for the "Retain 128 Tokens" setting in Table 1?
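
(Just to make the question concrete: if FastV is read through the same equivalent-count formula as in the comment above, with all visual tokens kept up to layer k and r tokens retained afterwards, one could back-solve r for a target budget as in this rough sketch. The 576-token / 32-layer numbers assume LLaVA-1.5 and are illustrative only, not the actual configuration.)

```python
def solve_r_for_target(target, k, full_tokens=576, total_layers=32):
    """Back out the post-pruning token budget r that yields `target` equivalent
    tokens when all `full_tokens` visual tokens are kept up to layer k and
    only r tokens are retained for the remaining layers."""
    return (target * total_layers - k * full_tokens) / (total_layers - k)

# e.g. a 128-token equivalent budget with pruning after layer 2:
print(solve_r_for_target(target=128, k=2))  # ~98.1, so round to an integer budget
```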

[image attachment]

I really like your work, it's amazing! I hope I can get your response.

Thank you!
