How to run LongVILA long-context, sequence parallel inference? #130
Hi @zadeismael Thank you for the notice! This is an active PR that will be merged very soon (within days).
Hello, I am also very interested in sequence parallel inference. May I ask when you plan to open-source the code for sequence parallel inference?
@hb-jw Thank you! We are undergoing the final merge check for this PR in our internal codebase, and it will be ready very soon (if everything goes well, by the middle of this week).
Hello, today is Friday. I wanted to ask whether everything is going well?
@hb-jw Hi there, sorry for the delay. We have worked out the version update. We are working on integrating with the vision needle-in-a-haystack evaluation before open-sourcing this PR.
@DachengLi1 Thanks for the update - can you let us know a new expected date?
@zade-twelvelabs I will allocate more bandwidth to the task, and hopefully finish it by this Thursday. Thanks for your patience, and apologies for the delay!
OK! Thank you for your effort and for open-sourcing this. I like the sequence parallel part of this project very much, and I check every day to see whether it has been open-sourced. Please reply to me when it is open-sourced! Thank you again!
@DachengLi1 Echoing @hb-jw's comment - thanks for the prioritization :)
Thank you for your amazing work! It's already Thursday, and I've been looking forward to it for a long time. Could you please tell me when the sequence parallel code will be open-sourced?
Hi @hb-jw Sorry, we have an internal regression that leads to a small accuracy mismatch. If you are looking for a quick solution, we have an implementation here: https://github.com/NVlabs/VILA/tree/main/llava/eval/vision_niah_vila.
This is a non-generative example though, right? Can this be used for next-token generation?
Hi @DachengLi1 :)
Hello, it has been a long time, though it feels like only a short time has passed because I have been waiting for sequence parallel inference. Do you still plan to open-source the sequence parallel inference code? If so, when exactly will that be? I am eagerly looking forward to it and waiting with bated breath. Thank you in advance for your reply.
Hi @hb-jw Sincere apologies for this. We are undergoing a significant refactoring of the internal repo to support more models, and this PR can hardly be merged as-is. We are working toward an Oct 2 deadline, and I will rewrite the PR accordingly, hopefully within one week after Oct 2.
+1 |
There are multiple mentions of a multi-modal sequence parallel system for inference that can be seamlessly integrated with HF transformers. However, I am not able to follow this through the codebase, or see it exhibited in any of the scripts / examples.
Can the team please point to where this is implemented in the codebase, or share a runnable example?
Mentions of inference in the LongVILA paper:
Section 1:
> For inference, the memory usage of KV cache will also be a bottleneck when the sequence length is very long, we thus implement the inference mode of our MM-SP to support long context multi-modal language deployment.
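To illustrate why the KV cache becomes the bottleneck, here is a rough back-of-the-envelope sketch. The dimensions below are illustrative 7B-class numbers of my own choosing, not LongVILA's actual config:

```python
# Back-of-the-envelope KV cache size for a 7B-class decoder.
# All dimensions are illustrative assumptions, not LongVILA's config.
layers = 32          # transformer layers
kv_heads = 32        # key/value heads (no GQA assumed)
head_dim = 128       # per-head dimension
bytes_per_elem = 2   # fp16/bf16

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    # Factor of 2 covers the separate K and V tensors.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch

for n in (8_192, 262_144, 2_097_152):  # 8K, 256K, 2M tokens
    print(f"{n:>9} tokens -> {kv_cache_bytes(n) / 2**30:7.1f} GiB")
# 8K tokens is ~4 GiB, 256K is ~128 GiB, 2M is ~1 TiB: far beyond one GPU.
# Sharding the sequence across P ranks leaves each rank ~1/P of this, which
# is what makes an MM-SP-style inference mode necessary at long context.
```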
Section 3.3:
> Thus, we implement sequence parallelism for VLMs distributed inference. Compared to the training mode, the system needs to additionally maintain tensors (e.g. input tokens and position encodings) that are progressively changing during the decoding phase (Yu et al., 2022). In addition, the system needs to detect signals from the machine that holds the last token and accordingly terminate the distributed process.
Section 5.1
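For reference, here is my rough mental model of the decoding behavior the Section 3.3 excerpt describes, as a hedged sketch. `decode_loop`, `step_fn`, and `EOS_TOKEN_ID` are placeholder names of mine, not identifiers from the VILA codebase:

```python
# Sketch of the terminate-on-EOS handshake from Section 3.3, as I read it.
# Placeholder names and structure; not the actual VILA implementation.
import torch
import torch.distributed as dist

EOS_TOKEN_ID = 2  # assumed EOS id

def decode_loop(step_fn, max_new_tokens: int, group=None):
    rank = dist.get_rank(group)
    last_rank = dist.get_world_size(group) - 1  # rank holding the last tokens
    stop = torch.zeros(1, dtype=torch.int64, device="cuda")

    for _ in range(max_new_tokens):
        # All ranks attend over their own KV shard; only the last rank
        # actually samples and returns the new token.
        next_token = step_fn()
        if rank == last_rank and next_token == EOS_TOKEN_ID:
            stop.fill_(1)
        # The last rank tells everyone whether it saw EOS, so all ranks
        # either append to their caches or terminate the loop together.
        dist.broadcast(stop, src=last_rank, group=group)
        if stop.item():
            break
```

The key point from the quote is just that decoding adds per-step state (tokens, position encodings) and a distributed stop condition that the training mode does not need. Happy to be corrected if the actual PR does something different, e.g. a ring-style exchange.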