Make `RegexGuide` pickleable again for `vllm` and `tgi` #99

joennlae · 2024-11-30T21:01:45Z

I understand that being pickleable is not your priority right now, but `RegexGuide` needs to be pickleable for production use in vllm, which is multiprocessing-based.

This PR reintroduces pickling support and adds some tests.
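To make the requirement concrete: anything handed to another process in a multiprocessing-based server has to survive a `pickle` round trip. The following is a minimal Python sketch, assuming a hypothetical `ToyGuide` class that stands in for `RegexGuide`. The class, its fields, and the JSON byte encoding are illustrative only and are not the outlines-core API.

```python
import json
import pickle


class ToyGuide:
    """Hypothetical stand-in for a guide wrapping a state-transition table."""

    def __init__(self, transitions):
        # transitions: {state: {token_id: next_state}}
        self.transitions = transitions

    def to_bytes(self):
        # Flatten to triples so integer keys survive the JSON encoding.
        triples = [
            (state, token, nxt)
            for state, row in self.transitions.items()
            for token, nxt in row.items()
        ]
        return json.dumps(triples).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        transitions = {}
        for state, token, nxt in json.loads(data.decode("utf-8")):
            transitions.setdefault(state, {})[token] = nxt
        return cls(transitions)

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling from_bytes
        # on the serialized state.
        return (ToyGuide.from_bytes, (self.to_bytes(),))

    def next_state(self, state, token_id):
        return self.transitions[state][token_id]


guide = ToyGuide({0: {7: 1}, 1: {9: 2}})
restored = pickle.loads(pickle.dumps(guide))
```

The design choice illustrated here is that the object exposes a bytes-level serialization and `__reduce__` only delegates to it, so the pickle support stays a thin layer over whatever encoding the underlying implementation uses.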

I understand that this introduces more effort on your side.

References:
dottxt-ai/outlines#1274
vllm-project/vllm#10490
vllm-project/vllm#10576
vllm-project/vllm#10489

It would also tackle the current caching issues:
huggingface/text-generation-inference#2766
dottxt-ai/outlines#1283

Closes:
#95


Essentially a cleaned-up version of this `pr`: vllm-project#10785

Especially since `outlines` is rather slow and the new version is tough to integrate, as they do not focus on being pickleable, which is a key feature for us using the multiprocessing engine: dottxt-ai/outlines-core#99

I assume more and more will change over to `xgrammar`.

This is a minimum implementation. https://arxiv.org/pdf/2411.15100

Signed-off-by: Jannis Schönleber <[email protected]>

rlouf · 2024-12-02T17:25:46Z

Hey @joennlae, thank you for contributing, and we would like to move forward quickly with this! Is it ready for review?

joennlae · 2024-12-02T17:26:39Z

In my opinion this should be ready :-)

src/python_bindings/mod.rs

torymur

Hi @joennlae 👋

Thank you for adding serialization support, let's handle unwraps and we'll be ready to merge it right away 🚀

joennlae · 2024-12-02T19:01:16Z

Hi @torymur :-)

Thank you for the review. I just wanted to let you know that I updated it accordingly.

rlouf · 2024-12-02T19:46:09Z

I just cut a new release @joennlae. Do you need this in Outlines? (Although you should be able to use outlines-core without importing outlines at this point)

joennlae · 2024-12-02T22:24:18Z

For me, it is all right as I install it directly. But in the long run I think it would be nice :-)

Currently with MQLLMEngine, we are initializing LogitsProcessors on the client side, pickling the entire list of LogitsProcessors, and sending them over ZeroMQ to the engine.

This was put in place so that the expensive initialization (tens of seconds) of the Outlines LogitsProcessor could happen in a thread, such that the client could defer submitting the request to the engine until the initialization had completed.

This became an issue because recent (Rust-based) Outlines does not support pickle serialization, but this has been resolved by dottxt-ai/outlines-core#99.

However, this approach is also not desirable in the case of XGrammar because the initialization is not expensive (hundreds of milliseconds) and the serialization is just unnecessary complexity.

And so, let's remove the code from the client side of MQLLMEngine that special-cases the creation of logits_processors based on guided decoding params. This will now happen on the engine side once again.

Signed-off-by: Mark McLoughlin <[email protected]>
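The hand-off described in this commit message can be sketched as follows. This is a simplified illustration and not vLLM's code: `SlowProcessor`, `build_processor`, and `engine_receive` are hypothetical names, a short `sleep` stands in for the expensive initialization, and a plain bytes payload stands in for the ZeroMQ message.

```python
import pickle
import time
from concurrent.futures import ThreadPoolExecutor


class SlowProcessor:
    """Hypothetical logits processor whose construction is expensive."""

    def __init__(self, pattern):
        time.sleep(0.05)  # stand-in for the costly index build
        self.pattern = pattern

    def __call__(self, token_ids, logits):
        # A real processor would mask disallowed tokens; this one is a no-op.
        return logits


def build_processor(pattern):
    return SlowProcessor(pattern)


def engine_receive(payload):
    # Engine side: rebuild the processors from the bytes sent by the client.
    return pickle.loads(payload)


# Client side: initialize in a worker thread so the caller is not blocked,
# then pickle the whole list once it is ready.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(build_processor, r"\d{3}")
    processors = [future.result()]

payload = pickle.dumps(processors)
engine_processors = engine_receive(payload)
```

In this arrangement every processor in the list must be pickleable, since `pickle.dumps` fails on the client otherwise. Moving creation to the engine side, as the commit proposes, removes that requirement along with the serialization step.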

joennlae added 4 commits November 30, 2024 20:01

feat(pickle): make Index pickleable by using serde

3427117

test(pickle): add simple + complex pickle test

b889075

feat(pickle): change to bincode for slightly faster serialize and deserialize times

8754128

refactor(pickle): remove timing infra + remove logs

412ef29

joennlae mentioned this pull request Nov 30, 2024

[Core] Update to outlines >= 0.1.8 vllm-project/vllm#10576

Draft

joennlae mentioned this pull request Dec 1, 2024

[Core] add xgrammar as guided generation provider vllm-project/vllm#10803

Closed

torymur reviewed Dec 2, 2024

src/python_bindings/mod.rs (outdated, resolved)

torymur reviewed Dec 2, 2024

src/python_bindings/mod.rs (outdated, resolved)

torymur previously approved these changes Dec 2, 2024


torymur added the bug and enhancement labels Dec 2, 2024

chore(pickle): handle unwraps with error message
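One way to picture the effect of this change from the Python side: malformed serialized state should produce an ordinary exception with a readable message rather than an abrupt failure. The sketch below assumes a hypothetical `ToyIndex` class with a JSON encoding and uses `ValueError`. The exception type and message that outlines-core actually uses may differ.

```python
import json


class ToyIndex:
    """Hypothetical index restored from a serialized byte string."""

    def __init__(self, states):
        self.states = states

    def to_bytes(self):
        return json.dumps(self.states).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        try:
            states = json.loads(data.decode("utf-8"))
        except ValueError as exc:
            # Covers both invalid UTF-8 and invalid JSON.
            raise ValueError(
                f"Deserialization of ToyIndex failed: {exc}"
            ) from exc
        if not isinstance(states, list):
            raise ValueError(
                "Deserialization of ToyIndex failed: expected a list of states"
            )
        return cls(states)


# Well-formed bytes round-trip normally.
good = ToyIndex.from_bytes(ToyIndex([0, 1, 2]).to_bytes())

# Malformed bytes surface as a catchable exception with a message.
try:
    ToyIndex.from_bytes(b"\xff not valid")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```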

f0c2f3f

joennlae dismissed torymur’s stale review via f0c2f3f December 2, 2024 18:59

torymur approved these changes Dec 2, 2024


torymur merged commit d1a0e8c into dottxt-ai:main Dec 2, 2024
7 checks passed

torymur mentioned this pull request Dec 2, 2024

Investigate serialization solution after python bindings stabilization #101

Open
