
[Version] Bump version to 0.2.76 #637

Merged 1 commit into mlc-ai:main on Nov 22, 2024

Conversation

CharlieFRuan (Contributor)

Change

  • [Grammar] Integrate with XGrammar #635
    • Integrate with web-xgrammar
    • Support ResponseFormat.type == "grammar", which lets you specify an EBNF grammar string (see the sketch after this list)
    • Add grammar_init_ms and grammar_per_token_ms to CompletionUsage.extra when using grammar
    • Add time_to_first_token_s (TTFT), time_per_output_token_s (TPOT), and e2e_latency_s to CompletionUsage.extra
    • Add ignore_eos to Completion and ChatCompletion requests
  • Fix VRAM required by Qwen2.5-Coder-1.5B-Instruct model #632
    • Corrects the VRAM requirement for the Qwen2.5-Coder-1.5B-Instruct model
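
A minimal TypeScript sketch (not from the PR itself) of how the new grammar-constrained decoding, ignore_eos flag, and CompletionUsage.extra timing fields might be exercised in 0.2.76. The model id and EBNF grammar are illustrative, and the exact request/response field names are taken from the bullet list above rather than the full API reference.

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Any model from the prebuilt model list works here; this id is an example.
  const engine = await webllm.CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

  // EBNF grammar that only accepts the strings "yes" or "no".
  const ebnfGrammar = `root ::= "yes" | "no"`;

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
    // New in #635: ResponseFormat.type == "grammar" with an EBNF grammar string
    // (field name assumed from the changelog above).
    response_format: { type: "grammar", grammar: ebnfGrammar },
    // New request field in this release: set true to keep generating past EOS.
    ignore_eos: false,
    max_tokens: 8,
  });

  console.log(reply.choices[0].message.content);

  // Timing fields added to CompletionUsage.extra in this release.
  const extra = reply.usage?.extra as Record<string, number> | undefined;
  console.log("grammar_init_ms:", extra?.grammar_init_ms);
  console.log("grammar_per_token_ms:", extra?.grammar_per_token_ms);
  console.log("time_to_first_token_s:", extra?.time_to_first_token_s);
  console.log("time_per_output_token_s:", extra?.time_per_output_token_s);
  console.log("e2e_latency_s:", extra?.e2e_latency_s);
}

main();
```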

TVMjs

  • No change; remains at 0.18.0-dev2, same as in 0.2.71

CharlieFRuan merged commit 082f04e into mlc-ai:main on Nov 22, 2024
1 check passed