
[Version] Bump version to 0.2.76 #637

Merged 1 commit into mlc-ai:main on Nov 22, 2024

Conversation

CharlieFRuan (Contributor)

Change

  • [Grammar] Integrate with XGrammar #635
    • Integrate with web-xgrammar
    • Support ResponseFormat.type == "grammar", which lets you specify an EBNF grammar string (see the sketch after this list)
    • Add grammar_init_ms and grammar_per_token_ms to CompletionUsage.extra when using grammar
    • Add time_to_first_token_s (TTFT), time_per_output_token_s (TPOT), and e2e_latency_s to CompletionUsage.extra
    • Add ignore_eos to Completion and ChatCompletion requests
  • Fix VRAM required by Qwen2.5-Coder-1.5B-Instruct model #632
    • Corrects the VRAM requirement for the Qwen2.5-Coder-1.5B-Instruct model
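
A minimal TypeScript sketch (not from the PR itself) of how the new grammar-constrained decoding, ignore_eos flag, and CompletionUsage.extra timing fields might be exercised in 0.2.76. The model id and EBNF grammar are illustrative, and the exact request/response field names are taken from the bullet list above rather than the full API reference.

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Any model from the prebuilt model list works here; this id is an example.
  const engine = await webllm.CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

  // EBNF grammar that only accepts the strings "yes" or "no".
  const ebnfGrammar = `root ::= "yes" | "no"`;

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
    // New in #635: ResponseFormat.type == "grammar" with an EBNF grammar string
    // (field name assumed from the changelog above).
    response_format: { type: "grammar", grammar: ebnfGrammar },
    // New request field in this release: set true to keep generating past EOS.
    ignore_eos: false,
    max_tokens: 8,
  });

  console.log(reply.choices[0].message.content);

  // Timing fields added to CompletionUsage.extra in this release.
  const extra = reply.usage?.extra as Record<string, number> | undefined;
  console.log("grammar_init_ms:", extra?.grammar_init_ms);
  console.log("grammar_per_token_ms:", extra?.grammar_per_token_ms);
  console.log("time_to_first_token_s:", extra?.time_to_first_token_s);
  console.log("time_per_output_token_s:", extra?.time_per_output_token_s);
  console.log("e2e_latency_s:", extra?.e2e_latency_s);
}

main();
```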

TVMjs

  • No change; remains at 0.18.0-dev2, same as in 0.2.71

CharlieFRuan merged commit 082f04e into mlc-ai:main on Nov 22, 2024
1 check passed