
Fix VRAM required by Qwen2.5-Coder-1.5B-Instruct model #632

Merged: 1 commit merged into mlc-ai:main on Nov 22, 2024

Conversation

@felladrin (Contributor) commented on Nov 20, 2024

Currently, it has the same VRAM values as the `Qwen2.5-Coder-7B-Instruct` model.

This change fixes it by using the same values as the `Qwen2.5-1.5B-Instruct` model, as shown in the screenshot below:

[Screenshot: VRAM values of the `Qwen2.5-1.5B-Instruct` model]

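For context, a minimal sketch of the kind of model-list entry this PR touches, assuming the `ModelRecord` shape used in web-llm's `prebuiltAppConfig` (`model` / `model_id` / `model_lib` / `vram_required_MB`). The URL, model-lib string, and numeric value below are illustrative placeholders, not the actual figures changed in this PR.

```ts
import type { ModelRecord } from "@mlc-ai/web-llm";

// Illustrative entry only; field names assumed from web-llm's prebuiltAppConfig.
const qwen25Coder15B: ModelRecord = {
  model: "https://huggingface.co/mlc-ai/Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC",
  model_id: "Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC",
  model_lib: "<model-lib-wasm-url>",
  // Before this PR the entry carried the 7B model's VRAM figure; per the
  // description it should match the 1.5B model instead (placeholder number here).
  vram_required_MB: 1600,
  low_resource_required: true,
};
```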
@CharlieFRuan (Contributor) left a comment:

LGTM, thanks for the catch and the fix!

@CharlieFRuan merged commit 6504047 into mlc-ai:main on Nov 22, 2024
1 check passed
@felladrin deleted the patch-1 branch on November 22, 2024, 12:09
CharlieFRuan added a commit that referenced this pull request on Nov 22, 2024:
### Change

- #635
  - Integrate with `web-xgrammar`
  - Support `ResponseFormat.type == "grammar"`, where you specify an EBNF grammar string
  - Add `grammar_init_ms` and `grammar_per_token_ms` to `CompletionUsage.extra` when using grammar
  - Add `time_to_first_token_s` (TTFT), `time_per_output_token_s` (TPOT), and `e2e_latency_s` to `CompletionUsage.extra`
  - Add `ignore_eos` to `Completion` and `ChatCompletion` requests
- #632
  - Fix the VRAM requirement for the Qwen2.5-Coder-1.5B-Instruct model

### TVMjs
- No change; version `0.18.0-dev2`, same as in 0.2.71
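A hedged sketch of the grammar-constrained generation described in the #635 items above, using web-llm's OpenAI-style chat API. The exact name of the field carrying the EBNF string (`grammar` here) and the model id are assumptions for illustration; the `CompletionUsage.extra` fields are the ones named in the changelog.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model id assumed for illustration.
const engine = await CreateMLCEngine("Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Answer strictly yes or no: is 7 prime?" }],
  // ResponseFormat.type == "grammar": constrain decoding with an EBNF grammar.
  response_format: {
    type: "grammar",
    grammar: 'root ::= "yes" | "no"',
  },
  // ignore_eos: true, // also added to requests in this release, per the changelog
});

// Per the changelog, grammar_init_ms / grammar_per_token_ms and the TTFT/TPOT/
// e2e latency figures are reported under CompletionUsage.extra.
console.log(reply.usage?.extra);
```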