Issues: vllm-project/llm-compressor
- CUDA OOM while saving compressed Llama-3.1-70b with AutoModelForCausalLM [bug] #928, opened Nov 20, 2024 by hibukipanim
- Error when loading a 2of4 model using vLLM [bug] #926, opened Nov 19, 2024 by jiangjiadi
- Error "No modifier of type 'SparseGPTModifier' found" when upgrading to 0.3.0 [bug] #925, opened Nov 19, 2024 by jiangjiadi
- Discuss the use of hyperparameters in the quantization_w8a8_int8 script [documentation] #916, opened Nov 14, 2024 by HelloCard
- Finetuning in the 2:4 sparsity w4a16 example fails with multiple GPUs [bug] #911, opened Nov 13, 2024 by arunpatala
- OOM: DeepSeek V2 Code Lite on A40 GPUs [bug] #885, opened Nov 1, 2024 by tohnee
- Model saving fails on AWS instances with OOM kill [bug] #868, opened Oct 25, 2024 by Arseny-N
- Output of Compressor unable to be loaded by latest HF Transformers [bug] #865, opened Oct 23, 2024 by hyaticua
- Does llm-compressor support MiniCPM3, which uses the MLA architecture? [enhancement] #860, opened Oct 22, 2024 by piamo
- Is it possible to quantize to FP8 W8A16 without calibration data? [enhancement] #858, opened Oct 21, 2024 by us58
- Perplexity (PPL) calculation of a local sparse model: NaN issue [bug] #853, opened Oct 19, 2024 by HengJayWang
- Why does the speed not increase after compression? [bug] #852, opened Oct 18, 2024 by liho00
- [Question] Does MiniCPM-V 2.6 currently support INT8/FP8 quantization? #848, opened Oct 15, 2024 by wjj19950828
- AttributeError: 'CompressedLinear' object has no attribute 'weight' [bug] #835, opened Oct 9, 2024 by kylesayrs
- When will multi-node quantization be supported? [enhancement] #831, opened Oct 9, 2024 by IEI-mjx
- AttributeError: 'MllamaConfig' object has no attribute 'use_cache' [bug] #688, opened Sep 26, 2024 by mgoin
- SmoothQuant doesn't respect ignored modules for VLMs [bug] #687, opened Sep 26, 2024 by mgoin
- KV cache quantization example causes a problem [bug] #660, opened Sep 25, 2024 by weicheng59
- [USAGE] FP8 W8A8 (+KV) with LoRA adapters [enhancement] #164, opened Sep 11, 2024 by paulliwog