
Drop GPTQ support #889

Merged 2 commits into main on Jan 18, 2024

Conversation

carmocca
Contributor

@carmocca carmocca commented Jan 18, 2024

Closes #582
Closes #583

GPTQ is inference only, requires a conversion step, and the implementation we use is much slower than bitsandbytes. The only upside is that it uses less memory at inference time.

There's a lot of research happening around inference quantization and having this implementation in the repo is not worth it anymore.
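For context on the trade-off mentioned above: both GPTQ and bitsandbytes save inference memory by storing weights in low precision plus a scale factor. A minimal absmax int8 round-trip illustrates the core idea (a generic sketch, not this repo's or either library's actual code):

```python
# Illustrative absmax int8 weight quantization: store weights as int8
# plus one float scale, recovering approximate floats on dequantize.
# This is the generic technique, not lit-gpt's or bitsandbytes' code.

def quantize_absmax(weights):
    """Map floats to the int8 range [-127, 127] using the absolute maximum."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]            # int8-range integers
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.4, -1.2, 0.05, 0.9]
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half the quantization step (scale / 2).
print(q, max(abs(a - b) for a, b in zip(w, w_hat)))
```

The memory saving comes from keeping only the int8 values (1 byte each) plus the scale; the cost is the rounding error, which 4-bit schemes trade off even more aggressively.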

@carmocca carmocca self-assigned this Jan 18, 2024
@Andrei-Aksionov
Collaborator

Sorry for procrastinating with my AutoGPTQ integration, but I've started working on it (again) and there should be something to look at sometime next week.
So the question is whether you are dropping GPTQ (and its variants) or just the current implementation?
If just the current implementation, then this PR doesn't close #583.

@carmocca
Contributor Author

carmocca commented Jan 18, 2024

Just the current implementation. However, as long as any new additions are not as useful and usable as the existing implementations (for now that's only bnb), we wouldn't be interested in adding them.

This PR should close #583. I suggest opening new issues proposing the addition of new techniques.

For instance https://github.com/IST-DASLab/marlin was released yesterday and includes its own GPTQ implementation. Perhaps AutoGPTQ is no longer the best alternative.

edit: Marlin support is being added to AutoGPTQ in AutoGPTQ/AutoGPTQ#514

@carmocca carmocca merged commit 49c7e07 into main Jan 18, 2024
9 checks passed
@carmocca carmocca deleted the carmocca/drop-gptq-support branch January 18, 2024 23:00
@Andrei-Aksionov
Collaborator

Great!
Then I'll continue with AutoGPTQ.

rasbt pushed a commit that referenced this pull request Mar 18, 2024
Successfully merging this pull request may close these issues:

- Replace our GPTQ implementation with something better
- gptq quantization fails with torch 2.2