My apologies if this is a really stupid question, but is there scope here to add the ability to load 4-bit models? For example, vicuna-13B-1.1-GPTQ-4bit-128g, or even 4-bit 30B LLaMA models, which will squeeze into 24 GB of VRAM. I know this can all be done in other web-UI projects, but having an OpenAI-like API such as this project provides would be amazing. A rough sketch of how other projects handle this is below.
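For context, this is a minimal sketch of how 4-bit GPTQ checkpoints are commonly loaded in other projects, assuming the `auto_gptq` and `transformers` packages are installed. The model ID, prompt format, and parameters are illustrative only; nothing here reflects this project's current API.

```python
# Sketch: loading a pre-quantized 4-bit GPTQ model (assumed approach, not this
# project's API) so a 13B model fits in ~24 GB of VRAM.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g"  # example/illustrative repo name

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# from_quantized loads the already-quantized 4-bit weights directly onto the GPU,
# avoiding the full fp16 memory footprint at load time.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
)

prompt = "### Human: Hello!\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

An OpenAI-style API layer would presumably just swap this loader in behind its existing generation endpoint, since the resulting model object exposes the usual `generate` interface.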