Reuse Layers without consuming extra memory #4956
LiquidGunay started this conversation in Ideas
https://www.reddit.com/r/LocalLLaMA/comments/194zwyc/instant_frankenmerges_with_exllamav2/
Could something similar to this thread be implemented in llama.cpp, so that you don't need to merge a model and load the whole thing into RAM when you are just reusing layers from the same model? You would load the layers from the base model once and run multiple forward passes through specific layers.
Replies: 1 comment

- It can be implemented, here is a comment explaining: #4718 (comment)