Expose the sampler configuration structures to allow more flexibility. Add documentation and descriptions for the sampling functions and structures. Create specific enums for sampler-construction errors and sampling errors. Set n_vocab for the Mirostat 1 sampler in a more reliable way.
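As an illustration of the shape such error enums and an explicitly-set `n_vocab` could take, here is a minimal Rust sketch; all names (`SamplerConstructionError`, `SamplingError`, `MirostatV1`) and fields are hypothetical, not the crate's actual API:

```rust
/// Hypothetical error returned when a sampler is constructed with bad parameters.
#[derive(Debug)]
pub enum SamplerConstructionError {
    /// A parameter was outside its valid range (e.g. a non-positive tau).
    InvalidParameter { name: &'static str, value: f32 },
}

/// Hypothetical error returned while sampling a token.
#[derive(Debug)]
pub enum SamplingError {
    /// The logits slice handed to the sampler was empty.
    EmptyLogits,
}

/// Hypothetical Mirostat 1 sampler whose `n_vocab` is fixed at construction
/// time instead of being inferred later.
pub struct MirostatV1 {
    pub n_vocab: usize,
    pub tau: f32,
    pub eta: f32,
    pub mu: f32,
}

impl MirostatV1 {
    pub fn new(n_vocab: usize, tau: f32, eta: f32) -> Result<Self, SamplerConstructionError> {
        if tau <= 0.0 {
            return Err(SamplerConstructionError::InvalidParameter { name: "tau", value: tau });
        }
        // Mirostat initializes its running surprise target mu to 2 * tau.
        Ok(Self { n_vocab, tau, eta, mu: 2.0 * tau })
    }
}
```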
This commit removes the pinned version of the `half` crate, allowing consumers to resolve to a version of `half` that is compatible with other dependencies in the project. Signed-off-by: Radu Matei <[email protected]>
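For illustration, the change amounts to replacing an exact version requirement with a normal semver requirement in Cargo.toml; the version numbers below are hypothetical, not the ones actually used:

```toml
[dependencies]
# Before: an exact pin forces every consumer onto this one release.
# half = "=2.2.1"

# After: any semver-compatible release can be resolved, letting Cargo pick a
# version that also satisfies the consumer's other dependencies.
half = "2"
```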
```diff
@@ -332,6 +332,7 @@ fn enable_cublas(build: &mut cc::Build, out_dir: &Path) {
         .arg("static")
         .arg("--generate-code=arch=compute_52,code=[compute_52,sm_52]")
         .arg("--generate-code=arch=compute_61,code=[compute_61,sm_61]")
+        .arg("--generate-code=arch=compute_75,code=[compute_75,sm_75]")
```
Is this needed for newer cards?
Yes, I had issues with my card (an RTX 2070). I guess we should also support 8.x architectures like the A100.
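A sketch of what adding 8.x (Ampere) targets could look like, following the same nvcc flag pattern as the diff above; the surrounding function is illustrative, not the crate's actual build script:

```rust
use std::process::Command;

/// Illustrative only: extend the nvcc invocation with Ampere targets alongside
/// the existing 5.2/6.1/7.5 ones. compute_80 covers the A100, while compute_86
/// covers consumer RTX 30-series cards.
fn add_nvcc_arch_flags(nvcc: &mut Command) {
    nvcc.arg("--generate-code=arch=compute_52,code=[compute_52,sm_52]")
        .arg("--generate-code=arch=compute_61,code=[compute_61,sm_61]")
        .arg("--generate-code=arch=compute_75,code=[compute_75,sm_75]")
        // Hypothetical additions for 8.x architectures:
        .arg("--generate-code=arch=compute_80,code=[compute_80,sm_80]")
        .arg("--generate-code=arch=compute_86,code=[compute_86,sm_86]");
}
```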
crates/models/llama/src/lib.rs (outdated)
```rust
enum LlamaModelVersion {
    Model3b,
    Model7b,
    Model13b,
    Model30b,
    Model65b,
    Model70b,
}
```
`Version` hints at the actual LLaMA model version (1 or 2); maybe rename it to `LlamaModelType`?
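A quick sketch of the suggested rename, keeping the same variants; the doc comment is only illustrative of the intended meaning:

```rust
/// Parameter-count tier of the model (3B–70B), not the LLaMA release version.
#[allow(dead_code)]
enum LlamaModelType {
    Model3b,
    Model7b,
    Model13b,
    Model30b,
    Model65b,
    Model70b,
}
```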
Looks good, some small nitpicks, but if the CI passes it should be good to go 👍
@LLukas22 Thanks for the review 👍🏼!
Thanks for implementing this :D
Hello,

Solves #402.

This is a temporary fix for supporting the Llama-2 70B model. I wanted to open a draft PR to get your feedback on this implementation for supporting the `n_gqa` param:

- Add `n_gqa` as an optional param in `ModelParameters`
- Add a `LlamaModelVersion` enum akin to the `e_model` enum in llama.cpp
- Use `n_head_kv` for `K` and `V` instead of `n_head` (sketched below)
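A minimal, self-contained sketch of the grouped-query-attention bookkeeping described in the list above, assuming `n_gqa = 8` for the 70B model; the struct layouts and field names are illustrative, not the crate's exact `ModelParameters`/hyperparameter API:

```rust
/// Illustrative model hyperparameters (not the crate's actual struct).
struct Hyperparameters {
    n_embd: usize,
    n_head: usize,
}

/// Illustrative subset of `ModelParameters`: `n_gqa` is optional and
/// defaults to 1 (no grouped-query attention) when not supplied.
struct ModelParameters {
    n_gqa: Option<usize>,
}

/// Returns (n_head_kv, K/V embedding width). With grouped-query attention,
/// several query heads share one K/V head, so the K and V tensors are sized
/// from `n_head_kv` rather than `n_head`.
fn kv_dimensions(hp: &Hyperparameters, params: &ModelParameters) -> (usize, usize) {
    let n_gqa = params.n_gqa.unwrap_or(1);
    let n_head_kv = hp.n_head / n_gqa;
    let head_dim = hp.n_embd / hp.n_head;
    (n_head_kv, head_dim * n_head_kv)
}

fn main() {
    // Llama-2 70B: 8192-dim embeddings, 64 query heads, n_gqa = 8 -> 8 K/V heads.
    let hp = Hyperparameters { n_embd: 8192, n_head: 64 };
    let params = ModelParameters { n_gqa: Some(8) };
    let (n_head_kv, n_embd_kv) = kv_dimensions(&hp, &params);
    println!("n_head_kv = {n_head_kv}, K/V width = {n_embd_kv}");
}
```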
Here is the `llama-2-70B-chat.ggmlv3.q4_0.bin` model loaded on an A100 GPU: