Expose the sampler configuration structures to allow more flexibility. Add documentation and descriptions for the sampling functions and structures. Create specific enums for sampler-construction errors and sampling errors. Set n_vocab for the Mirostat 1 sampler in a more reliable way.
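As an illustration of the shape such error enums and an explicitly-set `n_vocab` could take, here is a minimal Rust sketch; all names (`SamplerConstructionError`, `SamplingError`, `MirostatV1`) and fields are hypothetical, not the crate's actual API:

```rust
/// Hypothetical error returned when a sampler is constructed with bad parameters.
#[derive(Debug)]
pub enum SamplerConstructionError {
    /// A parameter was outside its valid range (e.g. a non-positive tau).
    InvalidParameter { name: &'static str, value: f32 },
}

/// Hypothetical error returned while sampling a token.
#[derive(Debug)]
pub enum SamplingError {
    /// The logits slice handed to the sampler was empty.
    EmptyLogits,
}

/// Hypothetical Mirostat 1 sampler whose `n_vocab` is fixed at construction
/// time instead of being inferred later.
pub struct MirostatV1 {
    pub n_vocab: usize,
    pub tau: f32,
    pub eta: f32,
    pub mu: f32,
}

impl MirostatV1 {
    pub fn new(n_vocab: usize, tau: f32, eta: f32) -> Result<Self, SamplerConstructionError> {
        if tau <= 0.0 {
            return Err(SamplerConstructionError::InvalidParameter { name: "tau", value: tau });
        }
        // Mirostat initializes its running surprise target mu to 2 * tau.
        Ok(Self { n_vocab, tau, eta, mu: 2.0 * tau })
    }
}
```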
This commit removes the pinned version of the `half` crate, allowing consumers to resolve to a version of `half` that is compatible with other dependencies in the project. Signed-off-by: Radu Matei <[email protected]>
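For illustration, the change amounts to replacing an exact version requirement with a normal semver requirement in Cargo.toml; the version numbers below are hypothetical, not the ones actually used:

```toml
[dependencies]
# Before: an exact pin forces every consumer onto this one release.
# half = "=2.2.1"

# After: any semver-compatible release can be resolved, letting Cargo pick a
# version that also satisfies the consumer's other dependencies.
half = "2"
```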
```diff
@@ -332,6 +332,7 @@ fn enable_cublas(build: &mut cc::Build, out_dir: &Path) {
         .arg("static")
         .arg("--generate-code=arch=compute_52,code=[compute_52,sm_52]")
         .arg("--generate-code=arch=compute_61,code=[compute_61,sm_61]")
+        .arg("--generate-code=arch=compute_75,code=[compute_75,sm_75]")
```
Is this needed for newer cards?
Yes, I had issues with my card (an RTX 2070). I guess we should also support 8.x architectures like the A100.
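A sketch of what adding 8.x (Ampere) targets could look like, following the same nvcc flag pattern as the diff above; the surrounding function is illustrative, not the crate's actual build script:

```rust
use std::process::Command;

/// Illustrative only: extend the nvcc invocation with Ampere targets alongside
/// the existing 5.2/6.1/7.5 ones. compute_80 covers the A100, while compute_86
/// covers consumer RTX 30-series cards.
fn add_nvcc_arch_flags(nvcc: &mut Command) {
    nvcc.arg("--generate-code=arch=compute_52,code=[compute_52,sm_52]")
        .arg("--generate-code=arch=compute_61,code=[compute_61,sm_61]")
        .arg("--generate-code=arch=compute_75,code=[compute_75,sm_75]")
        // Hypothetical additions for 8.x architectures:
        .arg("--generate-code=arch=compute_80,code=[compute_80,sm_80]")
        .arg("--generate-code=arch=compute_86,code=[compute_86,sm_86]");
}
```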
crates/models/llama/src/lib.rs (outdated)
```rust
enum LlamaModelVersion {
    Model3b,
    Model7b,
    Model13b,
    Model30b,
    Model65b,
    Model70b,
}
```
`Version` hints at the actual LLaMA model version (1 or 2); maybe rename it to `LlamaModelType`?
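A quick sketch of the suggested rename, keeping the same variants; the doc comment is only illustrative of the intended meaning:

```rust
/// Parameter-count tier of the model (3B–70B), not the LLaMA release version.
#[allow(dead_code)]
enum LlamaModelType {
    Model3b,
    Model7b,
    Model13b,
    Model30b,
    Model65b,
    Model70b,
}
```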
Looks good, some small nitpicks, but if the CI passes it should be good to go 👍
@LLukas22 Thanks for the review 👍🏼!
Thanks for implementing this :D
Hello,

Solves #402.

This is a temporary fix for supporting the Llama-2 70B model. I wanted to open a draft PR to get your feedback on this implementation for supporting the `n_gqa` param:

- Add `n_gqa` as an optional param in `ModelParameters`
- Add a `LlamaModelVersion` enum akin to the `e_model` enum in llama.cpp
- Use `n_head_kv` for `K` and `V` instead of `n_head` (sketched below)
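A minimal, self-contained sketch of the grouped-query-attention bookkeeping described in the list above, assuming `n_gqa = 8` for the 70B model; the struct layouts and field names are illustrative, not the crate's exact `ModelParameters`/hyperparameter API:

```rust
/// Illustrative model hyperparameters (not the crate's actual struct).
struct Hyperparameters {
    n_embd: usize,
    n_head: usize,
}

/// Illustrative subset of `ModelParameters`: `n_gqa` is optional and
/// defaults to 1 (no grouped-query attention) when not supplied.
struct ModelParameters {
    n_gqa: Option<usize>,
}

/// Returns (n_head_kv, K/V embedding width). With grouped-query attention,
/// several query heads share one K/V head, so the K and V tensors are sized
/// from `n_head_kv` rather than `n_head`.
fn kv_dimensions(hp: &Hyperparameters, params: &ModelParameters) -> (usize, usize) {
    let n_gqa = params.n_gqa.unwrap_or(1);
    let n_head_kv = hp.n_head / n_gqa;
    let head_dim = hp.n_embd / hp.n_head;
    (n_head_kv, head_dim * n_head_kv)
}

fn main() {
    // Llama-2 70B: 8192-dim embeddings, 64 query heads, n_gqa = 8 -> 8 K/V heads.
    let hp = Hyperparameters { n_embd: 8192, n_head: 64 };
    let params = ModelParameters { n_gqa: Some(8) };
    let (n_head_kv, n_embd_kv) = kv_dimensions(&hp, &params);
    println!("n_head_kv = {n_head_kv}, K/V width = {n_embd_kv}");
}
```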
Here is the `llama-2-70B-chat.ggmlv3.q4_0.bin` model loaded on an A100 GPU: