Releases: ggerganov/llama.cpp

b2661

12 Apr 17:16
24ee66e
server : coherent log output for KV cache full (#6637)

b2660

12 Apr 15:08
91c7360
llama : add gguf_remove_key + remove split meta during quantize (#6591)

* Remove split metadata when quantize model shards

* Find metadata key by enum

* Correct loop range for gguf_remove_key and code format

* Free kv memory

---------

Co-authored-by: z5269887 <[email protected]>

b2658

12 Apr 12:13
ef21ce4
imatrix : remove invalid assert (#6632)

b2657

12 Apr 11:46
dee7f8d
Correct free memory and total memory. (#6630)

Co-authored-by: MasterYi <[email protected]>

b2656

12 Apr 11:26
81da18e
eval-callback: use ggml_op_desc to pretty print unary operator name (…

b2655

12 Apr 11:16
9ed2737
ci : disable Metal for macOS-latest-cmake-x64 (#6628)

b2646

10 Apr 17:56
b3a96f2
minor layout improvements (#6572)

* minor layout improvements

* added missing file, run deps.sh locally

b2645

10 Apr 15:46
4f407a0
llama : add model types for mixtral (#6589)

b2636

09 Apr 09:13
5dc9dd7
llama : add Command R Plus support (#6491)

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized.

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing.

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <[email protected]>

* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <[email protected]>
Co-authored-by: S <[email protected]>
Co-authored-by: slaren <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b2632

08 Apr 14:51
b73e564
quantize : fix precedence of cli args (#6541)