Releases · ggerganov/llama.cpp
b2661
server : coherent log output for KV cache full (#6637)
b2660
llama : add gguf_remove_key + remove split meta during quantize (#6591)

* Remove split metadata when quantizing model shards
* Find metadata key by enum
* Correct loop range for gguf_remove_key and code format
* Free kv memory

Co-authored-by: z5269887 <[email protected]>
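For context, gguf_remove_key deletes a key/value pair from a gguf_context, which is what lets quantize drop split metadata from merged shards. A minimal sketch of the round trip, assuming the gguf API from llama.cpp's bundled ggml.h; the "split.count" key is used for illustration:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct gguf_context * ctx = gguf_init_empty();

    // Write a split-metadata style key, then remove it again.
    gguf_set_val_u16(ctx, "split.count", 4);
    printf("before: %d\n", gguf_find_key(ctx, "split.count")); // key index (>= 0)

    gguf_remove_key(ctx, "split.count");
    printf("after:  %d\n", gguf_find_key(ctx, "split.count")); // -1 once removed

    gguf_free(ctx);
    return 0;
}
```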
b2658
imatrix : remove invalid assert (#6632)
b2657
Correct free memory and total memory. (#6630)

Co-authored-by: MasterYi <[email protected]>
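The commit title suggests a swapped free/total pair in a device memory query. As a generic illustration (not necessarily the exact code the fix touched), the CUDA runtime's cudaMemGetInfo writes the free byte count to its first out-parameter and the total to its second:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    size_t free_bytes  = 0;
    size_t total_bytes = 0;

    // Correct argument order: free first, total second.
    // Swapping them compiles fine but reports total as free and vice versa.
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }

    printf("free:  %zu MiB\n", free_bytes  / (1024 * 1024));
    printf("total: %zu MiB\n", total_bytes / (1024 * 1024));
    return 0;
}
```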
b2656
eval-callback: use ggml_op_desc to pretty print unary operator name (…
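ggml_op_desc helps here because all unary operators (GELU, SILU, ...) share the single op code GGML_OP_UNARY, so ggml_op_name alone prints only "UNARY". A minimal sketch, assuming ggml.h from the llama.cpp tree; the scratch-buffer size is arbitrary:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024, // arbitrary scratch size
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * y = ggml_gelu(ctx, x); // a unary operator

    // ggml_op_name(y->op) would print just "UNARY";
    // ggml_op_desc resolves the underlying unary op name instead.
    printf("%s\n", ggml_op_desc(y)); // expected: GELU

    ggml_free(ctx);
    return 0;
}
```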
b2655
ci : disable Metal for macOS-latest-cmake-x64 (#6628)
b2646
minor layout improvements (#6572)

* minor layout improvements
* added missing file, run deps.sh locally
b2645
llama : add model types for mixtral (#6589)
b2636
llama : add Command R Plus support (#6491)

* Add Command R Plus GGUF
* Loading works up to LayerNorm2D
* Export new tensors in 1D so they are not quantized
* Fix embedding layer based on Noeda's example
* Whitespace
* Add line
* Fix unexpected tokens on MPS; re-add F16 fix (Noeda)
* dranger003: Fix block index overflow in CUDA dequantizing
* Reverted blocked multiplication code as it still has issues and could affect other Llama arches
* Export norms as f32
* Fix overflow issues during quant and other cleanup
* Type convention (Co-authored-by: Georgi Gerganov <[email protected]>)
* dranger003: Fix more int overflow during quant

Co-authored-by: S <[email protected]>
Co-authored-by: S <[email protected]>
Co-authored-by: slaren <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
b2632
quantize : fix precedence of cli args (#6541)