Releases: Xarbirus/llama.cpp

b4240

02 Dec 22:45
64ed209
server: Add "tokens per second" information in the backend (#10548)

* add cmake rvv support

* add timings

* remove space

* update readme

* fix

* fix code

* remove empty line

* add test

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
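
The timing information added by this change can be illustrated with a minimal sketch of deriving tokens per second from token counts and elapsed wall time (struct and field names here are hypothetical, not the server's actual API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timing counters a server backend might track;
// names are illustrative, not llama.cpp's actual fields.
struct timings {
    int32_t n_prompt_tokens;    // tokens processed during prompt eval
    double  t_prompt_ms;        // wall time spent on prompt eval
    int32_t n_predicted_tokens; // tokens generated
    double  t_predicted_ms;     // wall time spent generating
};

// Derive tokens-per-second from a token count and elapsed milliseconds,
// guarding against a zero elapsed time.
double tokens_per_second(int32_t n_tokens, double t_ms) {
    return t_ms > 0.0 ? 1e3 * n_tokens / t_ms : 0.0;
}
```

With these counters, prompt-processing and generation rates are reported separately, since prompt evaluation is typically much faster than token-by-token generation.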

b4061

09 Nov 17:33
6423c65
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b3969

23 Oct 20:35
190a37d
sync : ggml

b3917

14 Oct 14:22
a89f75e
server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>
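
The fix concerns an OpenAI-style request field that may be either `false` (disabled) or an integer count. A minimal sketch of that parsing pattern, with the JSON value modeled as a variant (types and names are illustrative, not the server's actual parser):

```cpp
#include <cassert>
#include <optional>
#include <variant>

// "logprobs" can arrive as the boolean `false` (feature disabled) or as
// an integer count. Modeled here as a variant; names are hypothetical.
using logprobs_field = std::variant<bool, int>;

// Return the number of logprobs to emit, or nullopt when disabled.
// Treating a boolean as "disabled" is the defensive behavior: a naive
// parser that assumes the field is always an integer would fail here.
std::optional<int> parse_logprobs(const logprobs_field &v) {
    if (std::holds_alternative<bool>(v)) {
        return std::nullopt;
    }
    return std::get<int>(v);
}
```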

b3810

23 Sep 19:53
1d48e98
readme : add programmable prompt engine language CLI (#9599)

b3767

16 Sep 08:44
5c3d0f1
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)

* squashed

re-add my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

have ggml_vec_dot_q4_0 do two blocks per loop for AVX

try out an f16c ggml_vec_dot_iq4_nl, but it's not really faster. As per https://github.com/ggerganov/llama.cpp/pull/8549, we can calculate several blocks at a time with no issue

* shuffle

* remove the f16c iq4_nl as I can't make it faster than before
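
The "two blocks per loop" change is a classic unrolling pattern. A scalar sketch of the idea follows; the real code operates on quantized Q4_0 blocks with AVX intrinsics, while this plain-float illustration only shows the loop structure:

```cpp
#include <cassert>
#include <cstddef>

// Dot product that consumes two fixed-size blocks per loop iteration,
// mirroring the unrolling idea (scalar floats stand in for Q4_0 blocks).
constexpr size_t BLOCK = 32; // elements per block, like Q4_0's QK4_0

float vec_dot_unrolled(const float *x, const float *y, size_t n_blocks) {
    float sum0 = 0.0f, sum1 = 0.0f; // two accumulators hide latency
    size_t ib = 0;
    for (; ib + 1 < n_blocks; ib += 2) { // main loop: two blocks at a time
        for (size_t j = 0; j < BLOCK; ++j) {
            sum0 += x[(ib + 0) * BLOCK + j] * y[(ib + 0) * BLOCK + j];
            sum1 += x[(ib + 1) * BLOCK + j] * y[(ib + 1) * BLOCK + j];
        }
    }
    for (; ib < n_blocks; ++ib) { // tail: odd block count
        for (size_t j = 0; j < BLOCK; ++j) {
            sum0 += x[ib * BLOCK + j] * y[ib * BLOCK + j];
        }
    }
    return sum0 + sum1;
}
```

Using two independent accumulators lets the CPU overlap the multiply-add chains of the two blocks, which is where the speedup comes from.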

b3758

15 Sep 16:09
3c7989f
py : add "LLaMAForCausalLM" conversion support (#9485)

Co-authored-by: Csaba Kecskemeti <[email protected]>

b3755

14 Sep 20:11
822b632
ggml : ggml_type_name return "NONE" for invalid values (#9458)

When running on Windows, the quantization utility attempts to print the types that are not set, which leads to a crash.
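
The defensive pattern behind this fix can be sketched as a bounds-checked enum-to-name lookup (the table below is abbreviated and hypothetical, not the actual ggml type table):

```cpp
#include <cassert>
#include <cstddef>

// Bounds-checked enum-to-name lookup: out-of-range or unset values map
// to "NONE" instead of indexing past the table (the crash this fixes).
// Entries are abbreviated; the real ggml table is much larger.
enum my_type { TYPE_F32 = 0, TYPE_F16 = 1, TYPE_Q4_0 = 2, TYPE_COUNT };

const char *type_name(int type) {
    static const char *names[TYPE_COUNT] = { "f32", "f16", "q4_0" };
    if (type < 0 || type >= TYPE_COUNT || names[type] == nullptr) {
        return "NONE"; // invalid index or entry never filled in
    }
    return names[type];
}
```

Returning a fixed sentinel string keeps callers that blindly pass the result to `printf`-style formatting from dereferencing garbage.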

b3736

12 Sep 10:58
c9c8575
enhance run script to make it easy to change the parameters (#9448)

Co-authored-by: arthw <[email protected]>

b3676

06 Sep 19:39
815b1fb
batched-bench : add `--output-format jsonl` option (#9293)

`--output-format` is modeled after `llama-bench`'s options
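
JSONL output is simply one JSON object per line, which makes benchmark results easy to append to and stream-parse. A minimal sketch of emitting a result row in that format (field names are illustrative, not batched-bench's exact schema):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Emit one benchmark result as a single-line JSON object (JSONL).
// Field names are hypothetical; batched-bench's real schema may differ.
std::string to_jsonl_row(int n_prompt, int n_gen, double t_ms) {
    std::ostringstream os;
    os << "{\"n_prompt\": " << n_prompt
       << ", \"n_gen\": "   << n_gen
       << ", \"t_ms\": "    << t_ms << "}\n";
    return os.str();
}
```

Because each line is independently valid JSON, results from multiple runs can be concatenated and processed with standard line-oriented tooling.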