Releases · Xarbirus/llama.cpp
b4240
server: Add "tokens per second" information in the backend (#10548)
* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
Co-authored-by: Xuan Son Nguyen <[email protected]>
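With this change, the server reports generation speed alongside the completion itself. A minimal sketch of reading that information, assuming a local llama-server on port 8080 and a `timings` object with a `predicted_per_second` field in the `/completion` response (field names are assumptions, not verified against this exact build):

```python
# Sketch: query a local llama-server and print its reported generation speed.
# The "timings" object and "predicted_per_second" field are assumptions.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps({"prompt": "Hello", "n_predict": 32}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

timings = body.get("timings", {})
print("tokens/s:", timings.get("predicted_per_second"))
```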
b4061
metal : reorder write loop in mul mat kernel + style (#10231)
* metal : reorder write loop
* metal : int -> short, style ggml-ci
b3969
sync : ggml
b3917
server : handle "logprobs" field with false value (#9871)
Co-authored-by: Gimling <[email protected]>
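The fix concerns requests that pass `"logprobs": false` explicitly rather than omitting the field. A hedged sketch of such a request against the server's OpenAI-compatible endpoint; the endpoint path and payload shape are assumptions for illustration:

```python
# Sketch: a chat request that explicitly sets "logprobs": false, the case
# this fix addresses. Treat the endpoint and payload shape as assumptions.
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Hi"}],
    "logprobs": False,  # explicitly false, not merely absent
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```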
b3810
readme : add programmable prompt engine language CLI (#9599)
b3767
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
* squashed: re-add my IQ4_NL sgemm PR (https://github.com/ggerganov/llama.cpp/pull/8049); have ggml_vec_dot_q4_0 do two blocks per loop for AVX; try out an F16C ggml_vec_dot_iq4_nl, but it's not really faster, since per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue
* shuffle
* remove the F16C iq4_nl path, as I can't make it faster than before
b3758
py : add "LLaMAForCausalLM" conversion support (#9485)
Co-authored-by: Csaba Kecskemeti <[email protected]>
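Conversion scripts typically dispatch on the `architectures` string in a checkpoint's config, so supporting these models amounts to registering the `LLaMAForCausalLM` spelling alongside the usual one. A minimal sketch of that registry pattern, with hypothetical names (the real logic lives in `convert_hf_to_gguf.py`):

```python
# Illustrative sketch of an architecture registry mapping HF "architectures"
# strings to handler classes. All names here are hypothetical.
_registry: dict[str, type] = {}

def register(*names: str):
    def wrap(cls: type) -> type:
        for name in names:
            _registry[name] = cls
        return cls
    return wrap

@register("LlamaForCausalLM", "LLaMAForCausalLM")  # both spellings share one handler
class LlamaModel:
    pass

def handler_for(arch: str) -> type:
    return _registry[arch]

print(handler_for("LLaMAForCausalLM").__name__)  # -> LlamaModel
```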
b3755
ggml : make ggml_type_name return "NONE" for invalid values (#9458)
When running on Windows, the quantization utility attempts to print types that are not set, which leads to a crash.
b3736
enhance run script to make it easy to change the parameters (#9448)
Co-authored-by: arthw <[email protected]>
b3676
batched-bench : add `--output-format jsonl` option (#9293)
`--output-format` is modeled after `llama-bench`'s options.
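JSONL output is convenient for post-processing because each line is an independent JSON object. A small sketch of consuming it, assuming the benchmark results were saved to a file; the file name is illustrative and no particular record schema is assumed:

```python
# Sketch: read JSONL benchmark output, one JSON object per line.
# "batched-bench.jsonl" is an illustrative file name.
import json

with open("batched-bench.jsonl") as f:
    for line in f:
        if line.strip():
            rec = json.loads(line)  # one independent result per line
            print(rec)
```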