Releases · Xarbirus/llama.cpp
b4240
server: Add "tokens per second" information in the backend (#10548)
* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
Co-authored-by: Xuan Son Nguyen <[email protected]>
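With this change, the server reports generation speed alongside the completion itself. A minimal sketch of reading that information, assuming a local llama-server on port 8080 and a `timings` object with a `predicted_per_second` field in the `/completion` response (field names are assumptions, not verified against this exact build):

```python
# Sketch: query a local llama-server and print its reported generation speed.
# The "timings" object and "predicted_per_second" field are assumptions.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps({"prompt": "Hello", "n_predict": 32}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

timings = body.get("timings", {})
print("tokens/s:", timings.get("predicted_per_second"))
```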
b4061
metal : reorder write loop in mul mat kernel + style (#10231)
* metal : reorder write loop
* metal : int -> short, style ggml-ci
b3969
sync : ggml
b3917
server : handle "logprobs" field with false value (#9871)
Co-authored-by: Gimling <[email protected]>
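The fix concerns requests that pass `"logprobs": false` explicitly rather than omitting the field. A hedged sketch of such a request against the server's OpenAI-compatible endpoint; the endpoint path and payload shape are assumptions for illustration:

```python
# Sketch: a chat request that explicitly sets "logprobs": false, the case
# this fix addresses. Treat the endpoint and payload shape as assumptions.
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Hi"}],
    "logprobs": False,  # explicitly false, not merely absent
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```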
b3810
readme : add programmable prompt engine language CLI (#9599)
b3767
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
* squashed: re-add my IQ4_NL sgemm PR (https://github.com/ggerganov/llama.cpp/pull/8049); have ggml_vec_dot_q4_0 do two blocks per loop for AVX; try out an F16C ggml_vec_dot_iq4_nl, but it's not really faster, since per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue
* shuffle
* remove the F16C iq4_nl path, as I can't make it faster than before
b3758
py : add "LLaMAForCausalLM" conversion support (#9485)
Co-authored-by: Csaba Kecskemeti <[email protected]>
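Conversion scripts typically dispatch on the `architectures` string in a checkpoint's config, so supporting these models amounts to registering the `LLaMAForCausalLM` spelling alongside the usual one. A minimal sketch of that registry pattern, with hypothetical names (the real logic lives in `convert_hf_to_gguf.py`):

```python
# Illustrative sketch of an architecture registry mapping HF "architectures"
# strings to handler classes. All names here are hypothetical.
_registry: dict[str, type] = {}

def register(*names: str):
    def wrap(cls: type) -> type:
        for name in names:
            _registry[name] = cls
        return cls
    return wrap

@register("LlamaForCausalLM", "LLaMAForCausalLM")  # both spellings share one handler
class LlamaModel:
    pass

def handler_for(arch: str) -> type:
    return _registry[arch]

print(handler_for("LLaMAForCausalLM").__name__)  # -> LlamaModel
```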
b3755
ggml : make ggml_type_name return "NONE" for invalid values (#9458)
When running on Windows, the quantization utility attempts to print types that are not set, which leads to a crash.
b3736
enhance run script to make it easy to change the parameters (#9448)
Co-authored-by: arthw <[email protected]>
b3676
batched-bench : add `--output-format jsonl` option (#9293)
`--output-format` is modeled after `llama-bench`'s options.
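JSONL output is convenient for post-processing because each line is an independent JSON object. A small sketch of consuming it, assuming the benchmark results were saved to a file; the file name is illustrative and no particular record schema is assumed:

```python
# Sketch: read JSONL benchmark output, one JSON object per line.
# "batched-bench.jsonl" is an illustrative file name.
import json

with open("batched-bench.jsonl") as f:
    for line in f:
        if line.strip():
            rec = json.loads(line)  # one independent result per line
            print(rec)
```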