Releases · ggerganov/llama.cpp
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 data type. It results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
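For context on the technique: the POWER10 MMA (Matrix-Multiply Assist) builtins accumulate a 4x4 FP32 tile with outer-product instructions. Below is a minimal sketch of how such a kernel is typically structured; the function name, tile size, and packing layout are illustrative assumptions, not the upstreamed llamafile kernel itself.

```cpp
// A minimal sketch of a 4x4 FP32 tile kernel built on POWER10 MMA builtins
// (compile with -mcpu=power10). Packing conventions here are assumptions.
#include <altivec.h>
#include <cstring>

typedef __vector unsigned char vec_t;

// Compute one 4x4 tile C = A_tile * B_tile as a sum of rank-1 outer products.
// a: 4 x k tile of A, packed so column l is the 4 contiguous floats at a + 4*l
// b: k x 4 tile of B, packed so row    l is the 4 contiguous floats at b + 4*l
// c: row-major output with leading dimension ldc
static void mma_tile_4x4_f32(const float * a, const float * b, float * c,
                             int k, int ldc) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);                // zero the 4x4 accumulator

    for (int l = 0; l < k; ++l) {
        vec_t va, vb;
        std::memcpy(&va, a + 4*l, sizeof(va));    // column l of the A tile
        std::memcpy(&vb, b + 4*l, sizeof(vb));    // row    l of the B tile
        __builtin_mma_xvf32gerpp(&acc, va, vb);   // acc += va * vb^T (rank-1 update)
    }

    __vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);    // move the accumulator back to VSRs
    for (int i = 0; i < 4; ++i) {
        std::memcpy(c + (size_t) i*ldc, &rows[i], sizeof(rows[i]));
    }
}
```

Keeping the running tile in a dedicated accumulator register removes load/store traffic from the inner loop, which is the usual source of such batch-processing speedups.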
b4053
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16
* ci : use BF16
* swift : switch back to v12
* metal : has_float -> use_float
* metal : fix BF16 check in MSL
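As a rough illustration of what an opt-in compile flag gates, here is a hedged C++ sketch; it assumes the flag surfaces as a GGML_METAL_USE_BF16 compile definition, and the helper function and its argument are hypothetical, not the actual ggml-metal source.

```cpp
// A minimal sketch, assuming the opt-in flag is exposed as the
// GGML_METAL_USE_BF16 compile definition; the helper is hypothetical.
static bool metal_should_use_bf16(bool device_supports_bf16) {
#ifdef GGML_METAL_USE_BF16
    // opted in at build time: enable BF16 wherever the GPU supports it
    return device_supports_bf16;
#else
    // default build: the BF16 paths are compiled out entirely
    (void) device_supports_bf16;
    return false;
#endif
}
```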
b4052
metal : improve clarity (minor) (#10171)
b4050
swift : exclude ggml-metal-embed.metal (#10211)
* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/
b4048
server : revamp chat UI with vuejs and daisyui (#10175)
* server : simple chat UI with vuejs and daisyui
* move old files to legacy folder
* embed deps into binary
* basic markdown support
* add conversation history, save to localStorage
* fix bg-base classes
* save theme preferences
* fix tests
* regenerate, edit, copy buttons
* small fixes
* docs : how to use legacy ui
* better error handling
* make CORS preflight more explicit
* add GET method for CORS
* fix tests
* clean up a bit
* better auto scroll
* small fixes
* use collapse-arrow
* fix closeAndSaveConfigDialog
* small fix
* remove console.log
* fix style for <pre> element
* lighter bubble color (less distracting when reading)
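The CORS items above concern the server side. As a hedged sketch, not the actual server code, explicit preflight handling with cpp-httplib (the HTTP library the llama.cpp server builds on) might look like this; routes and header values are illustrative.

```cpp
// A hedged sketch of explicit CORS preflight handling with cpp-httplib;
// the endpoints and header values are illustrative assumptions.
#include "httplib.h"

int main() {
    httplib::Server svr;

    // answer OPTIONS preflight requests explicitly instead of relying on
    // implicit behavior
    svr.Options(R"(.*)", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin",  "*");
        res.set_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
        res.set_header("Access-Control-Allow-Headers", "Content-Type, Authorization");
    });

    // a GET endpoint that also carries the CORS origin header
    svr.Get("/health", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin", "*");
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```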
b4044
ggml : add ggml-cpu.h to the public headers (#10204)
b4042
DRY: Fixes clone functionality (#10192)
b4041
fix q4_0_8_8 format for corrupted tokens issue (#10198)
Co-authored-by: EC2 Default User <[email protected]>
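For context on what q4_0_8_8 refers to: it is one of the repacked variants of ggml's base q4_0 quantization format, which interleave several blocks to match a target's SIMD width. A simplified sketch of the base block layout follows; ggml's real definition uses its own ggml_half type for the scale.

```cpp
// A simplified sketch of ggml's base q4_0 block layout, shown only as
// context for the repacked q4_0_8_8 variant named above.
#include <cstdint>

#define QK4_0 32                  // weights per quantization block

struct block_q4_0 {
    uint16_t d;                   // FP16 scale ("delta") for the block
    uint8_t  qs[QK4_0 / 2];       // 32 x 4-bit quants, two packed per byte
};                                // 18 bytes per 32 weights
```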
b4040
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acc…
b4038
server : remove hack for extra parallel slot (#10187)