Releases · ggerganov/llama.cpp
b4060
metal : fix build and some more comments (#10229)
b4059
metal : fix F32 accumulation in FA vec kernel (#10232)
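The fix above is a numerical-precision change. As a minimal standalone sketch (not the Metal FA kernel itself; the loop and values below are illustrative), accumulating many half-precision products in a half-precision running sum silently drops addends once the sum grows, while widening the accumulator to F32 does not:

```cpp
// Illustrative only: assumes a compiler with _Float16 support (recent clang/gcc).
#include <cstdio>

int main() {
    const int n = 4096;
    _Float16 acc16 = (_Float16) 0.0f; // narrow (f16) accumulator
    float    acc32 = 0.0f;            // wide (F32) accumulator, as in the fix
    for (int i = 0; i < n; ++i) {
        _Float16 p = (_Float16) 0.25f;   // stand-in for one attention product
        acc16 = (_Float16) (acc16 + p);  // rounded back to f16 every step
        acc32 += (float) p;
    }
    // acc16 stalls once the f16 spacing at its magnitude exceeds the addend
    // (around 512 here); acc32 reaches the exact sum of 1024.
    printf("f16 acc = %f, f32 acc = %f\n", (double) (float) acc16, (double) acc32);
}
```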
b4058
llama : fix Qwen model type strings
b4057
metal : hide debug messages from normal log
b4056
ggml: fix zero division in 'dne' calculation in CUDA COUNT_EQUAL oper…
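A hedged sketch of this bug class (the real operator lives in ggml's CUDA backend; the helper below is hypothetical, borrowing ggml's `ne` naming for element counts): when a workload is split into chunks, integer division can yield 0 for small or empty tensors, and any later division by that chunk size faults.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: pick how many elements each chunk processes.
int64_t chunk_size(int64_t ne, int64_t nchunks) {
    int64_t dne = ne / nchunks;        // 0 when ne < nchunks (or ne == 0)
    return std::max<int64_t>(dne, 1);  // guard: never hand back a zero divisor
}
```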
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 datatype. It yields a consistent 90% improvement in input processing time and a 20% to 80% improvement in output processing time across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
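For context on the MMA approach (a hedged sketch, not the llamafile kernel: the function name and tile layout are assumptions), POWER10 exposes 4x4 FP32 accumulators that are built up as a sum of rank-1 outer products, which is the core of such a GEMM inner loop. Compile with `-mcpu=power10`:

```cpp
#include <altivec.h>

// Computes a 4x4 tile of C = A * B, with A supplied as k columns of 4 floats
// and B as k rows of 4 floats (both contiguous); C is stored row-major.
void gemm_tile_4x4(const float *a_cols, const float *b_rows, float *c, int k) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);                 // zero the 4x4 accumulator
    for (int i = 0; i < k; ++i) {
        vector float va = vec_xl(0, a_cols + 4*i); // i-th column of A
        vector float vb = vec_xl(0, b_rows + 4*i); // i-th row of B
        // one fused rank-1 update: acc += va (outer product) vb
        __builtin_mma_xvf32gerpp(&acc, (vector unsigned char) va,
                                       (vector unsigned char) vb);
    }
    vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);     // move accumulator to VSRs
    for (int r = 0; r < 4; ++r) {
        vec_xst(rows[r], 0, c + 4*r);              // store row r of the tile
    }
}
```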
b4053
metal : opt-in compile flag for BF16 (#10218)

* metal : opt-in compile flag for BF16 ggml-ci
* ci : use BF16 ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float ggml-ci
* metal : fix BF16 check in MSL ggml-ci
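The opt-in flag in b4053 follows a common compile-time gating pattern. A minimal sketch, assuming the build option is named GGML_METAL_USE_BF16 (inferred from the PR title, not confirmed here) and using a hypothetical wrapper function; the commit's has_float -> use_float rename reflects the same split between "device supports BF16" and "build opted in":

```cpp
// Hypothetical illustration; not the actual ggml-metal code.
bool decide_use_bf16(bool device_has_bf16) {
#ifdef GGML_METAL_USE_BF16
    return device_has_bf16; // compiled-in support still defers to the device
#else
    (void) device_has_bf16;
    return false;           // BF16 path not compiled in (the opt-in default)
#endif
}
```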
b4052
metal : improve clarity (minor) (#10171)
b4050
swift : exclude ggml-metal-embed.metal (#10211)

* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/
b4048
server : revamp chat UI with vuejs and daisyui (#10175)

* server : simple chat UI with vuejs and daisyui
* move old files to legacy folder
* embed deps into binary
* basic markdown support
* add conversation history, save to localStorage
* fix bg-base classes
* save theme preferences
* fix tests
* regenerate, edit, copy buttons
* small fixes
* docs: how to use legacy ui
* better error handling
* make CORS preflight more explicit
* add GET method for CORS
* fix tests
* clean up a bit
* better auto scroll
* small fixes
* use collapse-arrow
* fix closeAndSaveConfigDialog
* small fix
* remove console.log
* fix style for <pre> element
* lighter bubble color (less distracting when reading)