Releases · ggerganov/llama.cpp
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 data type. It results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
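For context on the technique: the POWER10 MMA (Matrix-Multiply Assist) builtins accumulate a 4x4 FP32 tile with outer-product instructions. Below is a minimal sketch of how such a kernel is typically structured; the function name, tile size, and packing layout are illustrative assumptions, not the upstreamed llamafile kernel itself.

```cpp
// A minimal sketch of a 4x4 FP32 tile kernel built on POWER10 MMA builtins
// (compile with -mcpu=power10). Packing conventions here are assumptions.
#include <altivec.h>
#include <cstring>

typedef __vector unsigned char vec_t;

// Compute one 4x4 tile C = A_tile * B_tile as a sum of rank-1 outer products.
// a: 4 x k tile of A, packed so column l is the 4 contiguous floats at a + 4*l
// b: k x 4 tile of B, packed so row    l is the 4 contiguous floats at b + 4*l
// c: row-major output with leading dimension ldc
static void mma_tile_4x4_f32(const float * a, const float * b, float * c,
                             int k, int ldc) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);                // zero the 4x4 accumulator

    for (int l = 0; l < k; ++l) {
        vec_t va, vb;
        std::memcpy(&va, a + 4*l, sizeof(va));    // column l of the A tile
        std::memcpy(&vb, b + 4*l, sizeof(vb));    // row    l of the B tile
        __builtin_mma_xvf32gerpp(&acc, va, vb);   // acc += va * vb^T (rank-1 update)
    }

    __vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);    // move the accumulator back to VSRs
    for (int i = 0; i < 4; ++i) {
        std::memcpy(c + (size_t) i*ldc, &rows[i], sizeof(rows[i]));
    }
}
```

Keeping the running tile in a dedicated accumulator register removes load/store traffic from the inner loop, which is the usual source of such batch-processing speedups.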
b4053
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16
* ci : use BF16
* swift : switch back to v12
* metal : has_float -> use_float
* metal : fix BF16 check in MSL
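As a rough illustration of what an opt-in compile flag gates, here is a hedged C++ sketch; it assumes the flag surfaces as a GGML_METAL_USE_BF16 compile definition, and the helper function and its argument are hypothetical, not the actual ggml-metal source.

```cpp
// A minimal sketch, assuming the opt-in flag is exposed as the
// GGML_METAL_USE_BF16 compile definition; the helper is hypothetical.
static bool metal_should_use_bf16(bool device_supports_bf16) {
#ifdef GGML_METAL_USE_BF16
    // opted in at build time: enable BF16 wherever the GPU supports it
    return device_supports_bf16;
#else
    // default build: the BF16 paths are compiled out entirely
    (void) device_supports_bf16;
    return false;
#endif
}
```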
b4052
metal : improve clarity (minor) (#10171)
b4050
swift : exclude ggml-metal-embed.metal (#10211)
* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/
b4048
server : revamp chat UI with vuejs and daisyui (#10175)
* server : simple chat UI with vuejs and daisyui
* move old files to legacy folder
* embed deps into binary
* basic markdown support
* add conversation history, save to localStorage
* fix bg-base classes
* save theme preferences
* fix tests
* regenerate, edit, copy buttons
* small fixes
* docs : how to use legacy ui
* better error handling
* make CORS preflight more explicit
* add GET method for CORS
* fix tests
* clean up a bit
* better auto scroll
* small fixes
* use collapse-arrow
* fix closeAndSaveConfigDialog
* small fix
* remove console.log
* fix style for <pre> element
* lighter bubble color (less distracting when reading)
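The CORS items above concern the server side. As a hedged sketch, not the actual server code, explicit preflight handling with cpp-httplib (the HTTP library the llama.cpp server builds on) might look like this; routes and header values are illustrative.

```cpp
// A hedged sketch of explicit CORS preflight handling with cpp-httplib;
// the endpoints and header values are illustrative assumptions.
#include "httplib.h"

int main() {
    httplib::Server svr;

    // answer OPTIONS preflight requests explicitly instead of relying on
    // implicit behavior
    svr.Options(R"(.*)", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin",  "*");
        res.set_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
        res.set_header("Access-Control-Allow-Headers", "Content-Type, Authorization");
    });

    // a GET endpoint that also carries the CORS origin header
    svr.Get("/health", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin", "*");
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```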
b4044
ggml : add ggml-cpu.h to the public headers (#10204)
b4042
DRY: Fixes clone functionality (#10192)
b4041
fix q4_0_8_8 format for corrupted tokens issue (#10198)
Co-authored-by: EC2 Default User <[email protected]>
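For context on what q4_0_8_8 refers to: it is one of the repacked variants of ggml's base q4_0 quantization format, which interleave several blocks to match a target's SIMD width. A simplified sketch of the base block layout follows; ggml's real definition uses its own ggml_half type for the scale.

```cpp
// A simplified sketch of ggml's base q4_0 block layout, shown only as
// context for the repacked q4_0_8_8 variant named above.
#include <cstdint>

#define QK4_0 32                  // weights per quantization block

struct block_q4_0 {
    uint16_t d;                   // FP16 scale ("delta") for the block
    uint8_t  qs[QK4_0 / 2];       // 32 x 4-bit quants, two packed per byte
};                                // 18 bytes per 32 weights
```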
b4040
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acc…
b4038
server : remove hack for extra parallel slot (#10187)