[0.17.10] - 2024-09-07
- Change supported llama.cpp version to b3676.
- Add `LLAMA_VOCAB_TYPE_RWKV` constant.
- Add `LLAMA_FTYPE_MOSTLY_TQ1_0` and `LLAMA_FTYPE_MOSTLY_TQ2_0` constants.
- Change type of `n_threads` and `n_threads_batch` from `uint32_t` to `int32_t` in the native extension code.
Implementation bindings for llama_attach_threadpool and llama_detach_threadpool have been skipped.
[0.17.9] - 2024-08-31
- Change supported llama.cpp version to b3639.
- There are no changes in the API.
[0.17.8] - 2024-08-25
- Change supported llama.cpp version to b3614.
- Add `LLAMA_VOCAB_PRE_TYPE_EXAONE` constant.
- Add `is_recurrent?` method to `Model`.
[0.17.7] - 2024-08-17
- Change supported llama.cpp version to b3590.
- Add `LLAMA_VOCAB_PRE_TYPE_BLOOM` and `LLAMA_VOCAB_PRE_TYPE_GPT3_FINNISH` constants.
[0.17.6] - 2024-08-09
- Change supported llama.cpp version to b3524.
- Change `LLAMA_SESSION_VERSION` value from 7 to 8.
- Change `LLAMA_STATE_SEQ_VERSION` value from 1 to 2.
[0.17.5] - 2024-08-03
- Change supported llama.cpp version to b3482.
- Add `LLAMA_VOCAB_PRE_TYPE_SMOLLM` and `LLAMA_VOCAB_PRE_TYPE_CODESHELL` constants.
- Change to call `llama_grammar_sample` and `llama_grammar_accept_token` functions instead of deprecated functions.
Implementation binding for llama_lora_adapter_clear has been skipped.
[0.17.4] - 2024-07-27
- Change supported llama.cpp version to b3436.
- Add `LLAMA_VOCAB_PRE_TYPE_TEKKEN` constant.
- Change `LLAMA_SESSION_VERSION` value from 6 to 7.
[0.17.3] - 2024-07-21
- Change supported llama.cpp version to b3405.
- Remove `LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16` constant.
- Add model file type constants: `LLAMA_FTYPE_MOSTLY_Q4_0_4_4`, `LLAMA_FTYPE_MOSTLY_Q4_0_4_8`, and `LLAMA_FTYPE_MOSTLY_Q4_0_8_8`.
Implementation bindings for llama_lora_adapter_init, llama_lora_adapter_set, llama_lora_adapter_remove, and llama_lora_adapter_free have been skipped.
[0.17.2] - 2024-07-14
- Change supported llama.cpp version to b3358.
- Add vocabulary pre-tokenization type constants.
- Add attention type constants.
- Add `attention_type` accessor to `ContextParams`.
- Add `lstrip` and `special` keyword arguments to `token_to_piece` method in `Model`.
- Add `has_encoder?`, `decoder_start_token`, and `detokenize` methods to `Model`.
- Add `encode` method to `Context` (see the sketch below).
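A minimal sketch of the new methods; the model path is a placeholder and the exact signatures are assumptions based on the entries above:

```ruby
require 'llama_cpp'

# Hypothetical model path; keyword names follow the entries above.
model = LLaMACpp::Model.new(model_path: '/path/to/model.gguf', params: LLaMACpp::ModelParams.new)

tokens = model.tokenize(text: 'Hello, world.')
p model.token_to_piece(tokens.first, lstrip: false, special: true)
p model.detokenize(tokens)
p model.has_encoder? # => false for decoder-only models
```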
[0.17.1] - 2024-07-06
- Update usage section on README.
- Change supported llama.cpp version to b3291.
- Add `LLAMA_VOCAB_PRE_TYPE_JAIS` constant.
[0.17.0] - 2024-06-29
Breaking Changes
I stopped including the llama.cpp source code in the gem, as it became difficult to keep up with changes in the llama.cpp file structure. You now need to install the llama.cpp library separately. If you are using Homebrew on macOS, the following commands will install the library and build the gem against it:
```sh
$ brew install llama.cpp
$ gem install llama_cpp -- --with-opt-dir=/opt/homebrew
```
- Change supported llama.cpp version to b3265.
- Add `LLAMA_VOCAB_TYPE_UGM` and `LLAMA_VOCAB_PRE_TYPE_VIKING` constants.
- Add `token_pad` method to `Model`.
[0.16.2] - 2024-06-22
- Bump llama.cpp from b3151 to b3197.
- Add `LLAMA_POOLING_TYPE_LAST` constant.
- Add `--with-vulkan-memory-debug` config option.
- Add `set_embeddings` method to `Context`.
[0.16.1] - 2024-06-15
- Bump llama.cpp from b3091 to b3151.
- Add `--with-openblas64` and `--with-no-llamafile` config options.
- Add `LLAMA_VOCAB_PRE_TYPE_PORO` and `LLAMA_GRETYPE_CHAR_ANY` constants.
[0.16.0] - 2024-06-08
Breaking Changes
- Bump llama.cpp from b3056 to b3091.
- Rename `type` method to `token_attr` in `Model`.
- Add constants for token attribute types.
- Remove `--with-clblast` and `--with-mpi` config options.
- Add `--with-no-openmp` config option.
[0.15.4] - 2024-06-01
- Bump llama.cpp from b2988 to b3056.
- Add `LLAMA_VOCAB_PRE_TYPE_SMAUG` constant.
- Add `token_is_control?` method to `Model`.
[0.15.3] - 2024-05-25
- Bump llama.cpp from b2917 to b2988.
- Add constants for pre-tokenization types.
- Add `n_threads` method to `Context`.
- Add `n_threads_batch` method to `Context`.
- Add `set_n_threads` method to `Context` (see the sketch below).
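A minimal sketch, assuming `context` is an initialized `LLaMACpp::Context`; the keyword form of `set_n_threads` is an assumption:

```ruby
# Set decode and batch thread counts, then read them back.
context.set_n_threads(n_threads: 4, n_threads_batch: 8)
p context.n_threads       # => 4
p context.n_threads_batch # => 8
```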
[0.15.2] - 2024-05-18
- Bump llama.cpp from b2839 to b2917.
Implementation binding for rpc_servers in llama_model_params has been skipped.
[0.15.1] - 2024-05-11
- Bump llama.cpp from b2781 to b2839.
- Add constants for pre-tokenization types.
- Add constant for model file type.
[0.15.0] - 2024-05-03
- Add new build flag for using CUDA (#18).
- Bump llama.cpp from b2740 to b2781.
- Change `LLAMA_SESSION_VERSION` value from 5 to 6.
- Add constants for pre-tokenization types.
- Add `flash_attn` accessor to `ContextParams` (see the sketch below).
- Add `check_tensors` accessor to `ModelParams`.
- Add `LLAMA_KV_OVERRIDE_TYPE_STR` constant.
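A minimal sketch of the new accessors; the comments describe their intended effect and are assumptions:

```ruby
ctx_params = LLaMACpp::ContextParams.new
ctx_params.flash_attn = true # enable flash attention

model_params = LLaMACpp::ModelParams.new
model_params.check_tensors = true # validate tensor data while loading the model
```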
Breaking Change
- Change method names in `ModelKVOverride`.
[0.14.7] - 2024-04-27
- Bump llama.cpp from b2698 to b2740.
- Add `keep_split` accessor to `ModelQuantizeParams`.
- Add `pooling_type` method to `Context`.
- Add `token_is_eog?` method to `Model`.
Implementation binding for llama_sample_token_with_rng has been skipped.
[0.14.6] - 2024-04-20
- Bump llama.cpp from b2658 to b2698.
[0.14.5] - 2024-04-13
- Bump llama.cpp from b2608 to b2658.
- Add magic number constants.
- Add `token_cls` and `token_sep` methods to `Model`.
Implementation bindings for llama_state_get_size, llama_state_get_data, llama_state_set_data, llama_state_load_file, llama_state_save_file, llama_state_seq_get_size, llama_state_seq_get_data, llama_state_seq_set_data, llama_state_seq_save_file, and llama_state_seq_load_file have been skipped.
[0.14.4] - 2024-04-06
- Bump llama.cpp from b2496 to b2573.
- Add file type constants.
- Bump llama.cpp from b2573 to b2608.
Implementation bindings for llama_split_path, llama_split_prefix, llama_grammar_accept, and decode_utf8 have been skipped.
[0.14.3] - 2024-03-23
- Bump llama.cpp from b2435 to b2496.
- Add `n_layer` method to `Model`.
- Add `apply_control_vector` method to `Context`.
[0.14.2] - 2024-03-16
- Fix to use metal embed library on macOS.
[0.14.1] - 2024-03-16
- Bump llama.cpp from b2361 to b2435.
- Add constant for vocabulary type: `LLAMA_VOCAB_TYPE_NONE`.
- Add `n_ubatch` and `n_seq_max` accessors to `ContextParams`.
- Add `n_ubatch`, `n_seq_max`, `set_causal_attn`, and `synchronize` methods to `Context`.
[0.14.0] - 2024-03-09
Breaking Changes
- Bump bundled llama.cpp from b2303 to b2361.
- Rename `embedding` accessor to `embeddings` in `ContextParams`.
- Remove `do_pooling` accessor from `ContextParams`.
- Add `pooling_type` accessor to `ContextParams`.
- Fix the size of array returned by `embedding` method in `Context` from `n_embd` to `n_tokens * n_embd`.
- Add `embeddings_seq` method to `Context` (see the sketch below).
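A minimal sketch, assuming an embedding-capable model and an already-decoded batch; the argument form of `embeddings_seq` is an assumption:

```ruby
emb = context.embedding         # flat array of n_tokens * n_embd floats after this fix
seq = context.embeddings_seq(0) # embedding for sequence id 0
```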
[0.13.0] - 2024-03-02
Breaking Changes
- Bump bundled llama.cpp from b2143 to b2303.
- Remove deprecated methods: `mmap_supported?`, `mlock_supported?`, `apply_lora_from_file`, `eval`, `eval_embd`, `sample_classifier_free_guidance`, `sample_temperature`, and `mul_mat_q`.
- Rename some constants.
- Rename `kv_cache_seq_shift` method to `kv_cache_seq_add`.
- Add `defrag_thold` accessor to `ContextParams`.
- Add `vocab_type` and `rope_type` methods to `Model`.
- Add `kv_cache_seq_pos_max`, `kv_cache_defrag`, and `kv_cache_update` methods to `Context` (see the sketch below).
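A minimal sketch of the KV-cache helpers; the positional argument lists are assumed to mirror the llama.cpp C API (sequence id, start position, end position, delta):

```ruby
context.kv_cache_seq_add(0, 0, 32, 4) # formerly kv_cache_seq_shift
p context.kv_cache_seq_pos_max(0)     # highest position stored for sequence 0
context.kv_cache_defrag               # request cache defragmentation
context.kv_cache_update               # apply pending cache updates
```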
[0.12.7] - 2024-02-24
- Bump bundled llama.cpp from b2106 to b2143.
- Add constants for file type: `LLAMA_FTYPE_MOSTLY_IQ1_S` and `LLAMA_FTYPE_MOSTLY_IQ4_NL`.
- Add constants for pooling type: `LLAMA_POOLING_NONE`, `LLAMA_POOLING_MEAN`, and `LLAMA_POOLING_CLS`.
- Add `numa_init` module function to `LLaMACpp`.
- Remove unnecessary argument from `backend_init`.
Implementation of llama_chat_apply_template binding has been postponed for the time being.
[0.12.6] - 2024-02-17
- Bump bundled llama.cpp from b2106 to b2143.
- Add constant: `LLAMA_VOCAB_TYPE_WPM`.
- Add `do_pooling` accessor to `ContextParams`.
- Add `embeddings_ith` method to `Context`.
[0.12.5] - 2024-02-09
- Bump bundled llama.cpp from b2047 to b2106.
[0.12.4] - 2024-02-03
- Bump bundled llama.cpp from b1971 to b2047.
- Add constant for file type: `LLAMA_FTYPE_MOSTLY_IQ3_XXS`.
- Add `supports_mmap?`, `supports_mlock?`, and `supports_gpu_offload?` module functions to `LLaMACpp` (see the sketch below).
- Add `--with-vulkan` configuration option.
- Deprecate `mmap_supported?` and `mlock_supported?` module functions in `LLaMACpp`.
- Remove `LLAMA_MAX_DEVICES` constant.
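For illustration, the capability checks added and deprecated here:

```ruby
require 'llama_cpp'

p LLaMACpp.supports_mmap?        # replaces the deprecated mmap_supported?
p LLaMACpp.supports_mlock?       # replaces the deprecated mlock_supported?
p LLaMACpp.supports_gpu_offload?
```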
[0.12.3] - 2024-01-27
- Bump bundled llama.cpp from b1892 to b1971.
- Add constant for file type: `LLAMA_FTYPE_MOSTLY_Q3_K_XS`.
- Add `sample_entropy` method to `Context`.
[0.12.2] - 2024-01-20
- Bump bundled llama.cpp from b1833 to b1892.
- Change `LLAMA_SESSION_VERSION` value from 3 to 4.
- Add constants for split mode: `LLAMA_SPLIT_NONE`, `LLAMA_SPLIT_LAYER`, and `LLAMA_SPLIT_ROW`.
- Add `split_mode` accessor to `ModelParams`.
- Add `sample_apply_guidance` method to `Context`.
[0.12.1] - 2024-01-13
- Bump bundled llama.cpp from b1768 to b1833.
- Add model file type constants.
- Add `kv_cache_seq_div` method to `Context`.
[0.12.0] - 2024-01-11
- Add `get_one` singleton method to `Batch`.
Breaking Changes
- Add deprecation warning to `eval`, `eval_embd`, and `sample_temperature` methods on `Context`.
- Change `generate` module function and example scripts to avoid using the deprecated methods.
[0.11.1] - 2024-01-08
- Fix to set the values of `@n_tokens` and `@has_evaluated` instance variables in `decode` method of `Context`.
- Add documentation for `logits` method in `Context`.
- Add example script for simple text completion: examples/simple.rb
[0.11.0] - 2024-01-07
- Add `set_n_seq_id` and `get_n_seq_id` methods to `Batch`.
Breaking Changes
- Change to build shared and static libraries of llama.cpp using its Makefile.
- Change keyword arguments of `Batch` constructor.
- Remove upper limit check for index value in `Batch` methods.
[0.10.4] - 2024-01-06
- Bump bundled llama.cpp from b1710 to b1768.
[0.10.3] - 2023-12-29
- Bump bundled llama.cpp from b1686 to b1710.
- Add documentation comment and type declaration for `n_batch` method in `Context`.
[0.10.2] - 2023-12-23
- Bump bundled llama.cpp from b1641 to b1686.
- Add `LLAMA_FILE_MAGIC_GGLA` constant.
- Add `n_batch` method to `Context`.
[0.10.1] - 2023-12-16
- Bump bundled llama.cpp from b1620 to b1641.
- Add attribute reader for `params` to `Model`.
- Add `Batch` class; this class had not been published earlier because the author forgot to write `rb_define_class`.
[0.10.0] - 2023-12-09
- Bump bundled llama.cpp from b1593 to b1620.
- Add `ModelKVOverride` class.
- Add `offload_kqv`, `type_k`, and `type_v` to `ContextParams` (see the sketch below).
- Add KV override type constants.
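A minimal sketch; the integer values for `type_k` and `type_v` are assumptions based on the `ggml_type` enum:

```ruby
params = LLaMACpp::ContextParams.new
params.offload_kqv = true # keep the KQV ops on the GPU
params.type_k = 1         # 1 == GGML_TYPE_F16 in the ggml_type enum (assumption)
params.type_v = 1
```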
Breaking Changes
- Remove `f16_kv` from `ContextParams`.
[0.9.5] - 2023-12-02
- Bump bundled llama.cpp from b1555 to b1593.
[0.9.4] - 2023-11-25
- Bump bundled llama.cpp from b1523 to b1555.
[0.9.3] - 2023-11-18
- Bump bundled llama.cpp from b1500 to b1523.
- Add `add_bos_token?` method to `Model`.
- Add `add_eos_token?` method to `Model`.
[0.9.2] - 2023-11-11
- Bump bundled llama.cpp from b1472 to b1500.
[0.9.1] - 2023-11-03
- Bump bundled llama.cpp from b1429 to b1472.
- Rename `kv_cache_tokens_rm` method to `kv_cache_clear` in `Context`.
- Add `sample_min_p` method to `Context`.
- Add `rope_scaling_type`, `rope_freq_base`, `rope_freq_scale`, `yarn_ext_factor`, `yarn_attn_factor`, `yarn_beta_fast`, `yarn_beta_slow`, and `yarn_orig_ctx` to `ContextParams`.
- Add `pure` to `ModelQuantizeParams`.
- Add constants for RoPE scaling type.
[0.9.0] - 2023-10-28
- Fix missing object file for ggml-backend when building with metal and cublas options.
Breaking Changes
- Bump bundled llama.cpp from b1405 to b1429.
- Move the following methods from `Context` to `Model`: text, score, type, token_bos, token_eos, token_nl, token_prefix, token_middle, token_suffix, and token_eot.
- Add `sample_repetition_penalties` method, which integrates the `sample_frequency_and_presence_penalties` and `sample_repetition_penalty` methods.
[0.8.0] - 2023-10-21
Breaking Changes
- Bump bundled llama.cpp from b1380 to b1405.
- Add column index argument to `set_seq_id` and `get_seq_id` methods in `Batch`.
- Add `special` keyword argument to `tokenize` method in `Model`.
- Add `n_seq_max` keyword argument to `initialize` method in `Batch`.
[0.7.1] - 2023-10-14
- Bump bundled llama.cpp from b1334 to b1380.
[0.7.0] - 2023-10-07
- Bump bundled llama.cpp from b1292 to b1334.
- Refactor `generate` module function.
Breaking Changes
- Change to return UTF-8 String on `token_to_piece` and `desc` methods in `Model` and `text` method in `Context`.
[0.6.0] - 2023-09-30
Breaking Changes
- Bump bundled llama.cpp from b1266 to b1292.
- There are many API changes, so please refer to the commits.
It is becoming difficult to keep up with major changes in llama.cpp, and I may give up on developing this gem in the future to prioritize my own life.
[0.5.3] - 2023-09-23
- Bump bundled llama.cpp from b1 to b1266.
[0.5.2] - 2023-09-16
- Bump bundled llama.cpp from b1198 to b1.
- Add `n_ctx_train` method to `Model` and `Context`.
- Add nvcc option to avoid link error (#8).
- Set encoding on output of `generate` module function to avoid encoding error (#9).
- Add `only_copy` option to `ModelQuantizeParams`.
[0.5.1] - 2023-09-08
- Bump bundled llama.cpp from b1140 to b1198.
[0.5.0] - 2023-09-02
Breaking Changes
- Bump bundled llama.cpp from b1060 to b1140.
- Rename `token_to_str` method on `Context` to `token_to_piece` method.
- Rename `token_to_str` method on `Model` to `token_to_piece` method.
- Rename `type` method on `Model` to `desc` method.
- Add `size` and `n_params` methods to `Model` (see the sketch below).
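A minimal migration sketch, assuming a loaded `model`; the comments on `size` and `n_params` are assumptions based on the corresponding llama.cpp functions:

```ruby
piece = model.token_to_piece(0) # formerly model.token_to_str(0)
p model.desc     # formerly model.type
p model.size     # model size in bytes
p model.n_params # number of model parameters
```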
[0.4.0] - 2023-08-26
Breaking Changes
- Bump bundled llama.cpp from master-097e121 to b1060.
- Support new file format GGUF.
  - You should re-convert / re-quantize your model files.
- Remove vocab methods.
- Move `token_bos`, `token_eos`, and `token_nl` methods to `Context`.
- Add `text`, `score`, and `type` methods to `Context`.
[0.3.8] - 2023-08-19
- Bump bundled llama.cpp from master-9ca4abe to master-097e121.
- Add `type` method to `Model`.
- Revert pull request #2592 in llama.cpp. It seems that PWIN32_MEMORY_RANGE_ENTRY and WIN32_MEMORY_RANGE_ENTRY do not exist in MinGW.
[0.3.7] - 2023-08-12
- Bump bundled llama.cpp from master-468ea24 to master-9ca4abe.
[0.3.6] - 2023-08-04
- Bump bundled llama.cpp from master-1a94186 to master-468ea24.
- Add `mul_mat_q` option to `ContextParams`.
[0.3.5] - 2023-07-29
- Bump bundled llama.cpp from master-d924522 to master-1a94186.
- Add `GrammarElement` and `Grammar` classes.
- Add `sample_grammar` method to `Context`.
- Add `grammar_accept_token` method to `Context`.
[0.3.4] - 2023-07-23
- Bump bundled llama.cpp from master-32c5411 to master-d924522.
- Add `rope_freq_base` and `rope_freq_scale` options to `ContextParams`.
- Add `max_devices` module function to `LLaMACpp`.
- Add `n_vocab`, `n_ctx`, and `n_embd` methods to `Model`.
- Add `vocab`, `tokenize`, and `token_to_str` methods to `Model`:

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
model = LLaMACpp::Model.new(model_path: '/path/to/model.bin', params: params)

p model.tokenize(text: 'hello, world') # => [12199, 29892, 3186]
p model.token_to_str(12199) # => "hello"
```
Breaking Changes
- Fix to automatically call `backend_free` method when the Ruby script exits.
- Remove `smooth_factor` argument from `sample_classifier_free_guidance` method on `Context`.
[0.3.3] - 2023-07-15
- Bump bundled llama.cpp from master-481f793 to master-32c5411.
- Add MPI config option:

```sh
$ gem install llama_cpp -- --with-mpi
```

- Add `backend_free` module function to `LLaMACpp`. This method should be called once at the end of the program when the MPI option is enabled (see the sketch below).
- Add `sample_classifier_free_guidance` method to `Context`.
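A minimal sketch of the shutdown pattern described above:

```ruby
require 'llama_cpp'

# ... use the bindings ...

LLaMACpp.backend_free # call once at program end when built with --with-mpi
```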
Breaking Changes
- Rename `init_backend` method to `backend_init`. This method is called internally at `require 'llama_cpp'`.
[0.3.2] - 2023-07-08
- Bump bundled llama.cpp from master-b8c8dda to master-481f793.
- Add `Timings` class and `timings` method to `Context`:

```ruby
require 'llama_cpp'

# ...

context = LLaMACpp::Context.new(model: model)

timings = context.timings
puts timings.class     # => LLaMACpp::Timings
puts timings.t_load_ms # => 79.61
```

- Expose sampling options as the arguments of `generate` module function:

```ruby
require 'llama_cpp'

# ...

LLaMACpp.generate(context, 'Hello, world.', top_k: 30, top_p: 0.8, temperature: 0.9)
```

- Add `ModelQuantizeParams` class; this class had not been published earlier because the author forgot to write `rb_define_class`.
- Minor updates to example scripts, configuration files, and documentation.
[0.3.1] - 2023-07-02
- Bump bundled llama.cpp from master-9d23589 to master-b8c8dda.
- Use unsigned values for random seed.
- Add `eval_embd` method to `Context` class.
[0.3.0] - 2023-06-30
- Add no_k_quants and qkk_64 config options:

```sh
$ gem install llama_cpp -- --with-no_k_quants
$ gem install llama_cpp -- --with-qkk_64
```
Breaking Changes
- Remove `Client` class to concentrate on developing bindings.
- Bump bundled llama.cpp from master-7487137 to master-9d23589.
  - llama_init_from_file and llama_apply_lora_from_file are deprecated.
- Add `Model` class for wrapping llama_model.
- Move `apply_lora_from_file`, `free`, `load`, and `empty?` methods from `Context` class to `Model` class.
- Change arguments of `initialize` method of `Context`; it now requires a `Model` object instead of the model's file path:

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

LLaMACpp.generate(context, 'Hello, world.')
```
[0.2.2] - 2023-06-24
- Bump bundled llama.cpp from master-a09f919 to master-7487137.
[0.2.1] - 2023-06-17
- Bump bundled llama.cpp from master-4de0334 to master-a09f919.
- Add `low_vram` parameter to `ContextParams`.
- Add `vocab` method to `Context`.
- Add example script: https://github.com/yoshoku/llama_cpp.rb/tree/main/examples
[0.2.0] - 2023-06-11
- Bump bundled llama.cpp from master-ffb06a3 to master-4de0334.
- Fix installation files for CUDA.
- Add metal config option:
```sh
$ gem install llama_cpp -- --with-metal
```

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
params.n_gpu_layers = 1

context = LLaMACpp::Context.new(model_path: '/path/to/quantized-model.bin', params: params)
LLaMACpp.generate(context, 'Hello, world.')
```
Breaking Changes
- Add `ModelQuantizeParams` class.
- Change the argument of the `model_quantize` module function in `LLaMACpp`:

```ruby
require 'llama_cpp'

params = LLaMACpp::ModelQuantizeParams.new
LLaMACpp.model_quantize(input_path: 'foo.model', output_path: 'bar.model', params: params)
```
[0.1.4] - 2023-06-03
- Bump bundled llama.cpp from master-66874d4 to master-ffb06a3.
[0.1.3] - 2023-05-27
- Bump bundled llama.cpp from master-265db98 to master-66874d4.
[0.1.2] - 2023-05-22
Breaking Changes
- Bump bundled llama.cpp from master-6986c78 to master-265db98.
- Bump LLAMA_FILE_VERSION to 3.
[0.1.1] - 2023-05-21
- Add `load_session_file` method to `Context`.
- Add `save_session_file` method to `Context`.
Breaking Changes
- Bump bundled llama.cpp from master-173d0e6 to master-6986c78.
- Bump LLAMA_FILE_VERSION to 2.
[0.1.0] - 2023-05-20
Breaking Changes
- Bump bundled llama.cpp from master-11d9023 to master-173d0e6.
- Support new API.
[0.0.7] - 2023-04-29
- Bump bundled llama.cpp from master-12b5900 to master-11d9023.
- Add `Client` class.
- Add model file type constants.
- Add getter and setter methods of `use_mmap` to `ContextParams`.
- Add `empty?` method to `Context`.
- Add clblast config option:

```sh
$ gem install llama_cpp -- --with-clblast
```
[0.0.6] - 2023-04-22
- Bump bundled llama.cpp from master-315a95a to master-12b5900.
- Add model file type constants.
- Add `model_quantize` module function to `LLaMACpp`.
- Add cublas config option:

```sh
$ gem install llama_cpp -- --with-cublas
```
[0.0.5] - 2023-04-20
- Bump bundled llama.cpp from master-c85e03d to master-315a95a.
- Add `apply_lora_from_file` method to `LLaMACpp::Context`.
- Add `mlock_supported?` module function to `LLaMACpp`.
- Add `mmap_supported?` module function to `LLaMACpp`.
- Fix to not destroy original prompt in `LLaMACpp.generate` module function.
- Add check for context initialization.
- Add BLAS config options:

macOS:

```sh
$ gem install llama_cpp -- --with-openblas
$ gem install llama_cpp -- --with-openblas --with-opt-dir=/opt/homebrew/opt/openblas
$ gem install llama_cpp -- --with-accelerate
```
[0.0.4] - 2023-04-15
- Bump bundled llama.cpp from master-698f7b5 to master-c85e03d.
- Add parameterless constructor to `LLaMACpp::Context`.
- Add `free` and `load` methods to `LLaMACpp::Context`:

```ruby
require 'llama_cpp'

context = LLaMACpp::Context.new

params = LLaMACpp::ContextParams.new
context.load(model_path: '/path/to/ggml-model-q4_0.bin', params: params)

# ...

context.free
```
[0.0.3] - 2023-04-08
- Bump bundled llama.cpp from master-5b70e7d to master-698f7b5.
- Add `logits` method to `LLaMACpp::Context`.
- Add type signatures.
- Add class alias `Params` for `LLaMACpp::ContextParams`.
[0.0.2] - 2023-04-02
- Bump bundled llama.cpp from master-2a98bc1 to master-5b70e7d.
- Add `n_threads` arguments to `generate` method.

[0.0.1]
- Initial release.