This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
I have discovered that running the same model with the same parameters in llm (gguf branch) and in llama.cpp results in different behavior. llm does not seem to read the EOS token, so the model keeps generating output until the max-token limit is reached.
Here is llama.cpp:
And here is the same model from llm:
According to a discussion on Discord, it might indeed be a bug.
Thanks for reporting this! For my own reference: the issue is that llm doesn't read the end-of-text (EOT) token from the tokenizer; instead, it assumes it is the hardcoded token </s>. That assumption made sense in the early days of LLaMA, but is no longer true.
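To make the failure mode concrete, here is a minimal, self-contained Rust sketch (not the llm crate's actual API; `TokenizerMeta` and both helper functions are hypothetical) contrasting a hardcoded `</s>` check with using the EOS id declared in the GGUF tokenizer metadata (the `tokenizer.ggml.eos_token_id` key). A model whose EOS token is not literally `</s>` slips past the hardcoded check, which is why generation runs until the max-token limit:

```rust
use std::collections::HashMap;

/// Stand-in for the tokenizer metadata a GGUF file carries
/// (e.g. the `tokenizer.ggml.eos_token_id` key). Hypothetical type.
struct TokenizerMeta {
    token_to_id: HashMap<String, u32>,
    eos_token_id: u32,
}

/// Fragile: assumes the EOS token is literally `</s>`, which holds for the
/// original LLaMA vocabulary but not for many newer GGUF models.
fn is_eos_hardcoded(meta: &TokenizerMeta, token: u32) -> bool {
    meta.token_to_id.get("</s>").map_or(false, |&id| id == token)
}

/// Robust: trusts the EOS id declared by the model file itself.
fn is_eos_from_metadata(meta: &TokenizerMeta, token: u32) -> bool {
    token == meta.eos_token_id
}

fn main() {
    // A toy vocabulary where the model's declared EOS is NOT `</s>`.
    let meta = TokenizerMeta {
        token_to_id: HashMap::from([
            ("</s>".to_string(), 2),
            ("<|im_end|>".to_string(), 32000),
        ]),
        eos_token_id: 32000,
    };

    let sampled = 32000; // pretend the model just emitted its EOS token
    assert!(!is_eos_hardcoded(&meta, sampled)); // misses EOS -> runs to max tokens
    assert!(is_eos_from_metadata(&meta, sampled)); // stops correctly
    println!("hardcoded check missed EOS; metadata check caught it");
}
```

In other words, the fix is to treat the EOS/EOT id as data read from the model file rather than a constant baked into the runtime, which matches what llama.cpp does.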