-
Hello, I'm trying to better understand the `/metrics` output from llama-server, specifically how the prompt token counts are computed. Another question I had: which model parameters can we change to control the length of the output? Thank you!
-
Try using …
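In case it helps, one knob that caps output length is the `n_predict` parameter (the `-n` / `--n-predict` CLI flag, or the `n_predict` field of a completion request). Below is a minimal, self-contained sketch of how such a cap bounds a generation loop; `sample_next_token` and `is_eos` are hypothetical stand-ins, not llama.cpp API:

```cpp
#include <cstdio>
#include <vector>

using llama_token = int;

// Hypothetical stand-ins -- not the llama.cpp API.
llama_token sample_next_token() { return 42; }
bool is_eos(llama_token tok) { return tok == 2; }

int main() {
    const int n_predict = 64;   // cap on generated tokens (-1 would mean "no cap")
    std::vector<llama_token> generated;

    // Generation stops either at an end-of-sequence token or at the n_predict cap.
    for (int n_decoded = 0; n_predict < 0 || n_decoded < n_predict; ++n_decoded) {
        const llama_token tok = sample_next_token();
        if (is_eos(tok)) {
            break;  // the model ended the sequence on its own
        }
        generated.push_back(tok);
    }

    std::printf("generated %zu tokens (cap: %d)\n", generated.size(), n_predict);
}
```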
Thanks for the answer, @dspasyuk. It pointed me in the right direction!
I found that the relevant logic lives in the function `update_slots`.
These lines start processing the prompts from the server's slots; the tokens consumed here are counted as initial prompt tokens: llama.cpp/examples/server/server.cpp, lines 1874 to 1880 at commit 0fff7fd.
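For anyone else reading along, here is an illustrative sketch of the bookkeeping those lines perform, not the actual server.cpp code and with all names hypothetical: the slot's prompt is tokenized, the token count is recorded per slot, and counts like this feed the prompt-token counters that `/metrics` exposes.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

using llama_token = int32_t;

struct server_slot {
    std::string prompt;                 // raw prompt text for this request
    std::vector<llama_token> tokens;    // tokenized prompt
    int n_prompt_tokens = 0;            // the "initial prompt tokens"
};

struct server_metrics {
    uint64_t n_prompt_tokens_processed = 0;   // running total across all requests
};

// Stand-in tokenizer: one token per whitespace-separated word.
static std::vector<llama_token> tokenize(const std::string & text) {
    std::vector<llama_token> out;
    bool in_word = false;
    for (const char c : text) {
        if (c == ' ') { in_word = false; continue; }
        if (!in_word) { out.push_back((llama_token) out.size()); in_word = true; }
    }
    return out;
}

// When a slot starts work, tokenize its prompt, record the count on the
// slot, and add it to the server-wide running total.
static void start_processing(server_slot & slot, server_metrics & metrics) {
    slot.tokens          = tokenize(slot.prompt);
    slot.n_prompt_tokens = (int) slot.tokens.size();
    metrics.n_prompt_tokens_processed += (uint64_t) slot.n_prompt_tokens;
}

int main() {
    server_metrics metrics;
    server_slot slot;
    slot.prompt = "count these prompt tokens";
    start_processing(slot, metrics);
    std::printf("initial prompt tokens: %d\n", slot.n_prompt_tokens);   // 4
}
```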
Then these lines check whether the prompt exceeds the context size (`slot.n_ctx`). If so, they truncate the input to fit within it.
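A simplified sketch of that truncation idea, with a hypothetical `truncate_prompt` helper: keep a fixed prefix of `n_keep` tokens plus the most recent tokens so the result fits in `n_ctx`. (If I read the real code correctly it drops whole blocks from the middle rather than a single contiguous span, so treat this as the concept only.)

```cpp
#include <cassert>
#include <vector>

using llama_token = int;

// Keep the first n_keep tokens plus the most recent (n_ctx - n_keep) tokens,
// dropping the middle, so the returned prompt fits in the context window.
std::vector<llama_token> truncate_prompt(const std::vector<llama_token> & prompt,
                                         int n_ctx, int n_keep) {
    if ((int) prompt.size() <= n_ctx) {
        return prompt;                          // already fits, nothing to do
    }
    const int n_tail = n_ctx - n_keep;          // room left after the kept prefix
    std::vector<llama_token> out(prompt.begin(), prompt.begin() + n_keep);
    out.insert(out.end(), prompt.end() - n_tail, prompt.end());
    return out;                                 // size is exactly n_ctx
}

int main() {
    const std::vector<llama_token> prompt(100);                 // pretend 100-token prompt
    const auto truncated = truncate_prompt(prompt, /*n_ctx=*/64, /*n_keep=*/16);
    assert(truncated.size() == 64);             // 16 kept from the head, 48 from the tail
}
```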