
Looking for help understanding llama-server /metrics #10325

Answered by Allan-Luu
Allan-Luu asked this question in Q&A

Thanks for the answer, @dspasyuk. It pointed me in the right direction!

I found the answer in the function update_slots.

These lines start processing the prompts from the slots within the server; the tokens handled here are counted as the initial prompt tokens:

if (slot.state == SLOT_STATE_STARTED) {
    // record when prompt processing begins; used later for timing metrics
    slot.t_start_process_prompt = ggml_time_us();
    slot.t_start_generation     = 0;

    slot.n_past          = 0;
    slot.n_prompt_tokens = prompt_tokens.size();
    slot.state           = SLOT_STATE_PROCESSING_PROMPT;

Then these lines check whether the prompt exceeds the context size (slot.n_ctx). If so, they truncate the input to fit wit…
