Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Adding correct example for model parameters with examples #741

Closed
tikikun opened this issue Nov 28, 2023 · 1 comment · Fixed by #742
Closed

chore: Adding correct example for model parameters with examples #741

tikikun opened this issue Nov 28, 2023 · 1 comment · Fixed by #742
Assignees
Labels
type: docs Improvements or additions to documentation

Comments

@tikikun
Copy link
Contributor

tikikun commented Nov 28, 2023

Problem
We do not have clear example at the jan docs page now

Success Criteria
Simple example of one model loading case

@tikikun tikikun added the type: docs Improvements or additions to documentation label Nov 28, 2023
@tikikun tikikun self-assigned this Nov 28, 2023
@tikikun
Copy link
Contributor Author

tikikun commented Nov 28, 2023

Some clarification about ctx_len and max_tokens

Q: Why there is no ctx_len in OpenAI API?

OpenAI has pre-defined ctx_len for each of their model (for example gpt-3.5-turbo has ctx_len of 4096) and all the models are already loaded which mean you just need to use their chat/completions endpoint.
-> No settings for ctx_len because each model already has a fixed ctx_len
-> Nitro can set your own ctx_len because you load you own model

Q: What is the difference between ctx_len and max_token

max_token: Maximum number of tokens that you allow the model to generate during inferencing

  • Example: You max_token = 10, then if the chat is outputting more than 10 words -> truncate (and lead to truncate issue last time)

ctx_len: The upper limit of token that can be processed during inference on the backend, the relationship between ctx_len and max_token:

$$ \text{max token} + \text{chat token} < \text{ctx len} $$

Q: What will happen if I input a chat that is longer than the ctx len ( max_token + chat_token > ctx_len )

In this scenario a context shift will happen, the inference will cut the extra context that is not fitted into ctx_len and keep doing inferencing normally, but it might lose some memory outside of ctx_len

Q: What value should i add as default params for max_token

In practice you should both set ctx_len and max_token to be the same value, and it should follow the maximum token of the model. An example can be checked at: https://huggingface.co/TheBloke/neural-chat-7B-v3-1-AWQ.

image

So normally just set both values to the values that is specified on where you download the model

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: docs Improvements or additions to documentation
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant