
Respect the maximum number of tokens in interactive. #298

Merged · 2 commits · Mar 19, 2023

Conversation

tjohnman
Contributor

Even in interactive mode, the specified maximum number of tokens should be respected. Instead of ending the main loop when the limit is reached, fall back to user input mode.
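
A minimal sketch of the idea behind the change (not the actual patch): when the `n_predict` budget runs out in interactive mode, refill the budget and hand control back to the user instead of breaking out of the main loop. The names used here (`interactive`, `n_predict`, and the commented-out generation/input calls) are placeholders, not the real identifiers in `main.cpp`.

```cpp
// Illustrative only: refill the token budget and return to the user
// when n_predict is exhausted, rather than ending the loop.
void run_loop(bool interactive, int n_predict) {
    int  remaining_tokens = n_predict;
    bool is_interacting   = false;

    while (remaining_tokens > 0 || interactive) {
        if (!is_interacting) {
            // ... sample and print the next token here ...
            --remaining_tokens;
        }

        if (remaining_tokens <= 0) {
            if (!interactive) {
                break;                    // non-interactive: stop as before
            }
            remaining_tokens = n_predict; // refill the token budget
            is_interacting   = true;      // and wait for the user instead
        }

        if (is_interacting) {
            // ... read user input, tokenize it, append it to the context ...
            is_interacting = false;
        }
    }
}
```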

@ggerganov ggerganov merged commit 368d0c8 into ggerganov:master Mar 19, 2023
@tjohnman tjohnman deleted the interactive-mode-fix branch March 19, 2023 18:35
@rabidcopy
Contributor

rabidcopy commented Mar 19, 2023

Upon further testing of this, it seems to remember things immediately before running out of tokens and resetting. Though sometimes it goes a bit off the rails if it was in the middle of telling a long-winded story. Makes somewhat coherent conversations possible even with very low context/n_predict sizes. Kinda crazy it's that simple. Reminds me of Stable Diffusion, where the breakthrough with token limits constraining prompt size and complexity was to just add more tokens when it ran out.

@tjohnman
Contributor Author

> Upon further testing of this, it seems to remember things immediately before running out of tokens and resetting. Though sometimes it goes a bit off the rails if it was in the middle of telling a long-winded story. Makes somewhat coherent conversations possible even with very low context/n_predict sizes. Kinda crazy it's that simple. Reminds me of Stable Diffusion, where the breakthrough with token limits constraining prompt size and complexity was to just add more tokens when it ran out.

The "reset" that happens when it runs out of tokens is not much of a reset, really. It's just keeping track of how many tokens it can generate before letting the user intervene again.

@rabidcopy
Contributor

Ah, I get it now. Running out of n_predict is fine and recoverable, but if you run out of ctx_size it still comes to a hard stop. Since increasing ctx_size past 2048 is not recommended for practical use, I take it there will still need to be some sort of rolling context cache that pushes out past history when it runs out of room, while possibly keeping the initial prompt cached? #71 (comment)
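
For illustration, a hypothetical sketch of the rolling context cache described above, assuming the context is held as a flat vector of token ids. `n_keep`, `n_ctx`, and `llama_token` here are placeholders, not the project's actual API; a real implementation would also have to shift or re-evaluate the model's KV cache for the evicted region.

```cpp
#include <cstdint>
#include <vector>

using llama_token = int32_t; // placeholder token type

// Keep the first n_keep tokens (the initial prompt), evict the oldest half
// of what follows once the context is full, and keep the recent history.
void roll_context(std::vector<llama_token> &ctx_tokens, size_t n_ctx, size_t n_keep) {
    if (ctx_tokens.size() < n_ctx) {
        return; // still room, nothing to evict
    }
    const size_t n_discard = (ctx_tokens.size() - n_keep) / 2;
    ctx_tokens.erase(ctx_tokens.begin() + n_keep,
                     ctx_tokens.begin() + n_keep + n_discard);
}
```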
