Implement LLM response streaming #859
Conversation
@dlqqq
Kudos on adding streaming so quickly 🚀 , the UX is much better with these changes. Added some minor suggestions, but looks good otherwise.
Noticed that /ask is still using the non-streaming messages; do you plan to handle streaming for /ask in a separate PR?
This looks and works great! Little to add beyond @3coins's comments earlier. Also, could you please file issues to cover the proposed additional enhancements that are out of scope for this PR?
improve chat history handling
- ensures users never miss streamed chunks when joining
- also removes temporary print/log statements introduced in prev commit
Co-authored-by: Piyush Jain <[email protected]>
@3coins @JasonWeill Thank you both for the thoughtful review. I've addressed your comments in the latest revision.
Yes. There were some issues on the …
@dlqqq
Opened a new issue to track …
How to interrupt the generation?
* minimal implementation of chat streaming
* improve chat history handling
  - ensures users never miss streamed chunks when joining
  - also removes temporary print/log statements introduced in prev commit
* add jupyter_ai_test package for developer testing
* pre-commit
* improve readability of for loop finding stream msg
  Co-authored-by: Piyush Jain <[email protected]>
* remove _version.py
* remove unused ConversationBufferWindowMemory
* update jupyter_ai_test README
* add _version.py files to top-level .gitignore
* pre-commit

---------

Co-authored-by: Piyush Jain <[email protected]>
Description
This is a large PR that implements LLM response streaming in Jupyter AI. LangChain LLM classes that implement the `_stream()` or `_astream()` methods can render the response incrementally in chunks, which allows users to view the response being built token-by-token.

Also fixes #858 by adding a version ceiling on `faiss-cpu`.
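For context on the mechanism: LangChain's public `stream()`/`astream()` methods dispatch to a provider's `_stream()`/`_astream()` implementation and yield message chunks as tokens arrive. The sketch below is illustrative only and is not the handler code in this PR; `ChatOpenAI` (which needs an API key), `relay_stream`, and `send_chunk` are assumptions chosen purely for the example.

```python
# Minimal sketch (not Jupyter AI's actual handler): consume a LangChain model's
# async stream and forward each chunk as it arrives, while accumulating the
# full response. `send_chunk` is a hypothetical stand-in for whatever pushes
# partial messages to connected chat clients.
import asyncio

from langchain_openai import ChatOpenAI  # any model implementing _astream() works


async def relay_stream(prompt: str, send_chunk) -> str:
    """Accumulate the streamed response while forwarding each chunk."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    full_response = ""
    # .astream() dispatches to the provider's _astream() and yields
    # AIMessageChunk objects as tokens arrive from the model.
    async for chunk in llm.astream(prompt):
        full_response += chunk.content
        await send_chunk(chunk.content)
    return full_response


async def main() -> None:
    async def print_chunk(text: str) -> None:
        # Stand-in for pushing a partial message over the chat websocket.
        print(text, end="", flush=True)

    await relay_stream("Explain response streaming in one sentence.", print_chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

Models that do not implement `_stream()`/`_astream()` still work with this pattern: LangChain falls back to returning the full response as a single chunk, which is why non-streaming providers degrade gracefully.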
Demo of single-user scenario
Screen.Recording.2024-06-25.at.3.58.25.PM.mov
Demo of multi-user scenario, showing stream completeness & consistency
Screen.Recording.2024-06-25.at.4.02.43.PM.mov
Extended description (for developers & reviewers)
"Generating response..."
pending message is still shown to the user while waiting to receive the first chunk from the LLM.ConnectionMessage
object streamed to the client by the server extension on connection now includes aChatHistory
object, which contains theChatHistory
.GetHistory
REST API endpoint to retrieve the history separately; the entire chat history can now be obtained by listening to the Chat websocket connection.jupyter_ai_test
package that includes aTestProvider
andTestStreamingProvider
.jupyter_ai_test
can be installed in your dev environment simply by runningjlpm dev-uninstall && jlpm dev-install
..jupyter_releaser.toml
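To make the idea of a streaming test provider concrete, here is a minimal sketch of a fake LangChain LLM that yields canned chunks with a small delay. This is an illustration under stated assumptions, not the actual `TestProvider`/`TestStreamingProvider` shipped in `jupyter_ai_test`; the class name `FakeStreamingLLM` and its fields are invented for the example.

```python
# Hedged sketch only: the real TestStreamingProvider may differ. This shows the
# general shape of a LangChain LLM that fakes streaming by yielding canned
# chunks with a delay, useful for exercising the chat UI without a real model.
import time
from typing import Any, Iterator, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk


class FakeStreamingLLM(LLM):
    """Yields a fixed response word-by-word to simulate token streaming."""

    response: str = "This is a canned streaming response for testing."
    delay: float = 0.05  # seconds between chunks

    @property
    def _llm_type(self) -> str:
        return "fake-streaming"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Non-streaming fallback: return the whole response at once.
        return self.response

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        # Emit one word at a time; the public .stream()/.astream() methods
        # pick this up and yield chunks to the caller.
        for word in self.response.split():
            time.sleep(self.delay)
            yield GenerationChunk(text=word + " ")


if __name__ == "__main__":
    llm = FakeStreamingLLM()
    for chunk in llm.stream("ignored prompt"):
        print(chunk, end="", flush=True)
```

A provider built around a fake model like this makes it easy to watch chunks render in the chat UI locally, without network calls or API keys.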
Follow-up work