Implement LLM response streaming #859
Conversation
@dlqqq
Kudos on adding streaming so quickly 🚀 , the UX is much better with these changes. Added some minor suggestions, but looks good otherwise.
Noticed that /ask is still using the non-streaming messages; do you plan to handle streaming for /ask in a separate PR?
This looks and works great! Little to add beyond @3coins's comments earlier. Also, could you please file issues to cover the proposed additional enhancements that are out of scope for this PR?
improve chat history handling
- ensures users never miss streamed chunks when joining
- also removes temporary print/log statements introduced in prev commit
Co-authored-by: Piyush Jain <[email protected]>
@3coins @JasonWeill Thank you both for the thoughtful review. I've addressed your comments in the latest revision.
Yes. There were some issues on the …
@dlqqq
Opened a new issue to track …
How to interrupt the generation?
* minimal implementation of chat streaming
* improve chat history handling
  - ensures users never miss streamed chunks when joining
  - also removes temporary print/log statements introduced in prev commit
* add jupyter_ai_test package for developer testing
* pre-commit
* improve readability of for loop finding stream msg
  Co-authored-by: Piyush Jain <[email protected]>
* remove _version.py
* remove unused ConversationBufferWindowMemory
* update jupyter_ai_test README
* add _version.py files to top-level .gitignore
* pre-commit

---------

Co-authored-by: Piyush Jain <[email protected]>
Description
This is a large PR that implements LLM response streaming in Jupyter AI. LangChain LLM classes that implement the `_stream()` or `_astream()` methods can render the response incrementally in chunks, which allows users to view the response being built token-by-token.

Also fixes #858 by adding a version ceiling on `faiss-cpu`.
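For context on the mechanism: LangChain's public `stream()`/`astream()` methods dispatch to a provider's `_stream()`/`_astream()` implementation and yield message chunks as tokens arrive. The sketch below is illustrative only and is not the handler code in this PR; `ChatOpenAI` (which needs an API key), `relay_stream`, and `send_chunk` are assumptions chosen purely for the example.

```python
# Minimal sketch (not Jupyter AI's actual handler): consume a LangChain model's
# async stream and forward each chunk as it arrives, while accumulating the
# full response. `send_chunk` is a hypothetical stand-in for whatever pushes
# partial messages to connected chat clients.
import asyncio

from langchain_openai import ChatOpenAI  # any model implementing _astream() works


async def relay_stream(prompt: str, send_chunk) -> str:
    """Accumulate the streamed response while forwarding each chunk."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    full_response = ""
    # .astream() dispatches to the provider's _astream() and yields
    # AIMessageChunk objects as tokens arrive from the model.
    async for chunk in llm.astream(prompt):
        full_response += chunk.content
        await send_chunk(chunk.content)
    return full_response


async def main() -> None:
    async def print_chunk(text: str) -> None:
        # Stand-in for pushing a partial message over the chat websocket.
        print(text, end="", flush=True)

    await relay_stream("Explain response streaming in one sentence.", print_chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

Models that do not implement `_stream()`/`_astream()` still work with this pattern: LangChain falls back to returning the full response as a single chunk, which is why non-streaming providers degrade gracefully.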
Demo of single-user scenario
Screen.Recording.2024-06-25.at.3.58.25.PM.mov
Demo of multi-user scenario, showing stream completeness & consistency
Screen.Recording.2024-06-25.at.4.02.43.PM.mov
Extended description (for developers & reviewers)
"Generating response..."
pending message is still shown to the user while waiting to receive the first chunk from the LLM.ConnectionMessage
object streamed to the client by the server extension on connection now includes aChatHistory
object, which contains theChatHistory
.GetHistory
REST API endpoint to retrieve the history separately; the entire chat history can now be obtained by listening to the Chat websocket connection.jupyter_ai_test
package that includes aTestProvider
andTestStreamingProvider
.jupyter_ai_test
can be installed in your dev environment simply by runningjlpm dev-uninstall && jlpm dev-install
..jupyter_releaser.toml
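To make the idea of a streaming test provider concrete, here is a minimal sketch of a fake LangChain LLM that yields canned chunks with a small delay. This is an illustration under stated assumptions, not the actual `TestProvider`/`TestStreamingProvider` shipped in `jupyter_ai_test`; the class name `FakeStreamingLLM` and its fields are invented for the example.

```python
# Hedged sketch only: the real TestStreamingProvider may differ. This shows the
# general shape of a LangChain LLM that fakes streaming by yielding canned
# chunks with a delay, useful for exercising the chat UI without a real model.
import time
from typing import Any, Iterator, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk


class FakeStreamingLLM(LLM):
    """Yields a fixed response word-by-word to simulate token streaming."""

    response: str = "This is a canned streaming response for testing."
    delay: float = 0.05  # seconds between chunks

    @property
    def _llm_type(self) -> str:
        return "fake-streaming"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Non-streaming fallback: return the whole response at once.
        return self.response

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        # Emit one word at a time; the public .stream()/.astream() methods
        # pick this up and yield chunks to the caller.
        for word in self.response.split():
            time.sleep(self.delay)
            yield GenerationChunk(text=word + " ")


if __name__ == "__main__":
    llm = FakeStreamingLLM()
    for chunk in llm.stream("ignored prompt"):
        print(chunk, end="", flush=True)
```

A provider built around a fake model like this makes it easy to watch chunks render in the chat UI locally, without network calls or API keys.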
Follow-up work