
Simplify inline completion backend #553

Merged
2 commits merged into jupyterlab:autocomplete on Jan 5, 2024

Conversation

dlqqq
Member

@dlqqq dlqqq commented Dec 30, 2023

cc @krassowski

Summary of changes

  • Imports Pydantic from langchain.pydantic_v1 instead of pydantic (see the sketch after this list)
  • Simplifies the inline completion backend
    • Unifies the existing BaseInlineCompletionHandler and InlineCompletionHandler definitions
    • Removes code that broadcasts messages to all inline completion clients
  • Declares more legible names for methods on BaseInlineCompletionHandler:
    • process_message() => handle_request()
    • stream() => handle_stream_request()
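
For illustration, a minimal before/after of the import change, assuming a Pydantic v1 model similar to the ones used by the completion handlers (the class and field names here are hypothetical):

```python
# Before: imported directly from pydantic
# from pydantic import BaseModel

# After: imported from LangChain's pinned Pydantic v1 namespace, so Jupyter AI
# and LangChain always agree on which Pydantic version is in use
from langchain.pydantic_v1 import BaseModel


class ExampleCompletionModel(BaseModel):
    # hypothetical fields, for illustration only
    prefix: str
    suffix: str = ""
```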

Context

The existing implementation has two types of handlers:

  • BaseICH/DefaultICH: A singleton that is instantiated and bound to self.settings when the server extension is initialized.
  • ICH (in handlers.py): The WebSocket handler that accepts requests and calls the appropriate method on BaseICH. Instantiated by Tornado automatically per-connection.

(ICH := InlineCompletionHandler)

This pattern was inherited from how Jupyter AI handles chat. The "Tornado handlers calling singletons" pattern exists there only because chat is global: chat-related state has to be shared across all users, which is why the singleton exists. Inline completions, however, don't need to be shared across all users, so we can actually define ICH and BaseICH in the same class.

We can simplify the implementation further by removing the need to track all inline completion sessions in a dictionary. I had to remove the _modelChanged signal in the frontend to do this. The existing method of tracking whether a model changed was faulty, as it only emitted the signal when we checked whether the model had changed, not when the model was actually changed by the user in the settings UI. If we do in fact want to keep this feature, we can re-implement it correctly.

Lastly, DefaultICH.process_message() was rather strange in that it had different behavior depending on whether the request was a stream request. I've chosen to entirely split the interface, and require subclasses to define handle_request() and handle_stream_request() separately instead.
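
A rough sketch of the resulting shape, assuming a Tornado WebSocket handler; the dispatch logic and method signatures are illustrative rather than the exact code in this PR:

```python
import json

import tornado.websocket


class BaseInlineCompletionHandler(tornado.websocket.WebSocketHandler):
    """One class per connection: both the WebSocket endpoint and the completion logic."""

    async def on_message(self, message: str) -> None:
        request = json.loads(message)
        # Plain and streaming requests now go to separate, explicitly named methods.
        if request.get("stream"):
            await self.handle_stream_request(request)
        else:
            await self.handle_request(request)

    async def handle_request(self, request: dict) -> None:
        raise NotImplementedError("subclasses return a single completion reply")

    async def handle_stream_request(self, request: dict) -> None:
        raise NotImplementedError("subclasses stream completion chunks back")
```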

@dlqqq dlqqq added the enhancement New feature or request label Dec 30, 2023
@dlqqq dlqqq requested a review from krassowski December 30, 2023 01:36
@dlqqq dlqqq self-assigned this Dec 30, 2023
@dlqqq
Member Author

dlqqq commented Dec 30, 2023

There were a few other minor issues that I had noticed, but I need to take off for my holiday long weekend. I will be back on Tuesday (1/2/2024) to file an issue. Let me know if you'd like the issue filed sooner.

@krassowski
Member

However, inline completions don't need to be shared across all users

On a conceptual level this is true. I wonder whether this change has implications for local models that do not support concurrency, and whether it could mean a breakage when two users simultaneously request completions from different local models, in particular models from the same provider. I have not checked, but I see this as a real possibility if the model configuration is no longer shared between users.

@krassowski
Member

Lastly, DefaultICH.process_message() was rather strange in that it had different behavior depending on whether the request was a stream request. I've chosen to entirely split the interface, and require subclasses to define handle_request() and handle_stream_request() separately instead.

Looks good on the surface (will take another look later), thanks!

If we do in fact want to keep this feature, we can re-implement it correctly

I think we want to keep this feature, but it can be implemented in a separate PR, along with the separation of models per chat and per completer.

@dlqqq
Member Author

dlqqq commented Dec 30, 2023

@krassowski

On conceptual level this is true. I wonder if this change has implications for local models which do not support concurrency. I wonder if it could mean a breakage when two users request different local models to return simultaneously.

I believe neither the existing implementation nor this PR supports non-concurrent models. In both cases we always call loop.create_task() without checking whether the model supports concurrency.

When using a singleton handler (existing implementation), we can add support simply by replacing loop.create_task(<coro_obj>) with await <coro_obj> when using a non-concurrent model. However, since this PR removes the singleton pattern to avoid confusion, a follow-up PR would need to add some kind of lock that handlers should await.
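
For example, such a follow-up could use a module-level asyncio.Lock shared by every per-connection handler instance; this is only a sketch, and the supports_concurrency flag is a hypothetical parameter, not an existing attribute:

```python
import asyncio

# Hypothetical shared lock: one per server process, visible to all per-connection handlers.
_completion_lock = asyncio.Lock()


async def run_completion(coro, supports_concurrency: bool) -> None:
    if supports_concurrency:
        # Current behavior in both implementations: schedule the task and return immediately.
        asyncio.get_running_loop().create_task(coro)
    else:
        # Non-concurrent (e.g. local) models: serialize requests behind the shared lock.
        async with _completion_lock:
            await coro
```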

I think we want to keep this feature but it can be implemented in a separate PR, along with separation of models per chat and per completer.

I think the next step is to allow the completion LM (language model) to be freely chosen by the user and stored client-side, with the model sent per-request. We only enforce global state in the chat, since the chat is seen by all users. However, since inline completion is only seen by the requesting user, I see no reason for this state to live server-side. That would also make client-side rendering of the selected completion LM much easier, since the client would then be the one storing that state.

Authentication is the unknown part here. Perhaps for now, authentication & other fields (e.g. temperature and top_k) are read from the ConfigManager, but the LM used for completion is provided client-side (specified as a model ID)? Perhaps that's too confusing. Regardless, I'd love for us to think more deeply about this.
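
For concreteness, a hypothetical per-request payload under that design (the field names and model ID format are assumptions, not an agreed schema): the client picks and stores the model ID, while credentials still come from the server-side ConfigManager.

```python
# Hypothetical payload sent over the inline completion WebSocket for each request.
request = {
    "model_id": "openai:gpt-3.5-turbo-instruct",  # chosen and stored client-side
    "prefix": "def fib(n):\n    ",
    "suffix": "",
    "stream": True,
    # No credentials here: the server resolves authentication for the chosen
    # provider from its ConfigManager before invoking the model.
}
```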

@d5423197

d5423197 commented Jan 3, 2024

Hello authors,

I do not know if it is appropriate to ask questions about the inline code feature here.

I really like this feature and would like to know when it will be released.

Please let me know if you have a separate PR or anything else that tracks the timeline for this feature.

Thanks,

ZD

@krassowski
Member

I think the next step is to allow the completion LM (language model) to be freely chosen by the user and stored client-side, with the model sent per-request. We only enforce global state in the chat, since the chat is seen by all users. However, since inline completion is only seen by the user, I see no reason for this state to live server-side.

Here are some reasons to keep the model state server-side:

  • Initialization of local models can take anywhere from dozens of milliseconds to multiple seconds as they are loaded into memory; if we were to initialize the model anew each time, it would have a negative performance impact.
  • Users will want suggestions relevant to their codebase; this means a future in which we hook completions up with the RAG in /learn or index the files in another way. This is another piece of model-specific state that would take time to initialize.

Of course, the initialized model and its state can be cached server-side even if the choice of model is stored client-side, though invalidating the cache so that it is not emptied back and forth when two users use different models, while also not leaking memory, is not trivial.
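
One way to approach that, as a rough sketch only (the registry lookup and cache size are assumptions): keep a small LRU cache of initialized models keyed by provider and model ID, so two users on different models do not thrash a single global slot, while still bounding how many local models stay loaded in memory.

```python
from functools import lru_cache

# Hypothetical provider registry; in practice providers are discovered elsewhere.
PROVIDERS: dict = {}


@lru_cache(maxsize=4)  # keep a few recently used models warm instead of one global slot
def get_completion_model(provider_id: str, model_id: str):
    provider_cls = PROVIDERS[provider_id]
    # This is the expensive step for local models (weights loaded into memory),
    # which is why re-initializing on every request would hurt latency.
    return provider_cls(model_id=model_id)
```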

authentication & other fields (e.g. temperature and top_k)

Some fields can be modified on a per-request basis (temperature, top_k, etc.), but others have to be defined for the model to be initialized (model name, authentication, and in the future file indexing). The fields that can be modified per-request make sense to store client-side; those required for initialization should be stored server-side, and authentication in particular of course must stay server-side.
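
A hedged sketch of that split, with hypothetical names: per-request overrides travel with each completion request, while initialization-time fields live in the server-side configuration.

```python
from typing import Optional

from langchain.pydantic_v1 import BaseModel


class CompletionRequestOverrides(BaseModel):
    """Safe to accept from the client on every request."""

    temperature: Optional[float] = None
    top_k: Optional[int] = None


class CompletionModelConfig(BaseModel):
    """Needed before the model can be initialized; read server-side (e.g. from ConfigManager)."""

    model_id: str
    api_key_name: Optional[str] = None  # the secret itself never leaves the server
```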

@dlqqq
Member Author

dlqqq commented Jan 3, 2024

@krassowski Thanks for responding to my feedback. Just to confirm: have you reviewed this yet? If so, I can merge this today and rebase autocomplete on main.

My response is that all of these good suggestions you're raising are still possible just by storing the model ID on the client. IMO, this just makes everything easier: the client doesn't need to call await AiService.getCompletionsModel(), parse the value, handle exceptions, handle loading state, etc. Why would a client need to make a network request to the server to know what model it is using when it is the one choosing the model?

@krassowski
Member

krassowski commented Jan 4, 2024

Just to confirm: have you reviewed this yet? If so, I can merge this today and rebase autocomplete on main.

I had a quick glance and it all looked good, but I have reduced availability until the 12th and will not be able to test locally until then, so I would say feel free to merge and we can iterate afterwards.

@dlqqq
Member Author

dlqqq commented Jan 5, 2024

@krassowski NP, thanks for taking the time to let me know! I don't think there's any urgency for now, since this branch probably shouldn't be released until JupyterLab 4.1.0 is released.

@dlqqq dlqqq merged commit 1654759 into jupyterlab:autocomplete Jan 5, 2024
9 checks passed
dlqqq added a commit that referenced this pull request Jan 11, 2024
* do not import from pydantic directly

* refactor inline completion backend
@krassowski
Member

shouldn't be released until JupyterLab 4.1.0 is released

On the other hand, publishing a pre-release (alpha or beta) would allow users to test it with the 4.1.0 beta/rc. Let me know if there is anything I can help with.

dlqqq added a commit that referenced this pull request Jan 19, 2024
* do not import from pydantic directly

* refactor inline completion backend
dlqqq added a commit that referenced this pull request Jan 19, 2024
* Inline code completions (#465)

* Draft inline completions implementation (server side)

* Implement inline completion provider (front)

* Add default debounce delay and error handling (front)

* Add `gpt-3.5-turbo-instruct` because text- models are deprecated.

OpenAI specifically recommends using `gpt-3.5-turbo-instruct` in
favour of text-davinci, text-ada, etc. See:
https://platform.openai.com/docs/deprecations/

* Improve/fix prompt template and add simple post-processing

* Handle missing `registerInlineProvider`, handle no model in name

* Remove IPython mention to avoid confusing languages

* Disable suggestions in markdown, move language logic

* Remove unused background and clip path from jupyternaut

* Implement toggling the AI completer via statusbar item

also adds the icon for provider re-using jupyternaut icon

* Implement streaming support

* Translate ipython to python for models, remove log

* Move `BaseLLMHandler` to `/completions` rename to `LLMHandlerMixin`

* Move frontend completions code to `/completions`

* Make `IStatusBar` required for now, lint

* Simplify inline completion backend (#553)

* do not import from pydantic directly

* refactor inline completion backend

* Autocomplete frontend fixes (#583)

* remove duplicate definition of inline completion provider

* rename completion variables, plugins, token to be more accurate

* abbreviate JupyterAIInlineProvider => JaiInlineProvider

* bump @jupyterlab/completer and typescript

* WIP: fix Jupyter AI completion settings

* Fix issues with settings population

* read from settings directly instead of using a cache

* disable Jupyter AI completion by default

* improve completion plugin menu items

* revert unnecessary edits to package manifest

* Update packages/jupyter-ai/src/components/statusbar-item.tsx

Co-authored-by: Michał Krassowski <[email protected]>

* tweak wording

---------

Co-authored-by: krassowski <[email protected]>

---------

Co-authored-by: David L. Qiu <[email protected]>
dbelgrod pushed a commit to dbelgrod/jupyter-ai that referenced this pull request Jun 10, 2024
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024