
Simplify inline completion backend #553

Merged
2 commits merged into jupyterlab:autocomplete on Jan 5, 2024

Conversation

dlqqq
Member

@dlqqq dlqqq commented Dec 30, 2023

cc @krassowski

Summary of changes

  • Imports Pydantic from langchain.pydantic_v1 instead of pydantic (see the sketch after this list)
  • Simplifies the inline completion backend
    • Unifies the existing BaseInlineCompletionHandler and InlineCompletionHandler definitions
    • Removes code that broadcasts messages to all inline completion clients
  • Declares more legible names for methods on BaseInlineCompletionHandler:
    • process_message() => handle_request()
    • stream() => handle_stream_request()
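
For illustration, a minimal before/after of the import change, assuming a Pydantic v1 model similar to the ones used by the completion handlers (the class and field names here are hypothetical):

```python
# Before: imported directly from pydantic
# from pydantic import BaseModel

# After: imported from LangChain's pinned Pydantic v1 namespace, so Jupyter AI
# and LangChain always agree on which Pydantic version is in use
from langchain.pydantic_v1 import BaseModel


class ExampleCompletionModel(BaseModel):
    # hypothetical fields, for illustration only
    prefix: str
    suffix: str = ""
```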

Context

The existing implementation has two types of handlers:

  • BaseICH/DefaultICH: A singleton that is instantiated and bound to self.settings when the server extension is initialized.
  • ICH (in handlers.py): The WebSocket handler that accepts requests and calls the appropriate method on BaseICH. Instantiated by Tornado automatically per-connection.

(ICH := InlineCompletionHandler)

This pattern was inherited from how Jupyter AI handles chat. The "Tornado handlers calling singletons" pattern exists there only because chat is global: chat-related state has to be shared across all users, which is why the singleton exists. Inline completions, however, don't need to be shared across all users, so we can actually define ICH and BaseICH in the same class.

We can simplify the implementation further by removing the need to track all inline completion sessions in a dictionary. I had to remove the _modelChanged signal in the frontend to do this. The existing method of tracking whether a model changed was faulty, as it only emitted the signal when we checked whether the model had changed, not when the model was actually changed by the user in the settings UI. If we do in fact want to keep this feature, we can re-implement it correctly.

Lastly, DefaultICH.process_message() was rather strange in that it had different behavior depending on whether the request was a stream request. I've chosen to entirely split the interface, and require subclasses to define handle_request() and handle_stream_request() separately instead.
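
A rough sketch of the resulting shape, assuming a Tornado WebSocket handler; the dispatch logic and method signatures are illustrative rather than the exact code in this PR:

```python
import json

import tornado.websocket


class BaseInlineCompletionHandler(tornado.websocket.WebSocketHandler):
    """One class per connection: both the WebSocket endpoint and the completion logic."""

    async def on_message(self, message: str) -> None:
        request = json.loads(message)
        # Plain and streaming requests now go to separate, explicitly named methods.
        if request.get("stream"):
            await self.handle_stream_request(request)
        else:
            await self.handle_request(request)

    async def handle_request(self, request: dict) -> None:
        raise NotImplementedError("subclasses return a single completion reply")

    async def handle_stream_request(self, request: dict) -> None:
        raise NotImplementedError("subclasses stream completion chunks back")
```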

@dlqqq dlqqq added the enhancement New feature or request label Dec 30, 2023
@dlqqq dlqqq requested a review from krassowski December 30, 2023 01:36
@dlqqq dlqqq self-assigned this Dec 30, 2023
@dlqqq
Member Author

dlqqq commented Dec 30, 2023

There were a few other minor issues that I had noticed, but I need to take off for my holiday long weekend. I will be back on Tuesday (1/2/2024) to file an issue. Let me know if you'd like the issue filed sooner.

@krassowski
Member

However, inline completions don't need to be shared across all users

On a conceptual level this is true. I wonder whether this change has implications for local models that do not support concurrency, and whether it could mean a breakage when two users simultaneously request completions from different local models, in particular models from the same provider. I have not checked, but I see this as a real possibility if the model configuration is no longer shared between users.

@krassowski
Member

Lastly, DefaultICH.process_message() was rather strange in that it had different behavior depending on whether the request was a stream request. I've chosen to entirely split the interface, and require subclasses to define handle_request() and handle_stream_request() separately instead.

Looks good on the surface (will take another look later), thanks!

If we do in fact want to keep this feature, we can re-implement it correctly

I think we want to keep this feature, but it can be implemented in a separate PR, along with the separation of models per chat and per completer.

@dlqqq
Member Author

dlqqq commented Dec 30, 2023

@krassowski

On conceptual level this is true. I wonder if this change has implications for local models which do not support concurrency. I wonder if it could mean a breakage when two users request different local models to return simultaneously.

I believe neither the existing implementation nor this PR supports non-concurrent models. In both cases we always call loop.create_task() without checking whether the model supports concurrency.

When using a singleton handler (existing implementation), we can add support simply by replacing loop.create_task(<coro_obj>) with await <coro_obj> when using a non-concurrent model. However, since this PR removes the singleton pattern to avoid confusion, a follow-up PR would need to add some kind of lock that handlers should await.
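
For example, such a follow-up could use a module-level asyncio.Lock shared by every per-connection handler instance; this is only a sketch, and the supports_concurrency flag is a hypothetical parameter, not an existing attribute:

```python
import asyncio

# Hypothetical shared lock: one per server process, visible to all per-connection handlers.
_completion_lock = asyncio.Lock()


async def run_completion(coro, supports_concurrency: bool) -> None:
    if supports_concurrency:
        # Current behavior in both implementations: schedule the task and return immediately.
        asyncio.get_running_loop().create_task(coro)
    else:
        # Non-concurrent (e.g. local) models: serialize requests behind the shared lock.
        async with _completion_lock:
            await coro
```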

I think we want to keep this feature but it can be implemented in a separate PR, along with separation of models per chat and per completer.

I think the next step is to allow the completion LM (language model) to be freely chosen by the user and stored client-side, with the model sent per-request. We only enforce global state in the chat, since the chat is seen by all users. However, since inline completion is only seen by the requesting user, I see no reason for this state to live server-side. That would also make client-side rendering of the selected completion LM much easier, since the client would then be the one storing that state.

Authentication is the unknown part here. Perhaps for now, authentication & other fields (e.g. temperature and top_k) are read from the ConfigManager, but the LM used for completion is provided client-side (specified as a model ID)? Perhaps that's too confusing. Regardless, I'd love for us to think more deeply about this.
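
For concreteness, a hypothetical per-request payload under that design (the field names and model ID format are assumptions, not an agreed schema): the client picks and stores the model ID, while credentials still come from the server-side ConfigManager.

```python
# Hypothetical payload sent over the inline completion WebSocket for each request.
request = {
    "model_id": "openai:gpt-3.5-turbo-instruct",  # chosen and stored client-side
    "prefix": "def fib(n):\n    ",
    "suffix": "",
    "stream": True,
    # No credentials here: the server resolves authentication for the chosen
    # provider from its ConfigManager before invoking the model.
}
```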

@d5423197

d5423197 commented Jan 3, 2024

Hello authors,

I do not know if it is appropriate to ask questions about the inline code feature here.

I really like this feature and would like to know when it will be released.

Please let me know if you have a separate PR or anything else that tracks the timeline for this feature.

Thanks,

ZD

@krassowski
Member

I think the next step is to allow the completion LM (language model) to be freely chosen by the user and stored client-side, with the model sent per-request. We only enforce global state in the chat, since the chat is seen by all users. However, since inline completion is only seen by the user, I see no reason for this state to live server-side.

Here are some reasons to keep the model state server-side:

  • Initialization of local models can take anywhere from dozens of milliseconds to multiple seconds as they are loaded into memory; if we were to initialize the model anew each time, it would have a negative performance impact.
  • Users will want suggestions relevant to their codebase; this means a future in which we hook completions up with the RAG in /learn or index the files in another way. This is another piece of model-specific state that would take time to initialize.

Of course, the initialized model and its state can be cached server-side even if the choice of model is stored client-side, though invalidating the cache so that it is not emptied back and forth when two users use different models, while also not leaking memory, is not trivial.
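
One way to approach that, as a rough sketch only (the registry lookup and cache size are assumptions): keep a small LRU cache of initialized models keyed by provider and model ID, so two users on different models do not thrash a single global slot, while still bounding how many local models stay loaded in memory.

```python
from functools import lru_cache

# Hypothetical provider registry; in practice providers are discovered elsewhere.
PROVIDERS: dict = {}


@lru_cache(maxsize=4)  # keep a few recently used models warm instead of one global slot
def get_completion_model(provider_id: str, model_id: str):
    provider_cls = PROVIDERS[provider_id]
    # This is the expensive step for local models (weights loaded into memory),
    # which is why re-initializing on every request would hurt latency.
    return provider_cls(model_id=model_id)
```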

authentication & other fields (e.g. temperature and top_k)

Some fields can be modified on a per-request basis (temperature, top_k, etc.), but others have to be defined for the model to be initialized (model name, authentication, and in the future file indexing). The fields that can be modified per-request make sense to store client-side; those required for initialization should be stored server-side, and authentication in particular of course must stay server-side.
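
A hedged sketch of that split, with hypothetical names: per-request overrides travel with each completion request, while initialization-time fields live in the server-side configuration.

```python
from typing import Optional

from langchain.pydantic_v1 import BaseModel


class CompletionRequestOverrides(BaseModel):
    """Safe to accept from the client on every request."""

    temperature: Optional[float] = None
    top_k: Optional[int] = None


class CompletionModelConfig(BaseModel):
    """Needed before the model can be initialized; read server-side (e.g. from ConfigManager)."""

    model_id: str
    api_key_name: Optional[str] = None  # the secret itself never leaves the server
```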

@dlqqq
Member Author

dlqqq commented Jan 3, 2024

@krassowski Thanks for responding to my feedback. Just to confirm: have you reviewed this yet? If so, I can merge this today and rebase autocomplete on main.

My response is that all of these good suggestions you're raising are still possible just by storing the model ID on the client. IMO, this just makes everything easier: the client doesn't need to call await AiService.getCompletionsModel(), parse the value, handle exceptions, handle loading state, etc. Why would a client need to make a network request to the server to know what model it is using when it is the one choosing the model?

@krassowski
Member

krassowski commented Jan 4, 2024

Just to confirm: have you reviewed this yet? If so, I can merge this today and rebase autocomplete on main.

I had a quick glance and it all looked good, but I have reduced availability until the 12th and will not be able to test locally until then, so I would say feel free to merge and we can iterate afterwards.

@dlqqq
Member Author

dlqqq commented Jan 5, 2024

@krassowski NP, thanks for taking the time to let me know! I don't think there's any urgency for now, since this branch probably shouldn't be released until JupyterLab 4.1.0 is released.

@dlqqq dlqqq merged commit 1654759 into jupyterlab:autocomplete Jan 5, 2024
9 checks passed
dlqqq added a commit that referenced this pull request Jan 11, 2024
* do not import from pydantic directly

* refactor inline completion backend
@krassowski
Member

shouldn't be released until JupyterLab 4.1.0 is released

On the other hand, publishing a pre-release (alpha or beta) would allow users to test it with the 4.1.0 beta/rc. Let me know if there is anything I can help with.

dlqqq added a commit that referenced this pull request Jan 19, 2024
* do not import from pydantic directly

* refactor inline completion backend
dlqqq added a commit that referenced this pull request Jan 19, 2024
* Inline code completions (#465)

* Draft inline completions implementation (server side)

* Implement inline completion provider (front)

* Add default debounce delay and error handling (front)

* Add `gpt-3.5-turbo-instruct` because text- models are deprecated.

OpenAI specifically recommends using `gpt-3.5-turbo-instruct` in
favour of text-davinci, text-ada, etc. See:
https://platform.openai.com/docs/deprecations/

* Improve/fix prompt template and add simple post-processing

* Handle missing `registerInlineProvider`, handle no model in name

* Remove IPython mention to avoid confusing languages

* Disable suggestions in markdown, move language logic

* Remove unused background and clip path from jupyternaut

* Implement toggling the AI completer via statusbar item

also adds the icon for provider re-using jupyternaut icon

* Implement streaming support

* Translate ipython to python for models, remove log

* Move `BaseLLMHandler` to `/completions` rename to `LLMHandlerMixin`

* Move frontend completions code to `/completions`

* Make `IStatusBar` required for now, lint

* Simplify inline completion backend (#553)

* do not import from pydantic directly

* refactor inline completion backend

* Autocomplete frontend fixes (#583)

* remove duplicate definition of inline completion provider

* rename completion variables, plugins, token to be more accurate

* abbreviate JupyterAIInlineProvider => JaiInlineProvider

* bump @jupyterlab/completer and typescript

* WIP: fix Jupyter AI completion settings

* Fix issues with settings population

* read from settings directly instead of using a cache

* disable Jupyter AI completion by default

* improve completion plugin menu items

* revert unnecessary edits to package manifest

* Update packages/jupyter-ai/src/components/statusbar-item.tsx

Co-authored-by: Michał Krassowski <[email protected]>

* tweak wording

---------

Co-authored-by: krassowski <[email protected]>

---------

Co-authored-by: David L. Qiu <[email protected]>
dbelgrod pushed a commit to dbelgrod/jupyter-ai that referenced this pull request Jun 10, 2024
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024