jupyternaut capability to answer questions regarding data #1053

Open
sqlreport opened this issue Oct 24, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@sqlreport

Problem

Proposed Solution

@sqlreport sqlreport added the enhancement New feature or request label Oct 24, 2024
@krassowski
Member

Independently, one user mentioned to me that they would like to be able to include variable names in their messages, and that this is one of the critical features missing in jupyter-ai to make the UX worth it. It feels like:

  • the chat's @ handler should complete variable names (these would need to be fetched from the active kernel; there is already a way to do this in the kernel messaging protocol, using the same mechanism as tab-completion)
  • information about the variable should be included in the message (jupyter-ai could reuse some of the jupyterlab-variable-inspector code here to get it).

I feel that for the jupyter-ai use case the primary usage of @ should be including variables/files, and only secondarily including other users (as that is a less common use case).
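
As a rough illustration of those two mechanisms (this is not jupyter-ai code, just a sketch assuming a locally started python3 kernel managed with jupyter_client), a complete_request and an inspect_request can be sent over the shell channel; the variable name my_data_frame is only an example:

```python
# Minimal sketch: fetch variable-name completions and variable info from a
# running kernel via the Jupyter messaging protocol (jupyter_client).
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")
try:
    # Create a variable in the kernel so there is something to complete/inspect.
    kc.execute("import pandas as pd; my_data_frame = pd.DataFrame({'a': [1, 2]})")
    kc.get_shell_msg(timeout=30)  # consume the execute_reply

    # complete_request: the same mechanism tab-completion uses.
    kc.complete("my_dat", cursor_pos=len("my_dat"))
    reply = kc.get_shell_msg(timeout=30)
    print(reply["content"]["matches"])  # e.g. ['my_data_frame']

    # inspect_request: returns a mimebundle describing the variable.
    kc.inspect("my_data_frame", cursor_pos=len("my_data_frame"))
    reply = kc.get_shell_msg(timeout=30)
    print(reply["content"]["data"].get("text/plain", ""))
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```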

@krassowski
Member

krassowski commented Oct 24, 2024

For the contents of a variable, inspect_request (implemented in IPython with pinfo), which is part of the kernel messaging protocol, could be used too. Maybe we could even add an argument to tell it that the output is for machine rather than human consumption.
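
For reference, a minimal sketch of the kernel-side machinery as it exists today (no new protocol argument involved): ipykernel serves inspect_request through InteractiveShell.object_inspect_mime, the same code path behind pinfo/?, which is where a hypothetical machine-consumption flag would most naturally plug in:

```python
# Sketch only: how an inspect_request is answered on the IPython side today.
from IPython.core.interactiveshell import InteractiveShell

shell = InteractiveShell.instance()
shell.run_cell("numbers = [1, 2, 3]")

# detail_level=0 corresponds to `numbers?`, detail_level=1 to `numbers??`.
bundle = shell.object_inspect_mime("numbers", detail_level=0)
print(bundle["text/plain"])
```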

@dlqqq
Member

dlqqq commented Oct 29, 2024

@krassowski @sqlreport These are great ideas. Thanks for opening an issue about this. I'm not sure we should support pandas-ai directly in the jupyter-ai package, since I'm uncertain about its licensing. This integration could instead be provided by a separate package (e.g. jupyter-ai-pandas), which seems preferable.

A couple of questions I had while thinking about this:

  1. What do you think a context command implementing this should be called? @var:<variable-name>?

  2. What sort of variable types should we support beyond "data variables" like arrays, dictionaries, and dataframes? Should we also allow for classes & functions to be passed? If so, how do we serialize them to a string?

@krassowski
Member

What sort of variable types should we support beyond "data variables" like arrays, dictionaries, and dataframes? Should we also allow for classes & functions to be passed? If so, how do we serialize them to a string?

Regarding (2), I was brainstorming this with a few folks, and some thoughts are:

  • to stringify (or jsonify) a Python object we could look for _repr_llm_ on objects of arbitrary type and call it if it exists; we could fall back to the existing _repr_markdown_, _repr_html_ and __repr__, along with _repr_mimebundle_; this follows the existing implementation of representations in IPython https://ipython.readthedocs.io/en/stable/config/integrating.html and we could document _repr_llm_ on that page once we are more confident about its specification (for example, unlike the other _repr_* methods it feels like it should allow an argument controlling verbosity/truncation). This way it is ultimately the choice of the Python package providing the data structure to supply a good default representation (but users will of course be able to override it by monkeypatching, which is a common way of extending e.g. pandas)
  • in the Python-agnostic Jupyter world: the Contextual Help panel in Jupyter, and the tooltip with variable documentation shown when you press Shift + Tab while on a variable, both display information about the variable via the Jupyter messaging protocol; they both use inspect_request. Being able to send such a request with an argument indicating that an LLM-oriented response should be given would allow the implementation (e.g. IPython) to call _repr_llm_ instead of __repr__ (which it currently calls based on some heuristics).

So my answer would be: it is not up to us (in jupyter-ai) to decide which variable types to support; we should allow passing any known variable, and if the kernel is not able to represent it (e.g. inspect_request returns null because there was an exception in _repr_llm_) then we would just provide the LLM with that error.
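
A minimal sketch of the fallback chain described above; note that _repr_llm_ is hypothetical (it does not exist in IPython today), and the helper name and truncation argument are made up for illustration:

```python
def stringify_for_llm(obj, max_len=2000):
    """Return a text representation of obj suitable for inclusion in a prompt."""
    # Try the hypothetical _repr_llm_ first, then existing rich reprs.
    for method_name in ("_repr_llm_", "_repr_markdown_", "_repr_html_"):
        method = getattr(obj, method_name, None)
        if callable(method):
            try:
                text = method()
            except Exception as exc:
                # Mirror the proposal above: pass the error on to the LLM
                # instead of refusing to include the variable.
                return f"<error while representing object: {exc!r}>"
            if text is not None:
                return str(text)[:max_len]
    return repr(obj)[:max_len]
```

A library could then ship _repr_llm_ on its own data structures, and users could monkeypatch one onto e.g. pandas.DataFrame.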

@krassowski
Member

  1. What do you think a context command implementing this should be called? @var:<variable-name>?

The less the user has to type, the better. I think that as long as auto-complete works when I type @my_dat, I don't care whether it substitutes it to @var:my_data_frame or @obj:my_data_frame, etc.
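
Whatever prefix is chosen, the backend side stays small; a hypothetical sketch (assuming an @var:<name> syntax, which is not decided) of pulling variable references out of a submitted message:

```python
import re

# Hypothetical syntax: the frontend completer substitutes @my_dat into
# @var:my_data_frame before the message is sent.
VAR_REF = re.compile(r"@var:([A-Za-z_][A-Za-z0-9_]*)")

def extract_variable_refs(message: str) -> list[str]:
    return VAR_REF.findall(message)

print(extract_variable_refs("Plot @var:my_data_frame grouped by @var:labels"))
# ['my_data_frame', 'labels']
```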
