jupyternaut capability to answer questions regarding data #1053

Open
sqlreport opened this issue Oct 24, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@sqlreport

Problem

Proposed Solution

@sqlreport sqlreport added the enhancement New feature or request label Oct 24, 2024
@krassowski
Member

Independently, one user mentioned to me that they would like to be able to include variable names in their messages, and that this is one of the critical features missing in jupyter-ai to make the UX worth it. It feels like:

  • the chat's @ handler should complete variable names (these would need to be fetched from the active kernel; there is already a way to do this in the kernel messaging protocol, using the same mechanism as tab-completion)
  • information about the variable should be included in the message (jupyter-ai could reuse some of the jupyterlab-variable-inspector code here to get it).

I feel that for the jupyter-ai use case the primary usage of @ should be including variables/files, and only secondarily including other users (as that is a less common use case).
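
As a rough illustration of those two mechanisms (this is not jupyter-ai code, just a sketch assuming a locally started python3 kernel managed with jupyter_client), a complete_request and an inspect_request can be sent over the shell channel; the variable name my_data_frame is only an example:

```python
# Minimal sketch: fetch variable-name completions and variable info from a
# running kernel via the Jupyter messaging protocol (jupyter_client).
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="python3")
try:
    # Create a variable in the kernel so there is something to complete/inspect.
    kc.execute("import pandas as pd; my_data_frame = pd.DataFrame({'a': [1, 2]})")
    kc.get_shell_msg(timeout=30)  # consume the execute_reply

    # complete_request: the same mechanism tab-completion uses.
    kc.complete("my_dat", cursor_pos=len("my_dat"))
    reply = kc.get_shell_msg(timeout=30)
    print(reply["content"]["matches"])  # e.g. ['my_data_frame']

    # inspect_request: returns a mimebundle describing the variable.
    kc.inspect("my_data_frame", cursor_pos=len("my_data_frame"))
    reply = kc.get_shell_msg(timeout=30)
    print(reply["content"]["data"].get("text/plain", ""))
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```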

@krassowski
Member

krassowski commented Oct 24, 2024

For the contents of a variable, inspect_request (implemented in IPython with pinfo), which is part of the kernel messaging protocol, could be used too. Maybe we could even add an argument to tell it that the output is for machine rather than human consumption.
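
For reference, a minimal sketch of the kernel-side machinery as it exists today (no new protocol argument involved): ipykernel serves inspect_request through InteractiveShell.object_inspect_mime, the same code path behind pinfo/?, which is where a hypothetical machine-consumption flag would most naturally plug in:

```python
# Sketch only: how an inspect_request is answered on the IPython side today.
from IPython.core.interactiveshell import InteractiveShell

shell = InteractiveShell.instance()
shell.run_cell("numbers = [1, 2, 3]")

# detail_level=0 corresponds to `numbers?`, detail_level=1 to `numbers??`.
bundle = shell.object_inspect_mime("numbers", detail_level=0)
print(bundle["text/plain"])
```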

@dlqqq
Member

dlqqq commented Oct 29, 2024

@krassowski @sqlreport These are great ideas. Thanks for opening an issue about this. I'm not sure we should support pandas-ai directly in the jupyter-ai package, since I'm uncertain about its licensing. This integration could instead be provided by a separate package (e.g. jupyter-ai-pandas), which seems preferable.

A couple of questions I had while thinking about this:

  1. What do you think a context command implementing this should be called? @var:<variable-name>?

  2. What sort of variable types should we support beyond "data variables" like arrays, dictionaries, and dataframes? Should we also allow for classes & functions to be passed? If so, how do we serialize them to a string?

@krassowski
Member

What sort of variable types should we support beyond "data variables" like arrays, dictionaries, and dataframes? Should we also allow for classes & functions to be passed? If so, how do we serialize them to a string?

Regarding (2), I was brainstorming this with a few folks, and some thoughts are:

  • to stringify (or jsonify) a Python object we could look for _repr_llm_ on objects of arbitrary type and call it if it exists; we could fall back to the existing _repr_markdown_, _repr_html_ and __repr__, along with _repr_mimebundle_; this follows the existing implementation of representations in IPython https://ipython.readthedocs.io/en/stable/config/integrating.html and we could document _repr_llm_ on that page once we are more confident about its specification (for example, unlike the other _repr_* methods it feels like it should allow an argument controlling verbosity/truncation). This way it is ultimately the choice of the Python package providing the data structure to supply a good default representation (but users will of course be able to override it by monkeypatching, which is a common way of extending e.g. pandas)
  • in the Python-agnostic Jupyter world: the Contextual Help panel in Jupyter, and the tooltip with variable documentation shown when you press Shift + Tab while on a variable, both display information about the variable via the Jupyter messaging protocol; they both use inspect_request. Being able to send such a request with an argument indicating that an LLM-oriented response should be given would allow the implementation (e.g. IPython) to call _repr_llm_ instead of __repr__ (which it currently calls based on some heuristics).

So my answer would be: it is not up to us (in jupyter-ai) to decide which variable types to support; we should allow passing any known variable, and if the kernel is not able to represent it (e.g. inspect_request returns null because there was an exception in _repr_llm_) then we would just provide the LLM with that error.
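
A minimal sketch of the fallback chain described above; note that _repr_llm_ is hypothetical (it does not exist in IPython today), and the helper name and truncation argument are made up for illustration:

```python
def stringify_for_llm(obj, max_len=2000):
    """Return a text representation of obj suitable for inclusion in a prompt."""
    # Try the hypothetical _repr_llm_ first, then existing rich reprs.
    for method_name in ("_repr_llm_", "_repr_markdown_", "_repr_html_"):
        method = getattr(obj, method_name, None)
        if callable(method):
            try:
                text = method()
            except Exception as exc:
                # Mirror the proposal above: pass the error on to the LLM
                # instead of refusing to include the variable.
                return f"<error while representing object: {exc!r}>"
            if text is not None:
                return str(text)[:max_len]
    return repr(obj)[:max_len]
```

A library could then ship _repr_llm_ on its own data structures, and users could monkeypatch one onto e.g. pandas.DataFrame.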

@krassowski
Member

  1. What do you think a context command implementing this should be called? @var:<variable-name>?

The less the user has to type, the better. I think that as long as auto-complete works when I type @my_dat, I don't care whether it substitutes it to @var:my_data_frame or @obj:my_data_frame, etc.
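
Whatever prefix is chosen, the backend side stays small; a hypothetical sketch (assuming an @var:<name> syntax, which is not decided) of pulling variable references out of a submitted message:

```python
import re

# Hypothetical syntax: the frontend completer substitutes @my_dat into
# @var:my_data_frame before the message is sent.
VAR_REF = re.compile(r"@var:([A-Za-z_][A-Za-z0-9_]*)")

def extract_variable_refs(message: str) -> list[str]:
    return VAR_REF.findall(message)

print(extract_variable_refs("Plot @var:my_data_frame grouped by @var:labels"))
# ['my_data_frame', 'labels']
```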
