When running /learn on a directory, capture and skip files that couldn't be learned #423

Open
JasonWeill opened this issue Nov 1, 2023 · 1 comment
Labels
enhancement New feature or request @jupyter-ai/chatui project:learn-ask /learn and /ask commands, discoverability

Comments

@JasonWeill
Collaborator

Problem

When the user chooses to learn a folder, and at least one item in the folder cannot be read, we fail without indicating which file(s) cannot be learned. This is a bad user experience.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 20, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 16 column 2 (char 16)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 39, in process_message
    await self._process_message(message)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 118, in _process_message
    await self.learn_dir(load_path, args.chunk_size, args.chunk_overlap)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 150, in learn_dir
    doc_chunks = await dask_client.compute(delayed)
  File "/opt/conda/lib/python3.10/site-packages/distributed/client.py", line 330, in _result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/directory.py", line 46, in split_document
    return splitter.split_documents([document])
  File "/opt/conda/lib/python3.10/site-packages/langchain/text_splitter.py", line 161, in split_documents
    return self.create_documents(texts, metadatas=metadatas)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 31, in create_documents
    for chunk in self.split_text(text, metadata):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 22, in split_text
    return splitter.split_text(text)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 48, in split_text
    nb = nbformat.reads(text, as_version=4)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/__init__.py", line 89, in reads
    nb = reader.reads(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 76, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 26, in parse_json
    raise NotJSONError(message) from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Here is ...

Proposed Solution

When a single file fails to be embedded, continue with the remaining files instead of aborting.

Produce output indicating which file(s) failed.

Capture errors for affected files in a log file. Make the log file available in the root directory and have Jupyternaut tell the user about it.
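
The three points above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual jupyter_ai code: `parse_file`, `learn_dir_tolerant`, `format_summary`, `demo`, and the log file name `learn_errors.log` are all hypothetical names, and `parse_file` merely stands in for the real loader and splitter by checking that `.ipynb` files contain valid JSON.

```python
import json
import tempfile
from pathlib import Path

LOG_NAME = "learn_errors.log"  # hypothetical log file name


def parse_file(path: Path) -> str:
    """Hypothetical stand-in for the real per-file loading and splitting step."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".ipynb":
        json.loads(text)  # raises json.JSONDecodeError for a malformed notebook
    return text


def learn_dir_tolerant(root: Path, log_name: str = LOG_NAME):
    """Try every file under root; collect failures instead of aborting."""
    learned, failed = [], []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == log_name:
            continue  # do not try to learn a log left by an earlier run
        try:
            parse_file(path)
            learned.append(path)
        except Exception as exc:  # one bad file must not stop the others
            failed.append((path, f"{type(exc).__name__}: {exc}"))
    if failed:
        # Write one tab-separated line per failed file into the root directory.
        lines = [f"{p.relative_to(root)}\t{msg}\n" for p, msg in failed]
        (root / log_name).write_text("".join(lines), encoding="utf-8")
    return learned, failed


def format_summary(learned, failed, log_name: str = LOG_NAME) -> str:
    """Build the message the chat UI could show to the user."""
    msg = f"Learned {len(learned)} file(s)."
    if failed:
        names = ", ".join(p.name for p, _ in failed)
        msg += f" Skipped {len(failed)} file(s): {names}. Details are in {log_name}."
    return msg


def demo():
    """Run the sketch on a throwaway directory with one broken notebook."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "good.md").write_text("# Notes\n", encoding="utf-8")
        (root / "ok.ipynb").write_text("{}", encoding="utf-8")
        (root / "bad.ipynb").write_text("\n\nHere is ...", encoding="utf-8")
        learned, failed = learn_dir_tolerant(root)
        log_text = (root / LOG_NAME).read_text(encoding="utf-8")
        return (
            [p.name for p in learned],
            [p.name for p, _ in failed],
            log_text,
            format_summary(learned, failed),
        )
```

Catching the exception per file, rather than around the whole directory, is what lets the remaining files proceed. In the real handler the same idea would have to be applied where the per-file work is dispatched, which the traceback suggests is the Dask computation in `learn_dir`.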

Additional context

See #422 for a similar issue concerning the /generate command.

@MarcSkovMadsen

MarcSkovMadsen commented Nov 6, 2023

Please consider JupyterHub deployments too. We run JupyterHub on Kubernetes, where the logs are visible only in the pods or through an external monitoring solution. Many users don't know how to access them.

So please provide progress and error messages in the UI.

Right now, for example, I've asked it to learn the docs of HoloViz Panel, but nothing happens, and I can't get any progress or error information from the UI.

Congrats on jupyter-ai; it's already amazing.
