/learn error with non-UTF-8 files #377

Closed
scottdonnelly opened this issue Sep 3, 2023 · 10 comments
Labels: bug (Something isn't working), @jupyter-ai/chatui

Comments

@scottdonnelly

scottdonnelly commented Sep 3, 2023

Hi,

When I run /learn docs/ I get the message:

Sorry, that path doesn't exist: C:\Users\sdonn\docs/

That is not the working directory I'm using in Jupyter, so /learn appears to be looking in the wrong directory. Do I need to specify the working directory somehow? I tried this a few ways, but to no avail. Thanks.

Edit: AI magics and chat are working perfectly fine.

scottdonnelly added the bug label on Sep 3, 2023
@welcome

welcome bot commented Sep 3, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@JasonWeill
Collaborator

@scottdonnelly Thanks for your contribution! Jupyter AI's /learn command is relative to the directory in which you started JupyterLab. Is that C:\Users\sdonn for you?
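
The path in the error message is consistent with the argument being joined onto the directory JupyterLab was started in. A minimal sketch of that resolution follows; the helper name `resolve_learn_path` is hypothetical and is not Jupyter AI's actual code, and `ntpath` is used only so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path semantics, importable on every platform


def resolve_learn_path(root_dir: str, arg: str) -> str:
    """Join a /learn argument onto the directory the server was started in.

    Hypothetical helper for illustration; not taken from Jupyter AI.
    """
    return ntpath.join(root_dir, arg)


# The relative argument lands under the start directory, not the notebook's folder.
print(resolve_learn_path(r"C:\Users\sdonn", "docs/"))
```

Under this assumption, starting JupyterLab from the directory that contains the documents (or passing a path relative to the start directory) would make the argument resolve as intended.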

@scottdonnelly
Author

Hi, thanks! I tried starting one level above in the target directory with:

/learn target_directory/

And I got this error?

Traceback (most recent call last):
  File "C:\Users\sdonn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\jupyter_ai\chat_handlers\base.py", line 38, in process_message
    await self._process_message(message)
  File "C:\Users\sdonn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\jupyter_ai\chat_handlers\learn.py", line 119, in _process_message
    await self.learn_dir(load_path, args.chunk_size, args.chunk_overlap)
  File "C:\Users\sdonn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\jupyter_ai\chat_handlers\learn.py", line 151, in learn_dir
    doc_chunks = await dask_client.compute(delayed)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sdonn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\distributed\client.py", line 339, in _result
    raise exc.with_traceback(tb)
  File "C:\Users\sdonn\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\jupyter_ai\document_loaders\directory.py", line 14, in path_to_doc
    text = f.read()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 236: character maps to <undefined>

@JasonWeill
Collaborator

Interesting: we seem to be assuming that all files that a user learns with /learn are entirely in UTF-8, even though cp1252.py is for code page 1252.
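
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: `f.read()` ends up in `cp1252.py`, which is what Python does when a file is opened without an explicit `encoding` on a system whose locale default is code page 1252. Byte 0x90 has no mapping in that code page, so decoding fails. The sketch below reproduces the same kind of error from bytes alone; the sample character is made up for illustration and is not from the reporter's files.

```python
# Hypothetical sample: the Cyrillic letter U+0410 is the bytes D0 90 in UTF-8.
raw = "\u0410".encode("utf-8")

try:
    # Roughly what a default-encoding read does on a cp1252 system.
    raw.decode("cp1252")
    error_text = ""
except UnicodeDecodeError as exc:
    error_text = str(exc)

print(error_text)

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode("utf-8")
print(text == "\u0410")
```

If this reading is right, passing `encoding="utf-8"` to `open()` in the loader would make the read independent of the Windows locale.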

JasonWeill changed the title from "/learn in the wrong directory?" to "/learn error with non-UTF-8 files" on Sep 5, 2023
@scottdonnelly
Author

Do you know of any workarounds to get /learn to work here?

@JasonWeill
Collaborator

Until this is fixed, you'll need to limit /learn to files that are entirely in UTF-8. Thanks again for opening this issue!
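
To apply that workaround, it helps to know which files in a directory decode as UTF-8. The following is a minimal sketch using only the standard library; the function name `split_by_utf8` is hypothetical. It only sorts files by UTF-8 validity. Whether that is sufficient depends on which encoding /learn actually opens files with, which this thread does not settle.

```python
from pathlib import Path


def split_by_utf8(directory: str) -> tuple[list[Path], list[Path]]:
    """Return (utf8_files, other_files) for every regular file under directory."""
    utf8_files: list[Path] = []
    other_files: list[Path] = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Strict decoding raises on any byte sequence that is not valid UTF-8.
            path.read_bytes().decode("utf-8")
            utf8_files.append(path)
        except UnicodeDecodeError:
            other_files.append(path)
    return utf8_files, other_files
```

The files in the second list could then be moved out of the directory, or converted, before running /learn on it.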

JasonWeill self-assigned this on Oct 26, 2023
@JasonWeill
Collaborator

I tried this using Jupyter AI 2.4.0 (tip of main) on macOS, and I didn't see this error.

@JasonWeill
Collaborator

I also don't see this error on Linux, so it may be limited to Windows. (Still working on testing it there.)

@3coins mentioned that it might be worth switching to Langchain document loaders, which have a method to detect file encodings.
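
As a rough illustration of the idea, and not LangChain's implementation, a loader can try several candidate encodings in order and keep the first one that decodes cleanly. The helper name `read_text_with_fallback` and the candidate list are assumptions chosen for this sketch. Trial decoding is cruder than statistical detection: `latin-1` accepts any byte sequence, so it acts as a last resort and may produce wrong characters.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")
```

Putting `utf-8` first matters: valid UTF-8 is rarely produced by accident, so a successful strict UTF-8 decode is a strong signal, whereas single-byte code pages accept almost anything.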

@JasonWeill
Collaborator

After updating my Windows machine to the newest versions of JupyterLab and Jupyter AI, I also can't reproduce this problem with /learn and the files you described (all files under https://github.com/python/cpython/blob/main/Lib/encodings/). @scottdonnelly Can you please update JupyterLab and Jupyter AI to the newest versions and try again?

@JasonWeill
Collaborator

Closing as not reproducible.
