Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle single files, pdfs, errors from missing loader dependencies in /learn #733

Merged

Conversation

srdas
Copy link
Collaborator

@srdas srdas commented Apr 16, 2024

Improves on PR #712

This PR makes the following enhancements:

  1. Enables handling single files, not just directories.
  2. Learns PDFs with Langchain's PyPDFLoader. In PR Handle Single Files and also enable html, pdf file formats for /learn #712, the pypdf package was directly used instead of PyPDFLoader, which also depends on pypdf.
  3. In the earlier PR, pypdf was added as a required dependency, this has been changed to make it an optional dependency in [all].
  4. If pypdf is not installed, the full missing package error with traceback is displayed, which is poor UX. Modified learn.py to return a clean error with the missing package name, w/o traceback. The error handling for a missing dependency is generic and does not depend on specific packages, and is displayed when the file type that is being handled needs additional packages.

Here is an example of the error when pypdf is not installed.
image

Here is an example of handling a single PDF file after pypdf is installed.
image

srdas and others added 2 commits April 16, 2024 02:07
(1) Enables handling single files, not just directories.
(2) Learns PDFs with langchain's PyPDFLoader.
(3) Gives a clean error w/o traceback when the file type that is being handled needs addtional packages.
@srdas srdas added the enhancement New feature or request label Apr 16, 2024
srdas added 2 commits April 16, 2024 12:59
Removed the extra attribute and additional response comments based on feedback from Piyush Jain and Andrii Ieroshenko
Made the error message more generic as there are many different failure types.
@3coins
Copy link
Collaborator

3coins commented Apr 16, 2024

@srdas
Make sure your pre-commit setup is complete, it seems like there is a linting error. This will ensure that linting runs before you push your changes.

pip install pre-commit
pre-commit install

@3coins 3coins marked this pull request as ready for review April 16, 2024 21:49
@3coins 3coins merged commit e27293c into jupyterlab:main Apr 16, 2024
8 checks passed
@srdas
Copy link
Collaborator Author

srdas commented Apr 16, 2024

Thanks @3coins for your support.

@dlqqq
Copy link
Member

dlqqq commented Apr 25, 2024

@meeseeksdev please backport to 1.x

Copy link

lumberbot-app bot commented Apr 25, 2024

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 1.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 e27293cf86fa37b2ea20de6b481ad989e209933a
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #733: Handle single files, pdfs, errors from missing loader dependencies in `/learn`'
  1. Push to a named branch:
git push YOURFORK 1.x:auto-backport-of-pr-733-on-1.x
  1. Create a PR against branch 1.x, I would have named this PR:

"Backport PR #733 on branch 1.x (Handle single files, pdfs, errors from missing loader dependencies in /learn)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

dlqqq pushed a commit to dlqqq/jupyter-ai that referenced this pull request Apr 25, 2024
dlqqq added a commit that referenced this pull request Apr 25, 2024
…er dependencies in `/learn` (#744)

Co-authored-by: Sanjiv Das <[email protected]>
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024
… `/learn` (jupyterlab#733)

* Handle single files, pdfs, errors

(1) Enables handling single files, not just directories.
(2) Learns PDFs with langchain's PyPDFLoader.
(3) Gives a clean error w/o traceback when the file type that is being handled needs addtional packages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* error handling for missing packages in learn.py

Removed the extra attribute and additional response comments based on feedback from Piyush Jain and Andrii Ieroshenko

* Amend error message for failure in learn.py

Made the error message more generic as there are many different failure types.

* Fixed build error.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piyush Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants