Handle Single Files and also enable html, pdf file formats for /learn #712

Merged
merged 12 commits into from
Apr 9, 2024

Conversation

srdas
Collaborator

@srdas srdas commented Apr 2, 2024

Fixes #641

Fixes #358

Amends PR #663, which may be closed after giving credit to apurvakhatri.

Update directory.py to add new file formats

  1. Added single file functionality
  2. Added PDF files
  3. Added HTML files
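
As an illustration of item 1 together with the new formats, here is a minimal sketch of accepting either a single file or a folder and keeping only supported file types. It is not the code in directory.py: the name `collect_files` and the `SUPPORTED_EXTS` set are hypothetical, and the set lists only the three formats mentioned in this PR.

```python
from pathlib import Path

# Hypothetical extension set, limited to the formats named in this PR.
SUPPORTED_EXTS = {".txt", ".pdf", ".html"}


def collect_files(path, supported=SUPPORTED_EXTS):
    """Return supported files under `path`, which may be a file or a folder."""
    p = Path(path)
    if p.is_file():
        # Single-file case: treat the file as a one-element collection.
        candidates = [p]
    else:
        # Folder case: walk the tree and keep regular files only.
        candidates = [f for f in p.rglob("*") if f.is_file()]
    # Compare suffixes in lower case so "PAPER.PDF" is accepted too.
    return sorted(f for f in candidates if f.suffix.lower() in supported)
```

Treating a single file as a one-element list lets the rest of a pipeline stay unchanged whether the caller passes a file or a folder.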

For a recent working paper that is not public (a 37-page PDF), the screenshot below shows that /learn works on a single file for in-context prompting. This exemplifies items 1 and 2 above.
image

For a raw HTML page listing my publications (https://srdas.github.io/research.htm), saved as a .html file, /learn delivers a good summary, shown below, exemplifying item 3 above. Note that here /learn was applied to an entire folder containing .txt, .pdf, and .html files.
image

1.  Added single file functionality
2.  Added HTML files
3.  Added PDF files
@srdas srdas added bug Something isn't working enhancement New feature or request labels Apr 2, 2024
pre-commit-ci bot and others added 3 commits April 2, 2024 20:46
Added pypdf==4.1.0, required for handling pdf files in /learn
@srdas srdas requested a review from JasonWeill April 8, 2024 20:13
srdas and others added 3 commits April 8, 2024 13:55
1.  Added single file functionality
2.  Added HTML files
3.  Added PDF files
Added pypdf==4.1.0, required for handling pdf files in /learn
@JasonWeill JasonWeill force-pushed the learn_more_file_formats branch from f045924 to 6afc81d Compare April 8, 2024 20:55
Collaborator

@JasonWeill JasonWeill left a comment

Generally looking good! Thanks for demo'ing your change for me as well. Just a couple of suggestions for improvement.

srdas and others added 5 commits April 8, 2024 15:30
Made changes for
1. matching all file extensions in lower case, to ensure no case sensitivity
2. Streamlined the PDF loader to remove a loop over pages, using join and list comprehension.
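
The two changes above can be sketched roughly as follows. This is an illustrative sketch, not the merged code: `is_pdf`, `pdf_text`, and `load_pdf` are hypothetical names, and the only pypdf calls assumed are `PdfReader(path)`, its `pages` attribute, and `page.extract_text()`.

```python
from pathlib import Path


def is_pdf(path):
    # Lowercase the suffix so "REPORT.PDF" and "report.pdf" match alike.
    return Path(path).suffix.lower() == ".pdf"


def pdf_text(reader):
    """Join the text of every page of a pypdf-style reader (any object with .pages)."""
    # A list comprehension plus join replaces an explicit loop that
    # appended each page's text to a growing string.
    return "\n".join([page.extract_text() or "" for page in reader.pages])


def load_pdf(path):
    # Imported lazily so the helpers above work without pypdf installed.
    from pypdf import PdfReader

    return pdf_text(PdfReader(path))
```

The `or ""` guard is an extra precaution for pages that yield no text; the commit message does not say whether the merged loader includes it.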
@srdas srdas merged commit c12b7fd into jupyterlab:main Apr 9, 2024
8 checks passed
@srdas srdas deleted the learn_more_file_formats branch April 9, 2024 20:49
@srdas
Collaborator Author

srdas commented Apr 10, 2024

@meeseeksdev please backport to 1.x

meeseeksmachine pushed a commit to meeseeksmachine/jupyter-ai that referenced this pull request Apr 10, 2024
srdas added a commit that referenced this pull request Apr 10, 2024
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024
…jupyterlab#712)

* Update directory.py to add new file formats

1.  Added single file functionality
2.  Added HTML files
3.  Added PDF files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dependencies

Added pypdf==4.1.0, required for handling pdf files in /learn

* Update directory.py to add new file formats

1.  Added single file functionality
2.  Added HTML files
3.  Added PDF files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dependencies

Added pypdf==4.1.0, required for handling pdf files in /learn

* Amended directory.py

Made changes for
1. matching all file extensions in lower case, to ensure no case sensitivity
2. Streamlined the PDF loader to remove a loop over pages, using join and list comprehension.

* Update directory.py

* Update directory.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2 participants