Support added for a single file and directory #663

apurvakhatri · 2024-03-01T21:30:07Z

Hello Team,

Issue number of the reported bug or feature request: #641

Changes
Support added to QA a single file. The code detects whether the input is a single file or a directory and processes it accordingly.

Testing performed
The functionality has been tested on my system (MacOS).

Additional context
I have tested with a single file, a file inside a directory, and a directory alone.


dlqqq

Thank you for this PR! I left some feedback for you to address. Let me know if you have further questions. 🤗

dlqqq · 2024-03-05T22:17:58Z

packages/jupyter-ai/jupyter_ai/document_loaders/directory.py

```diff
+                subdirs[:] = [
+                    d for d in subdirs if not (d[0] == "." or d in EXCLUDE_DIRS)
+                ]
+                filenames = [f for f in filenames if not f[0] == "."]
```


There are a few issues with the implementation proposed by this branch:

We seek a list of file paths relative to the current directory. However, this branch only adds file names.

This branch updates filenames using the assignment operator = instead of .append(), so the list built in one iteration of the for loop is overwritten in the next.

filenames is also the name bound by the for statement itself. This means that even if the previous issue is fixed, every iteration of this for loop will still rebind filenames and discard the value set by the previous iteration. Consider this simplified example:

```python
>>> for i in range(5):
...     print(i)
...     i = 1
...
0
1
2
3
4
```
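The three issues above can be addressed together. The sketch below assumes the goal is a flat list of non-hidden file paths under a root directory. collect_paths is a hypothetical helper, not the Jupyter AI code, and the contents of EXCLUDE_DIRS are assumed. It handles each issue in turn:

- It joins each name onto dirpath, so the result holds paths rather than bare file names.
- It accumulates results with .append() instead of reassigning.
- It writes into a separate list, so the name bound by the for statement is never rebound.

```python
import os
from typing import List

# Hypothetical exclusion set; the real EXCLUDE_DIRS in directory.py may differ.
EXCLUDE_DIRS = {"node_modules", "__pycache__"}


def collect_paths(root: str) -> List[str]:
    """Return paths of non-hidden files under root (relative if root is relative)."""
    paths: List[str] = []  # accumulator lives outside the loop, so it survives iterations
    for dirpath, subdirs, filenames in os.walk(root):
        # Prune hidden and excluded subdirectories in place so os.walk does not descend into them.
        subdirs[:] = [d for d in subdirs if not (d.startswith(".") or d in EXCLUDE_DIRS)]
        for filename in filenames:
            if filename.startswith("."):
                continue  # skip hidden files
            # Join with dirpath to get a usable path instead of a bare file name.
            paths.append(os.path.join(dirpath, filename))
    return paths
```

Pruning only works because subdirs is mutated with slice assignment. os.walk (with the default topdown=True) reads that same list object to decide where to descend next.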

This implementation can be corrected and simplified greatly. Here are my suggestions.

The logic within the for filename in filenames: ... block on line 69 should be extracted to a separate split_file(path, splitter) function.

Revert the other changes, and simply add this block at the very top of this split() function definition:

```python
if os.path.isfile(path):
    return split_file(path, splitter)
```
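Taken together, the suggested refactor might look roughly like the sketch below. It is not the actual directory.py code. It assumes a splitter can be modeled as a plain callable from text to a list of chunks, and that each chunk is returned as a small dict. The Splitter alias, the body of split_file, and the chunk format are hypothetical; the real split() may take a different splitter type and return different objects.

```python
import os
from typing import Callable, Dict, List

# Hypothetical: model a splitter as a function from text to a list of chunks.
Splitter = Callable[[str], List[str]]


def split_file(path: str, splitter: Splitter) -> List[Dict[str, str]]:
    """Split one file; stands in for the logic extracted from the per-file loop body."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [{"path": path, "text": chunk} for chunk in splitter(text)]


def split(path: str, splitter: Splitter) -> List[Dict[str, str]]:
    # Early return suggested in the review: a single file skips the directory walk.
    if os.path.isfile(path):
        return split_file(path, splitter)
    chunks: List[Dict[str, str]] = []
    for dirpath, subdirs, filenames in os.walk(path):
        # Prune hidden directories in place; EXCLUDE_DIRS handling is omitted for brevity.
        subdirs[:] = [d for d in subdirs if not d.startswith(".")]
        for filename in filenames:
            if not filename.startswith("."):
                chunks.extend(split_file(os.path.join(dirpath, filename), splitter))
    return chunks
```

For example, split("docs/index.md", lambda t: t.split("\n\n")) would return one entry per blank-line-separated paragraph of that (hypothetical) file, while split("docs", ...) would walk the whole directory. With the early return, the directory path stays unchanged and single files reuse the same per-file logic.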

I verified all of this by adding print() statements in the definition of split() to check the value of filenames. To test, I ran jupyter lab from the root of this Git repo and called /learn docs to learn all of the Jupyter AI documentation.

Can you do the same before I review this again? Thanks in advance!

srdas · 2024-04-01

@dlqqq The code for the function split() can be simplified to the following form of the original function:

I have tested this separately and it works for a single file or a directory.

dlqqq · 2024-04-15T21:00:30Z

Superseded by #712.

apurvakhatri and others added 2 commits March 1, 2024 16:24

Modified split to support input file and directory

fee0ff9

[pre-commit.ci] auto fixes from pre-commit.com hooks

2ef5839

for more information, see https://pre-commit.ci

dlqqq requested changes Mar 5, 2024


dlqqq added the bug Something isn't working label Mar 5, 2024

srdas mentioned this pull request Apr 5, 2024

Handle Single Files and also enable html, pdf file formats for /learn #712

Merged

dlqqq closed this Apr 15, 2024
