forked from jupyterlab/jupyter-ai
Commit
Refactor split function with tests (jupyterlab#811)
* Refactor split function with test

  The split function was (1) selecting files in included directories in the top half of the function, and (2) selecting files with valid extensions and sharding them in the second half. This PR moves the selection step into a new `collect_files` function that selects files with valid extensions from non-excluded directories. The `split` function calls `collect_files` and then shards the valid filepaths it returns.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Test changed to use pytest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Refactor split function with test

  The split function was (1) selecting files in included directories in the top half of the function, and (2) selecting files with valid extensions and sharding them in the second half. This PR moves the selection step into a new `collect_files` function that selects files with valid extensions from non-excluded directories. The `split` function calls `collect_files` and then shards the valid filepaths it returns.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Test changed to use pytest

* refactored tests for directory.py using pytest fixtures

  Replaced unittest-based tests with tests that use pytest fixtures.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* remove old test files

  replaced unittests with pytests

* Update test_directory.py

* Update docstrings and further improve code for retrieve filepaths and split

  Further improvements to the code, as suggested in the review of the PR

* update docstring in test file

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Update directory.py

  Changed function-level constant from all caps to lower case to line up with the convention in https://peps.python.org/pep-0008/#constants.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
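The two-stage structure described in the commit message can be sketched as follows. This is a minimal illustration, not the code in `directory.py`: the names `collect_filepaths_sketch` and `split_sketch`, the `SUPPORTED_EXTS` and `EXCLUDED_DIRS` values, and the fixed-size character chunking are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical values for illustration; the project defines its own lists.
SUPPORTED_EXTS = {".html", ".txt", ".py", ".md"}
EXCLUDED_DIRS = {"node_modules", "__pycache__"}


def collect_filepaths_sketch(path: Path, all_files: bool) -> List[Path]:
    """Stage one: return files under `path` that have a supported extension.

    Assumption: unless `all_files` is true, hidden files, hidden
    directories, and directories named in EXCLUDED_DIRS are skipped.
    """
    path = Path(path)
    if path.is_file():
        candidates = [path]
    else:
        candidates = []
        for root, subdirs, filenames in os.walk(path):
            if not all_files:
                # Prune in place so os.walk does not descend into these.
                subdirs[:] = [
                    d for d in subdirs
                    if not d.startswith(".") and d not in EXCLUDED_DIRS
                ]
                filenames = [f for f in filenames if not f.startswith(".")]
            candidates.extend(Path(root) / name for name in filenames)
    # The extension filter applies whether or not `all_files` is set.
    return sorted(fp for fp in candidates if fp.suffix.lower() in SUPPORTED_EXTS)


def split_sketch(path: Path, all_files: bool, chunk_size: int = 1000) -> List[str]:
    """Stage two: read each collected file and cut it into fixed-size chunks."""
    chunks: List[str] = []
    for fp in collect_filepaths_sketch(path, all_files):
        text = fp.read_text(encoding="utf-8", errors="ignore")
        chunks.extend(text[i:i + chunk_size] for i in range(0, len(text), chunk_size))
    return chunks
```

Keeping the selection step in its own function means it can be tested against a small directory tree without running any of the sharding logic.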
1 parent ffffb15, commit f138b0d
Showing 11 changed files with 110 additions and 9 deletions.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Hidden temp text file. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head><meta charset="utf-8" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
<title>Notebook</title> | ||
</head> | ||
<body> | ||
<div>This is the notebook content</div> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
This is a temp test text file. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
import os | ||
|
||
print("Hello World") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Column1, Column2 | ||
Test1, test2 |
Empty file.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
import os | ||
import shutil | ||
from pathlib import Path | ||
from typing import Tuple | ||
|
||
import pytest | ||
from jupyter_ai.document_loaders.directory import collect_filepaths | ||
|
||
|
||
@pytest.fixture | ||
def staging_dir(static_test_files_dir, jp_ai_staging_dir) -> Path: | ||
    file1_path = static_test_files_dir / ".hidden_file.pdf" | ||
    file2_path = static_test_files_dir / ".hidden_file.txt" | ||
    file3_path = static_test_files_dir / "file0.html" | ||
    file4_path = static_test_files_dir / "file1.txt" | ||
    file5_path = static_test_files_dir / "file2.py" | ||
    file6_path = static_test_files_dir / "file3.csv" | ||
    file7_path = static_test_files_dir / "file3.xyz" | ||
    file8_path = static_test_files_dir / "file4.pdf" | ||
|
||
    job_staging_dir = jp_ai_staging_dir / "TestDir" | ||
    job_staging_dir.mkdir() | ||
    job_staging_subdir = job_staging_dir / "subdir" | ||
    job_staging_subdir.mkdir() | ||
    job_staging_hiddendir = job_staging_dir / ".hidden_dir" | ||
    job_staging_hiddendir.mkdir() | ||
|
||
    shutil.copy2(file1_path, job_staging_dir) | ||
    shutil.copy2(file2_path, job_staging_subdir) | ||
    shutil.copy2(file3_path, job_staging_dir) | ||
    shutil.copy2(file4_path, job_staging_subdir) | ||
    shutil.copy2(file5_path, job_staging_subdir) | ||
    shutil.copy2(file6_path, job_staging_hiddendir) | ||
    shutil.copy2(file7_path, job_staging_subdir) | ||
    shutil.copy2(file8_path, job_staging_hiddendir) | ||
|
||
    return job_staging_dir | ||
|
||
|
||
def test_collect_filepaths(staging_dir): | ||
    """ | ||
    Test that the number of valid files for `/learn` is correct. | ||
    i.e., the `collect_filepaths` function only selects files that are | ||
    1. Not in the excluded directories and | ||
    2. Are in the valid file extensions list. | ||
    """ | ||
    all_files = False | ||
    staging_dir_filepath = staging_dir | ||
    # Call the function we want to test | ||
    result = collect_filepaths(staging_dir_filepath, all_files) | ||
|
||
    assert len(result) == 3  # Test number of valid files | ||
|
||
    filenames = [fp.name for fp in result] | ||
    assert "file0.html" in filenames  # Check that valid file is included | ||
    assert "file3.xyz" not in filenames  # Check that invalid file is excluded |
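To see where the expected count of 3 comes from, the eight files staged by the `staging_dir` fixture can be classified by hand. The sketch below does that without touching the filesystem. It assumes that hidden files and hidden directories are skipped when `all_files` is false and that `.html`, `.txt`, and `.py` are accepted extensions; `ASSUMED_VALID_EXTS` and `is_selected` are hypothetical names, not part of `jupyter_ai`.

```python
from pathlib import PurePosixPath

# Assumed for illustration only: extensions the loader accepts.
ASSUMED_VALID_EXTS = {".html", ".txt", ".py"}

# Paths relative to TestDir, as laid out by the staging_dir fixture.
STAGED = [
    ".hidden_file.pdf",
    "file0.html",
    "subdir/.hidden_file.txt",
    "subdir/file1.txt",
    "subdir/file2.py",
    "subdir/file3.xyz",
    ".hidden_dir/file3.csv",
    ".hidden_dir/file4.pdf",
]


def is_selected(rel_path: str) -> bool:
    """True if no path component is hidden and the extension is accepted."""
    p = PurePosixPath(rel_path)
    hidden = any(part.startswith(".") for part in p.parts)
    return not hidden and p.suffix.lower() in ASSUMED_VALID_EXTS


selected = [p for p in STAGED if is_selected(p)]
print(selected)
```

Under these assumptions, three files are dropped for being hidden or inside `.hidden_dir`... more precisely, two are hidden files and two sit inside `.hidden_dir`, while `file3.xyz` is dropped for its extension, leaving `file0.html`, `file1.txt`, and `file2.py`.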