Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Community]: Image Extraction Fixed for PDFPlumberParser #28491

Merged

Conversation

keenborder786
Copy link
Contributor

Copy link

vercel bot commented Dec 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Dec 18, 2024 4:40pm

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. community Related to langchain-community 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Dec 3, 2024
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Dec 3, 2024
@keenborder786
Copy link
Contributor Author

@ccurme

2 similar comments
@keenborder786
Copy link
Contributor Author

@ccurme

@keenborder786
Copy link
Contributor Author

@ccurme

Copy link
Collaborator

@baskaryan baskaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like @c-xlenz had some feedback worth addressing: #28480 (comment)

@@ -41,6 +41,7 @@ dataclasses-json = ">= 0.5.7, < 0.7"
pydantic-settings = "^2.4.0"
langsmith = "^0.1.125"
httpx-sse = "^0.4.0"
pillow = "^11.0.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't be adding any new required dependencies — is there any other way to fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baskaryan The way I see it no.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baskaryan let me know what I can do then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this fix, I need pillow otherwise it won't work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keenborder786, optional dependencies can be OK, but not required ones

Also please get in the habit of including tests with PRs. We want to avoid a situation where a bug fix, is introducing new bug fixes (e.g., #28480 (comment))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyurtsev okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyurtsev @baskaryan I have removed pillow as a required dependency. Please check now.

@@ -427,6 +427,13 @@ def __init__(
text_kwargs: Keyword arguments to pass to ``pdfplumber.Page.extract_text()``
dedupe: Avoiding the error of duplicate characters if `dedupe=True`.
"""
try:
import PIL # noqa:F401
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK (will not raise new errors for users) because Pillow is already a dependency of pdfplumber.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Dec 18, 2024
@ccurme ccurme merged commit d49df48 into langchain-ai:master Dec 18, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature community Related to langchain-community lgtm PR looks good. Use to confirm that a PR is ready for merging. size:S This PR changes 10-29 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants