community: fallback on core async atransform_documents method for MarkdownifyTransformer #27866

Merged: efriis merged 14 commits into langchain-ai:master from rparkr:rparkr/async-markdownify-transform on Dec 13, 2024
Changes: +140 −9
- Add an asynchronous method for transforming a single document (`_atransform_document`); implement the asynchronous method for transforming a list of documents using `asyncio.gather`
- Remove an extra function definition; replace it with `asyncio.create_task()`
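The per-document coroutine plus `asyncio.gather` approach described in these early commits can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code: `Document` here is a stand-in dataclass rather than LangChain's class, and `html_to_markdown` is a hypothetical toy converter standing in for `markdownify`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class Document:
    """Stand-in for a document: text plus metadata (not LangChain's class)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def html_to_markdown(html: str) -> str:
    """Hypothetical toy converter standing in for markdownify; handles <b> only."""
    return html.replace("<b>", "**").replace("</b>", "**")


async def _atransform_document(document: Document) -> Document:
    # Convert one document; metadata is copied so the input is not mutated.
    return Document(
        page_content=html_to_markdown(document.page_content),
        metadata=dict(document.metadata),
    )


async def atransform_documents(documents: Sequence[Document]) -> List[Document]:
    # Schedule one task per document, then wait for all of them.
    # gather() returns results in the same order as the tasks passed in.
    tasks = [asyncio.create_task(_atransform_document(doc)) for doc in documents]
    return await asyncio.gather(*tasks)
```

Because `_atransform_document` never awaits anything, each task simply runs to completion in turn on the single event-loop thread, so this shape provides an async signature without any speedup for CPU-bound conversion.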
dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) on Nov 3, 2024
- Split the list comprehension across lines for better readability
- Fix synchronous unit tests for MarkdownifyTransformer; add asynchronous unit tests for MarkdownifyTransformer (duplicates of the synchronous tests)
- The unit tests passed in the dev container created in GitHub Codespaces from the devcontainer.json file in the LangChain repo, but they were failing in CI; I updated the failing unit tests and will continue to debug using the CI checks
- A few unit tests continued to fail in CI, although they passed with `make test` in the dev container environment; I updated the unit tests to fix the CI failures
- Fix more unit test failures related to trailing spaces
dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) and removed the size:M label on Dec 4, 2024
efriis reviewed on Dec 13, 2024
Because this isn't for performance on a CPU-bound operation, I'll probably just delete the NotImplementedError stub and fall back on the default atransform_documents behavior for API completeness!
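The fallback the reviewer describes is a base-class async method that delegates to the synchronous one. In langchain-core, the default `atransform_documents` is understood to run `transform_documents` in an executor; the sketch below only mimics that pattern with a hypothetical `BaseTransformer` class (and a toy `UppercaseTransformer` subclass) instead of importing the real library, so treat it as an illustration of the idea rather than the library's code.

```python
import asyncio
import functools
from typing import Any, List, Sequence


class BaseTransformer:
    """Hypothetical base class illustrating the sync-to-async fallback."""

    def transform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        raise NotImplementedError

    async def atransform_documents(
        self, documents: Sequence[str], **kwargs: Any
    ) -> List[str]:
        # Default async path: run the sync method in the loop's default
        # thread pool so the event loop itself is not blocked while it runs.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, functools.partial(self.transform_documents, documents, **kwargs)
        )


class UppercaseTransformer(BaseTransformer):
    """Toy subclass: only the sync method is implemented; async is inherited."""

    def transform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        return [doc.upper() for doc in documents]
```

With this design a subclass only has to implement the synchronous method; deleting a subclass's `NotImplementedError` stub for the async method lets the inherited default take over.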
efriis changed the title from "community: implement the atransform_documents method for MarkdownifyTransformer" to "community: fallback on core async atransform_documents method for MarkdownifyTransformer" on Dec 13, 2024
efriis approved these changes on Dec 13, 2024
dosubot bot added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) on Dec 13, 2024
Description

Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O-bound: it works with `Document` objects that are already in memory.

Issue

Fixes #27865

Dependencies

No new dependencies added, but `markdownify` is required since this PR updates the `markdownify` integration.

Tests and docs

Lint and test

I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make test`. Those unit tests are:

- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that the output contains trailing spaces when the tests run in the CI checks, but no trailing spaces when they run with `make test`. I made sure the tests pass in CI, but they may fail with `make test` because of those trailing spaces.
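One way to make such assertions insensitive to trailing spaces, offered as an illustration rather than what this PR did, is to strip trailing whitespace from each line before comparing. `normalize_trailing_spaces` and `assert_markdown_equal` are hypothetical helper names.

```python
def normalize_trailing_spaces(text: str) -> str:
    """Strip trailing whitespace from every line and from the end of the text."""
    return "\n".join(line.rstrip() for line in text.splitlines()).rstrip()


def assert_markdown_equal(actual: str, expected: str) -> None:
    # Compare converter output while ignoring trailing spaces that may differ
    # between environments (e.g. CI versus a local dev container).
    assert normalize_trailing_spaces(actual) == normalize_trailing_spaces(expected), (
        f"{actual!r} != {expected!r}"
    )
```

In Markdown, two trailing spaces at the end of a line encode a hard line break, so this normalization hides a real difference in output; it fits only when that distinction is not what the test is meant to check.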