community: Add @mozilla/readability document transformer #27604
Thanks for this @CNSeniorious000.
It's unclear to me that the demand for this is high enough to justify the additional maintenance burden.
Would you be interested in publishing an OSS integration package (e.g., langchain-readability or similar)? We've written a walkthrough on this process here:
https://python.langchain.com/docs/contributing/how_to/integrations/
We are encouraging contributors of LangChain integrations to go this route. This way we don't have to be in the loop for reviews, you're able to properly integration test the package, and you have control over versioning.
Docs would continue to be maintained in the langchain repo.
Let me know what you think!
Description
langchain-js already has a useful document transformer that uses @mozilla/readability to heuristically extract the main content of a web page. [docs] [source]
This PR introduces a new ReadabilityTransformer class in langchain_community/document_transformers, which leverages the python-readability library to do the same thing.
Dependencies:
python-readability: a standalone Python wrapper for @mozilla/readability
No Node.js environment is needed. In regular CPython distributions, python-readability relies on PythonMonkey to interpret JavaScript, and in Pyodide it uses the native JavaScript environment. So this package remains available even if the user deploys LangChain apps on Cloudflare Workers.
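For readers unfamiliar with the document-transformer pattern, here is a rough sketch of the shape such a class could take. It is not the code from this PR: `Document` below is a minimal stand-in for LangChain's class, `ReadabilityTransformerSketch` and `naive_extract` are hypothetical names, and the regex-based extractor is only a placeholder for whatever call python-readability actually exposes.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (assumption, not the real class)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class ReadabilityTransformerSketch:
    """Hypothetical transformer: applies an HTML-to-main-content function to each document."""

    def __init__(self, extract: Callable[[str], str]) -> None:
        # `extract` is injected so the sketch does not assume any real library API.
        self._extract = extract

    def transform_documents(self, documents: Iterable[Document]) -> List[Document]:
        # Replace the raw HTML with the extracted content; copy metadata unchanged.
        return [
            Document(
                page_content=self._extract(doc.page_content),
                metadata=dict(doc.metadata),
            )
            for doc in documents
        ]


def naive_extract(html: str) -> str:
    """Placeholder extractor: keeps the text inside <article>, else the whole page."""
    match = re.search(r"<article>(.*?)</article>", html, flags=re.S)
    body = match.group(1) if match else html
    # Strip any remaining tags and surrounding whitespace.
    return re.sub(r"<[^>]+>", "", body).strip()


docs = [Document("<nav>menu</nav><article><p>Hello</p></article>", {"source": "x"})]
cleaned = ReadabilityTransformerSketch(naive_extract).transform_documents(docs)
print(cleaned[0].page_content)
```

In a real integration, the injected function would presumably wrap python-readability, and the class would subclass LangChain's document-transformer base class instead of standing alone.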