feat: Non-API version of Unstructured file converter #258
Hey! If we want to support this option, I'm not sure about shipping it in the same package... 🤔
Hey @anakin87, thanks for the feedback! I agree that the full installation of Unstructured looks tricky, but I think it would still be worthwhile to support it for a smaller set of file types, or to let users optionally install the different dependencies locally. In some settings (like in dC), it's more difficult to spin up an additional (large) Docker image than to have a local install.
Okay, this sounds reasonable to me. Would I need to create a new folder called something like
@sjrl I'll let you know! Sorry for the delay...
Hey, @sjrl!
Dependencies
Please let me know if you have any doubts or need clarification!
Closing for now, may revisit in the future when I have more time.
I added a version of the Unstructured File Converter that runs without the Docker image.
I think it is useful to let users choose between a hosted and a non-hosted version of Unstructured, in case their environment does not allow them to start an additional hosted service through Docker.
The change to make this work is minimal, but it does raise some questions for me about best practices for integrations.
Questions:
- The install requirement changes from `pip install unstructured` to `pip install unstructured[pdf]`. Is it all right to require this new dependency, or would it be better to follow something like Haystack's dependency management, where the `[pdf]` part is optional, using something like LazyImport?
- Should `UnstructuredFileConverter` be renamed to `UnstructuredAPIFileConverter` to make it clearer that it uses the hosted version of Unstructured? Then the new `Local` version could be named `UnstructuredFileConverter`.

Additional Comments
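To make the first question more concrete, here is a minimal sketch of the optional-dependency pattern, written with the standard library only. It is not Haystack's `LazyImport` and not the code in this PR. The names `require_optional` and `LocalConverterSketch` are hypothetical, and the default backend module name is an assumption used purely for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on first use, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user which extra to install.
        raise ImportError(
            f"'{module_name}' is needed for this feature. Install it with: {install_hint}"
        ) from err


class LocalConverterSketch:
    """Hypothetical stand-in for a converter that runs Unstructured locally."""

    def __init__(self, backend_module: str = "unstructured.partition.pdf"):
        # Only the module name is stored; nothing is imported at construction time.
        # The default module name is an assumption for this sketch.
        self.backend_module = backend_module

    def load_backend(self) -> ModuleType:
        # The import happens here, so users without the extra only see an
        # error when they actually try to convert a file.
        return require_optional(self.backend_module, "pip install 'unstructured[pdf]'")
```

With this shape, the base package could keep `unstructured` as its only hard requirement, and the `[pdf]` extra would be requested only when a local conversion is attempted.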