Unstructured Converter: "no module named haystack.preview" #218

Closed
lambda-science opened this issue Jan 16, 2024 · 6 comments · Fixed by #221
Labels: bug (Something isn't working)

Comments

@lambda-science
Contributor

lambda-science commented Jan 16, 2024

Describe the bug
Importing the Unstructured component in a pipeline in Haystack 2.0 leads to an error:

Process SpawnProcess-1:
2024-01-16T13:03:33.363987236Z Traceback (most recent call last):
2024-01-16T13:03:33.364332474Z   File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2024-01-16T13:03:33.364404992Z     self.run()
2024-01-16T13:03:33.364416755Z   File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
2024-01-16T13:03:33.364420612Z     self._target(*self._args, **self._kwargs)
2024-01-16T13:03:33.364423708Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
2024-01-16T13:03:33.364426834Z     target(sockets=sockets)
2024-01-16T13:03:33.364429659Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
2024-01-16T13:03:33.364432845Z     return asyncio.run(self.serve(sockets=sockets))
2024-01-16T13:03:33.364435851Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364438907Z   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
2024-01-16T13:03:33.364441592Z     return runner.run(main)
2024-01-16T13:03:33.364444377Z            ^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364447052Z   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
2024-01-16T13:03:33.364450098Z     return self._loop.run_until_complete(task)
2024-01-16T13:03:33.364453505Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364456621Z   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
2024-01-16T13:03:33.364459757Z     return future.result()
2024-01-16T13:03:33.364462712Z            ^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364465818Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
2024-01-16T13:03:33.364472090Z     config.load()
2024-01-16T13:03:33.364486247Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 467, in load
2024-01-16T13:03:33.364490174Z     self.loaded_app = import_from_string(self.app)
2024-01-16T13:03:33.364493270Z                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364496296Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 24, in import_from_string
2024-01-16T13:03:33.364500013Z     raise exc from None
2024-01-16T13:03:33.364503620Z   File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
2024-01-16T13:03:33.364506696Z     module = importlib.import_module(module_str)
2024-01-16T13:03:33.364510022Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364555559Z   File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
2024-01-16T13:03:33.364561951Z     return _bootstrap._gcd_import(name[level:], package, level)
2024-01-16T13:03:33.364564306Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16T13:03:33.364566530Z   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
2024-01-16T13:03:33.364569285Z   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
2024-01-16T13:03:33.364571620Z   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
2024-01-16T13:03:33.364577601Z   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
2024-01-16T13:03:33.364580076Z   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
2024-01-16T13:03:33.364582360Z   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2024-01-16T13:03:33.364584664Z   File "/code/rest_api/application.py", line 4, in <module>
2024-01-16T13:03:33.364586979Z     from rest_api.utils import get_app, get_pipelines
2024-01-16T13:03:33.364589544Z   File "/code/rest_api/utils.py", line 7, in <module>
2024-01-16T13:03:33.364591908Z     from rest_api.pipeline import setup_pipelines
2024-01-16T13:03:33.364595054Z   File "/code/rest_api/pipeline/__init__.py", line 9, in <module>
2024-01-16T13:03:33.364599362Z     from rest_api.pipeline import custom_pipelines
2024-01-16T13:03:33.364602288Z   File "/code/rest_api/pipeline/custom_pipelines.py", line 6, in <module>
2024-01-16T13:03:33.364605544Z     from unstructured_fileconverter_haystack import UnstructuredFileConverter
2024-01-16T13:03:33.364609592Z   File "/usr/local/lib/python3.11/site-packages/unstructured_fileconverter_haystack/__init__.py", line 4, in <module>
2024-01-16T13:03:33.364613399Z     from unstructured_fileconverter_haystack.fileconverter import UnstructuredFileConverter
2024-01-16T13:03:33.364616765Z   File "/usr/local/lib/python3.11/site-packages/unstructured_fileconverter_haystack/fileconverter.py", line 7, in <module>
2024-01-16T13:03:33.364620172Z     from haystack.preview import Document, component, default_to_dict
2024-01-16T13:03:33.364622777Z ModuleNotFoundError: No module named 'haystack.preview'

To Reproduce
My pipeline is pretty simple:

from unstructured_fileconverter_haystack import UnstructuredFileConverter
EMBEDDING_DIM = 1536
UNSTRUCTURED_API_URL = "http://unstructured:8002"
API_BASE_URL = "http://litellm:8001"
DOC_STORE_HOST = "opensearch"
UNSTRUCTURED_SETTINGS = {'skip_infer_table_types': '[]',
                         'chunking_strategy': 'by_title',
                         'combine_under_n_chars': '500',
                         'new_after_n_chars': '2000',
                         'max_characters': '2500',
                         'pdf_infer_table_structure': 'True',
                         'languages': 'eng',
                         'languages': 'fra',  # duplicate key: Python keeps only this last value, so 'eng' above is dropped
                         'strategy': 'fast'}
document_store = OpenSearchDocumentStore(host=DOC_STORE_HOST, embedding_dim=EMBEDDING_DIM, use_ssl=True,
                                         verify_certs=False, http_auth=("xxxx", "xxxx"))
unstructured_converter = UnstructuredFileConverter(api_url=UNSTRUCTURED_API_URL,
                                                   document_creation_mode="one-doc-per-element",
                                                   unstructured_kwargs=UNSTRUCTURED_SETTINGS)
document_embedder = OpenAIDocumentEmbedder(api_base_url=API_BASE_URL, model_name="openai-embed", batch_size=8)
writer = DocumentWriter(document_store=document_store)
p = Pipeline()
p.add_component("UnstructuredFileConverter", unstructured_converter)
p.add_component("Embedder", document_embedder)
p.add_component("Writter", writer)
p.connect("UnstructuredFileConverter", "Embedder.documents")
p.connect("Embedder", "Writter")

# [... In My API ...]
result = indexing_pipeline.run({"UnstructuredFileConverter": {"paths": file_paths}})

Describe your environment (please complete the following information):

  • OS: Docker on Windows running Linux containers
  • Haystack version: 2.x
  • Integration version: 0.0.4
lambda-science added the bug (Something isn't working) label Jan 16, 2024
@lambda-science
Contributor Author

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/unstructured/fileconverter/src/unstructured_fileconverter_haystack/fileconverter.py

There is no haystack.preview import in the current source linked above. Maybe I should force-pull the latest version to fix this. I will investigate.

@lambda-science
Contributor Author

lambda-science commented Jan 16, 2024

This specific commit fixed the issue, but it was committed two days after the latest release of the integration (27 Nov for v0.0.4 vs 29 Nov for the commit): 622c546#diff-5a996d17249eb1cf1ba4b4e43f7739af7a8f5a183b5b375a1cdc591b0c02664d

It might be good to release a v0.0.5 with the latest commits.
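
For anyone hitting the same error, a quick environment check along these lines may help confirm whether the installed release predates the fix. This is only a sketch: the distribution name is taken from the PyPI link later in this thread, and `installed_version` and `module_exists` are hypothetical helper names, not part of Haystack.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_exists(dotted_name):
    """Return True if the module can be located, False otherwise."""
    try:
        return find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. "haystack") cannot be imported.
        return False


# Distribution name assumed from the PyPI URL in this thread.
print("integration version:", installed_version("unstructured-fileconverter-haystack"))
# The module the 0.0.4 release tries to import, per the traceback above.
print("haystack.preview available:", module_exists("haystack.preview"))
```

If the first line reports 0.0.4 and the second reports False, the environment matches the situation described in this issue.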

@anakin87
Member

Hey, @lambda-science!

Thanks for opening this issue.

We will soon release a new version...

@lambda-science
Contributor Author

> Hey, @lambda-science!
>
> Thanks for opening this issue.
>
> We will soon release a new version...

Hey, thanks for the quick merge request :)
Now that this issue is closed, will re-installing unstructured_fileconverter_haystack fix the problem, or do we need to wait for a future proper release with all recent changes? (Maybe after moving ALL integrations into haystack_integrations.)

anakin87 reopened this Jan 17, 2024
@anakin87
Member

It was automatically closed.
Sorry. I hope to release a new version soon...

@anakin87
Member

A new version has been released now: https://pypi.org/project/unstructured-fileconverter-haystack/0.2.0/

It is not documented yet, but we changed the import:
from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter
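
Code that has to run against both the old and the new release could try the two import locations mentioned in this thread in order. The following is a minimal sketch, not an official recommendation: `import_first_available` and `CANDIDATES` are hypothetical names, and the version comments only reflect what is stated in this issue.

```python
import importlib

# Import locations mentioned in this thread, newest first.
CANDIDATES = [
    # New location announced with release 0.2.0.
    ("haystack_integrations.components.converters.unstructured", "UnstructuredFileConverter"),
    # Old location used in the original report (integration 0.0.4).
    ("unstructured_fileconverter_haystack", "UnstructuredFileConverter"),
]


def import_first_available(candidates):
    """Return the first importable attribute from (module, name) pairs, else None."""
    for module_path, attr_name in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Also covers ModuleNotFoundError, e.g. the missing haystack.preview.
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


UnstructuredFileConverter = import_first_available(CANDIDATES)
if UnstructuredFileConverter is None:
    print("UnstructuredFileConverter could not be imported; "
          "consider upgrading unstructured-fileconverter-haystack.")
```

With 0.0.4 installed next to Haystack 2.x, both candidates fail and the helper returns None, which is a clearer signal than the traceback at startup. Once every environment is on the new release, the plain import above is simpler.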
