Issue with Unstructured Document Converter (Related to Asyncio) #26

Open
karbasia opened this issue Jun 7, 2024 · 0 comments

I'm testing a pipeline that uses the Unstructured Converter component to process PDFs. The pipeline works when I run it locally, but fails when it is served through Hayhooks.

The error is as follows:

2024-06-07 14:41:27 
Converting files to Haystack Documents: 0it [00:00, ?it/s]Unstructured could not process file /data/file.pdf. Error: Traceback (most recent call last):
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/haystack_integrations/components/converters/unstructured/converter.py", line 198, in _partition_file_into_elements
2024-06-07 14:41:27     elements = partition_via_api(
2024-06-07 14:41:27                ^^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured/partition/api.py", line 70, in partition_via_api
2024-06-07 14:41:27     sdk = UnstructuredClient(api_key_auth=api_key, server_url=base_url)
2024-06-07 14:41:27           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured_client/sdk.py", line 54, in __init__
2024-06-07 14:41:27     self.sdk_configuration = SDKConfiguration(
2024-06-07 14:41:27                              ^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27   File "<string>", line 13, in __init__
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
2024-06-07 14:41:27     self._hooks = SDKHooks()
2024-06-07 14:41:27                   ^^^^^^^^^^
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
2024-06-07 14:41:27     init_hooks(self)
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
2024-06-07 14:41:27     split_pdf_hook = SplitPdfHook()
2024-06-07 14:41:27                      ^^^^^^^^^^^^^^
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
2024-06-07 14:41:27     nest_asyncio.apply()
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/nest_asyncio.py", line 19, in apply
2024-06-07 14:41:27     _patch_loop(loop)
2024-06-07 14:41:27   File "/opt/venv/lib/python3.12/site-packages/nest_asyncio.py", line 193, in _patch_loop
2024-06-07 14:41:27     raise ValueError('Can\'t patch loop of type %s' % type(loop))
2024-06-07 14:41:27 ValueError: Can't patch loop of type <class 'uvloop.Loop'>

The call that fails is nest_asyncio.apply(): it raises a ValueError because it cannot patch the uvloop.Loop event loop the server is running on.

After some research, I fixed the issue for myself by changing the CLI code to the following. I'm not an expert here, so I'd like to know whether this approach is fine.

import asyncio
import os
import sys

import click
import uvicorn


@click.command()
@click.option('--host', default="localhost")
@click.option('--port', default=1416)
@click.option('--pipelines-dir', default=os.environ.get("HAYHOOKS_PIPELINES_DIR"))
@click.option('--additional-python-path', default=os.environ.get("HAYHOOKS_ADDITIONAL_PYTHONPATH"))
def run(host, port, pipelines_dir, additional_python_path):
    if not pipelines_dir:
        pipelines_dir = "pipelines.d"
    os.environ["HAYHOOKS_PIPELINES_DIR"] = pipelines_dir

    if additional_python_path:
        sys.path.append(additional_python_path)

    # Create a standard asyncio event loop explicitly, so the server does not
    # end up on uvloop, which nest_asyncio cannot patch.
    loop = asyncio.new_event_loop()
    # uvicorn's `loop` setting expects a string, not a loop object, so it is
    # left out here; server.serve() runs on whichever loop drives it.
    config = uvicorn.Config("hayhooks.server:app", host=host, port=port)
    server = uvicorn.Server(config)
    loop.run_until_complete(server.serve())
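
For comparison, a possibly simpler variant would be to keep the uvicorn.run entry point and only pin its loop setting to "asyncio", so uvicorn does not pick uvloop automatically. This is a minimal sketch, not the Hayhooks implementation: build_uvicorn_kwargs and run_server are hypothetical helper names, and I'm assuming the app path "hayhooks.server:app" and the defaults from the CLI code above.

```python
from typing import Any, Dict


def build_uvicorn_kwargs(host: str = "localhost", port: int = 1416) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for uvicorn.run with a stdlib asyncio loop."""
    return {
        "app": "hayhooks.server:app",
        "host": host,
        "port": port,
        # The default "auto" selects uvloop when it is installed;
        # "asyncio" forces the standard library event loop instead.
        "loop": "asyncio",
    }


def run_server(host: str = "localhost", port: int = 1416) -> None:
    """Hypothetical helper: start the server on the standard asyncio loop."""
    # Imported lazily so build_uvicorn_kwargs can be inspected without uvicorn installed.
    import uvicorn

    uvicorn.run(**build_uvicorn_kwargs(host, port))
```

Calling run_server() from the click command would replace the manual loop handling. The trade-off is losing uvloop's performance for the whole server, which may or may not matter for this use case.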