Load PDFs give error "cannot open broken document" #1067

Closed
1 of 2 tasks
aryan-rajoria opened this issue Sep 7, 2023 · 1 comment

Comments

@aryan-rajoria
Collaborator

Search before asking

  • I have searched the EvaDB issues and found no similar bug report.

Bug

09-07-2023 13:15:53 ERROR [plan_executor:plan_executor.py:execute_plan:0182] cannot open broken document
Traceback (most recent call last):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py", line 33, in exec
    for batch in child_executor.exec(**kwargs):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py", line 40, in exec
    for batch in child_executor.exec(**kwargs):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py", line 36, in read
    for batch in reader.read():
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/abstract_reader.py", line 54, in read
    for data in self._read():
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/pdf_reader.py", line 35, in _read
    doc = fitz.open(self.file_url)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/fitz/fitz.py", line 4041, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fitz.fitz.FileDataError: cannot open broken document
FileDataError                             Traceback (most recent call last)
~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py in ?(self, do_not_raise_exceptions, do_not_print_exceptions)
    181                 if do_not_print_exceptions is False:
    182                     logger.exception(str(e))
--> 183                 raise ExecutorError(e)

~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py in ?(self, *args, **kwargs)
     31     def exec(self, *args, **kwargs) -> Iterator[Batch]:
     32         child_executor = self.children[0]
---> 33         for batch in child_executor.exec(**kwargs):
     34             batch = apply_project(batch, self.target_list, self.catalog())

~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py in ?(self, *args, **kwargs)
     38     def exec(self, *args, **kwargs) -> Iterator[Batch]:
     39         child_executor = self.children[0]
---> 40         for batch in child_executor.exec(**kwargs):
     41             # apply alias to the batch

~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py in ?(self, table)
     34                 # setting batch_mem_size = 1, we need fix it
     35                 reader = PDFReader(str(image_file), batch_mem_size=1)
---> 36                 for batch in reader.read():
     37                     batch.frames[table.columns[0].name] = row_id
...
    181 if do_not_print_exceptions is False:
    182     logger.exception(str(e))
--> 183 raise ExecutorError(e)

ExecutorError: cannot open broken document

Environment

  • EvaDB 0.3.3
  • Ubuntu
  • python 3.11

Document: the EvaDB documentation PDF, version 0.2.4, downloaded from https://readthedocs.org/projects/evadb/downloads/
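
Since the error is raised by `fitz.open`, one thing worth ruling out is that the downloaded file is not a complete PDF (for example, an HTML page or a truncated download saved with a `.pdf` extension). A minimal sketch of such a check, using only the standard library; the helper name `looks_like_pdf` and the 1024-byte windows are assumptions for illustration, not part of EvaDB or PyMuPDF:

```python
from pathlib import Path


def looks_like_pdf(path, window=1024):
    """Cheap sanity check, not a full validation.

    A well-formed PDF normally has the '%PDF-' marker near the start
    and an '%%EOF' marker near the end of the file.
    """
    data = Path(path).read_bytes()
    has_header = b"%PDF-" in data[:window]
    has_trailer = b"%%EOF" in data[-window:]
    return has_header and has_trailer
```

If this returns `False` for the downloaded file, the problem is likely with the file itself rather than with EvaDB's PDF reader. A `True` result does not prove the file is valid, only that it is not obviously the wrong kind of file.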

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
xzdandy added the Bug 🐞 (EVA is not working as expected) and Crash 💥 (EVA is crashing) labels Sep 22, 2023
@xzdandy
Collaborator

xzdandy commented Sep 22, 2023

Hi @aryan-rajoria, could we verify that the problem still occurs on EvaDB v0.3.6? Thanks!
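
To confirm which versions are actually installed in the environment before re-testing, something like the following could be run. This is a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and `PyMuPDF` is included because it is the distribution that provides the `fitz` module seen in the traceback:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages involved in the traceback.
    for name in ("evadb", "PyMuPDF"):
        print(name, installed_version(name))
```

Including this output in the report makes it easier to tell whether the failure depends on the EvaDB release or on the PyMuPDF version it pulls in.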

xzdandy added this to the v0.3.7 milestone Sep 22, 2023
xzdandy removed this from the v0.3.7 milestone Sep 30, 2023
kslohith self-assigned this Oct 27, 2023
kslohith pushed a commit to kslohith/evadb that referenced this issue Nov 7, 2023
kslohith pushed a commit to kslohith/evadb that referenced this issue Nov 7, 2023
jarulraj pushed a commit that referenced this issue Nov 7, 2023
…d pdf functionality. (#1343)

Issue #1067, about not being able to load PDF files, was verified to be
working with the EvaDB documentation PDF, and a new page on loading PDFs
was added to the documentation.
<img width="1310" alt="Screenshot 2023-11-07 at 1 33 01 AM"
src="https://github.com/georgia-tech-db/evadb/assets/32676813/af2fa40b-c8c1-4f3d-b93f-98d0bf278a5b">

Co-authored-by: Lohith K S <[email protected]>
xzdandy added this to the v0.3.9 milestone Nov 19, 2023