document extractor with unstructured io for pptx does not function as expected #10956

Open
5 tasks done
fdb02983rhy opened this issue Nov 21, 2024 · 1 comment
Labels
🐞 bug Something isn't working

Comments

@fdb02983rhy
Contributor

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.11.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Use the document extractor to process a .pptx file with Unstructured.io.

✔️ Expected Behavior

The file is processed successfully.

❌ Actual Behavior

api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 115, in handle_request
api-1         |     for item in respiter:
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/werkzeug/wsgi.py", line 256, in __next__
api-1         |     return self._next()
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
api-1         |     for item in iterable:
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/flask/helpers.py", line 113, in generator
api-1         |     yield from gen
api-1         |   File "/app/api/libs/helper.py", line 186, in generate
api-1         |     yield from response
api-1         |   File "/app/api/core/app/features/rate_limiting/rate_limit.py", line 115, in __next__
api-1         |     return next(self.generator)
api-1         |   File "/app/api/core/app/apps/base_app_generate_response_converter.py", line 25, in _generate_full_response
api-1         |     for chunk in cls.convert_stream_full_response(response):
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_response_converter.py", line 67, in convert_stream_full_response
api-1         |     for chunk in stream_response:
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 187, in _to_stream_response
api-1         |     for stream_response in generator:
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 218, in _wrapper_process_stream_response
api-1         |     for response in self._process_stream_response(tts_publisher=tts_publisher, trace_manager=trace_manager):
api-1         |   File "/app/api/core/app/apps/advanced_chat/generate_task_pipeline.py", line 319, in _process_stream_response
api-1         |     workflow_node_execution = self._handle_workflow_node_execution_failed(event)
api-1         |   File "/app/api/core/app/task_pipeline/workflow_cycle_manage.py", line 339, in _handle_workflow_node_execution_failed
api-1         |     WorkflowNodeExecution.process_data: json.dumps(event.process_data) if event.process_data else None,
api-1         |   File "/usr/local/lib/python3.10/json/__init__.py", line 231, in dumps
api-1         |     return _default_encoder.encode(obj)
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 199, in encode
api-1         |     chunks = self.iterencode(o, _one_shot=True)
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 257, in iterencode
api-1         |     return _iterencode(o, 0)
api-1         |   File "/app/api/.venv/lib/python3.10/site-packages/frozendict/__init__.py", line 32, in default
api-1         |     return BaseJsonEncoder.default(
api-1         |   File "/usr/local/lib/python3.10/json/encoder.py", line 179, in default
api-1         |     raise TypeError(f'Object of type {o.__class__.__name__} '
api-1         | TypeError: Object of type File is not JSON serializable

dosubot bot commented Nov 21, 2024

The error message "TypeError: Object of type File is not JSON serializable" indicates that there is an attempt to serialize a File object into JSON, which is not directly possible. This typically happens when a File object is included in a data structure that is being converted to JSON, such as when using json.dumps().
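A minimal sketch of how this error arises, independent of Dify. The `File` class and the `process_data` contents below are hypothetical stand-ins chosen for illustration; they are not Dify's actual model or payload.

```python
import json
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for a file object; not Dify's real File class."""
    filename: str
    extension: str


# Assumed shape: a dict that happens to carry a File instance.
process_data = {"documents": [File(filename="slides.pptx", extension=".pptx")]}

try:
    json.dumps(process_data)
    error_message = None
except TypeError as exc:
    # The default encoder has no rule for arbitrary classes, so it raises.
    error_message = str(exc)

print(error_message)
```

The standard encoder only handles dicts, lists, strings, numbers, booleans, and None, so any custom object nested anywhere in the structure triggers the same failure.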

To resolve this issue, you need to ensure that any File objects are converted to a JSON-serializable format before attempting to serialize them. This could involve extracting relevant information from the File object, such as its path, name, or content, and including only that information in the data structure being serialized.

In the context of the Dify platform, you might need to check the part of the code where process_data is being prepared for serialization and ensure that it does not include non-serializable objects like File. Instead, convert these objects to a format that can be serialized, such as a string or dictionary containing the necessary details.
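One way to do that conversion, sketched with the standard library's `default` hook for `json.dumps`. The `File` dataclass, its fields, and the `to_jsonable` helper are assumptions made for this example; the real Dify `File` class and the proper fix in `workflow_cycle_manage.py` may look different.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class File:
    """Hypothetical stand-in for a file object; not Dify's real File class."""
    filename: str
    extension: str
    size: int


def to_jsonable(obj):
    """Fallback passed to json.dumps; called only for objects it cannot encode."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # Convert dataclass instances to plain dicts of their fields.
        return asdict(obj)
    # Preserve the standard behaviour for anything still unsupported.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Assumed shape of the data being serialized.
process_data = {"documents": [File("slides.pptx", ".pptx", 1024)]}

encoded = json.dumps(process_data, default=to_jsonable)
print(encoded)
```

The hook leaves already-serializable values untouched and only intervenes for objects the encoder rejects, so it can be added at the `json.dumps` call site without changing how the data is built.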
