Replace JSON Lines with JSON to simplify implementation, improve processing speed, and enhance extensibility #160

Open
filip26 opened this issue Dec 13, 2024 · 0 comments

Hi,
I’d like to propose avoiding JSON Lines for the following reasons:

Added Complexity

  • Supporting JSON Lines requires additional implementation effort to handle both standard JSON and JSON Lines parsing.
  • Converting JSON Lines into standard JSON through pre-processing is inefficient, as it results in redundant parsing with no added value other than compatibility.
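To illustrate the redundant parsing, here is a minimal Python sketch of such a pre-processing step. The `versionId` field and the `jsonl_to_json_text` helper are hypothetical and only stand in for real log entries and whatever conversion an implementation would use.

```python
import json


def jsonl_to_json_text(jsonl_text: str) -> str:
    """Convert JSON Lines text into a single JSON array document."""
    # First parse: every non-empty line is decoded on its own.
    entries = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Re-serialize so a standard JSON pipeline can consume the result.
    return json.dumps(entries)


# Hypothetical two-entry log in JSON Lines form.
jsonl_text = '{"versionId": "1"}\n{"versionId": "2"}\n'

json_text = jsonl_to_json_text(jsonl_text)

# Second parse: the downstream JSON consumer decodes the same data again.
entries = json.loads(json_text)
```

Every entry is decoded twice and encoded once before any actual processing starts, which is the overhead the second bullet refers to.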

Limited Extensibility

  • JSON Lines has no place for document-level metadata, such as positions or links to subsequent chunks.
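As a rough sketch of what this could look like, a standard JSON document can wrap the entries in an envelope that carries metadata next to them. The field names `position`, `next`, and `entries`, and the file name `log-0002.json`, are assumptions made up for this example, not part of any existing format.

```python
import json

# Hypothetical chunk envelope: metadata sits beside the entries it describes.
chunk_text = """
{
  "position": 0,
  "next": "log-0002.json",
  "entries": [
    {"versionId": "1"},
    {"versionId": "2"}
  ]
}
"""

chunk = json.loads(chunk_text)

entries = chunk["entries"]    # the log entries themselves
next_link = chunk.get("next")  # link to the following chunk, if any
```

New metadata fields can be added to the envelope later without changing how individual entries are encoded.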

Inefficient Processing

  • Processing line by line in a streaming context is less efficient than handling chunks or pages.
  • JSON Lines enforces sequential, linear history processing. A standard JSON object with embedded links enables non-linear history processing.
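One way to picture non-linear processing is a lookup that follows chunk links and uses per-chunk metadata to skip whole chunks without touching their entries. This is only a sketch under assumptions: the `CHUNKS` dictionary stands in for fetching a chunk by its link, and the `position`, `next`, and `entries` fields are hypothetical.

```python
from typing import Optional

# In-memory stand-in for chunks that would normally be fetched by link.
CHUNKS = {
    "log-0001.json": {
        "position": 0,  # index of the first entry in this chunk
        "next": "log-0002.json",
        "entries": [{"versionId": "1"}, {"versionId": "2"}],
    },
    "log-0002.json": {
        "position": 2,
        "next": None,
        "entries": [{"versionId": "3"}],
    },
}


def find_entry(start: str, index: int) -> Optional[dict]:
    """Return the entry at a global index, skipping chunks that cannot hold it."""
    link: Optional[str] = start
    while link is not None:
        chunk = CHUNKS[link]
        offset = index - chunk["position"]
        if 0 <= offset < len(chunk["entries"]):
            return chunk["entries"][offset]
        # Only the chunk metadata was inspected; its entries were not processed.
        link = chunk["next"]
    return None


third = find_entry("log-0001.json", 2)
missing = find_entry("log-0001.json", 5)
```

With JSON Lines, reaching the same entry means reading every preceding line, because nothing in the format says where an entry lives.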

Using JSON improves adoption, speeds up processing, and supports extensibility.

Please consider the outcome. Thank you.

JSON Lines was likely intended to serve as a replacement for CSV.
