add support for .gz files #48

Open
andytwigg opened this issue Jan 27, 2017 · 1 comment

Comments

@andytwigg

It would be nice to add support for opening .gz files.
Ideally we could pass a file handle, e.g.:

import gzip
import paratext

# Proposed usage, not a current API: f is the path to a .gz file
with gzip.open(f, 'rb') as fh:
    paratext.read(fh)

It seems the file is opened by the C code, so perhaps this is not practical, and it would be easier to add gzip reading support directly to the C code?
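Until something like this exists, one possible workaround is to decompress to a temporary file first and hand the resulting path to a path-based loader. The sketch below is only an illustration: `gunzip_to_tempfile` is a hypothetical helper, not part of paratext, and it assumes there is enough disk space for the uncompressed copy.

```python
import gzip
import shutil
import tempfile


def gunzip_to_tempfile(gz_path: str, suffix: str = ".csv") -> str:
    """Hypothetical helper: decompress gz_path and return the new file's path.

    The caller is responsible for deleting the returned file.
    """
    with gzip.open(gz_path, "rb") as src, tempfile.NamedTemporaryFile(
        delete=False, suffix=suffix
    ) as dst:
        # Stream in chunks so the whole file is never held in memory.
        shutil.copyfileobj(src, dst)
        return dst.name
```

The returned path can then be passed to whichever reader expects a filename. This costs one extra sequential pass and a temporary copy on disk.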

@deads
Contributor

deads commented Feb 17, 2017

Thank you for the feature request and the suggestion. The way paratext is architected, supporting a Python file handle would require random access to the file. This is not easily achievable with the Lempel-Ziv algorithm on which gzip is based, because decoding any point in the stream depends on the data that came before it. Some files use a fixed dictionary in the header, but this is not true of all files. One would need to do a first sequential pass over the file to build the dictionary at each chunk start point. Then the threads are spawned, and each one decompresses its own chunk using the dictionary reconstructed for that chunk. We would welcome this contribution if someone wants to take a crack at it!
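The two-pass idea can be sketched in Python with the standard `zlib` module, whose decompressor objects support `copy()`. This is a toy illustration and not paratext code: `build_checkpoints`, `decompress_block`, and `parallel_decompress` are hypothetical names, the whole compressed input is assumed to fit in memory as a single gzip member, and the first pass here already does a full decompression, so it shows the checkpointing mechanics rather than a speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def build_checkpoints(compressed: bytes, block_size: int):
    """Sequential first pass: snapshot the decompressor state at each block start."""
    d = zlib.decompressobj(wbits=31)  # 31 = zlib window of 15 bits + gzip framing
    checkpoints = []
    for offset in range(0, len(compressed), block_size):
        # The copy carries the sliding window ("dictionary") seen so far.
        checkpoints.append((offset, d.copy()))
        d.decompress(compressed[offset:offset + block_size])  # output discarded
    return checkpoints


def decompress_block(compressed: bytes, offset: int, state, block_size: int) -> bytes:
    """Resume from a saved state and decompress only this block's bytes."""
    return state.decompress(compressed[offset:offset + block_size])


def parallel_decompress(compressed: bytes, block_size: int = 4096, workers: int = 4) -> bytes:
    checkpoints = build_checkpoints(compressed, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(decompress_block, compressed, offset, state, block_size)
            for offset, state in checkpoints
        ]
        # Join in submission order so the output keeps the original byte order.
        return b"".join(f.result() for f in futures)
```

Each snapshot holds its own copy of the window, so memory grows with the number of blocks. A real implementation would presumably store only the window and bit position at each chunk boundary.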


2 participants