Proposal: uncompressed input size #141

Open
natefoo opened this issue Nov 7, 2024 · 0 comments
natefoo commented Nov 7, 2024

Currently input_size is the size of the raw input, which can be either compressed or uncompressed. When scaling memory based on input size you probably only care about the uncompressed size. gzip does store the uncompressed size, which we could read into a separate uncompressed_input_size variable. It is stored in the last 4 bytes of the file (little-endian, modulo 2^32, so it wraps for inputs of 4 GiB or more). This seems to work for me:

#!/usr/bin/env python3
import os
import sys

path = sys.argv[1]

with open(path, 'rb') as f:
    # The gzip trailer ends with ISIZE: the uncompressed size, 4 bytes, little-endian
    f.seek(-4, os.SEEK_END)
    size = int.from_bytes(f.read(4), 'little')
    print(size)

The uncompressed size also isn't always set properly:

nate@pdp-11% gzip -l /home/nate/work/galaxy/test-data/1.bam
         compressed        uncompressed  ratio uncompressed_name
               3592                   0   0.0% /home/nate/work/galaxy/test-data/1.bam
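
One plausible cause, which is my assumption rather than something verified against 1.bam: BAM files are BGZF, i.e. a series of concatenated gzip members, typically ending with an empty end-of-file block. The trailing 4 bytes only describe the last member, so they read as 0. A minimal sketch that reproduces the effect with the standard library (the helper name `trailer_isize` is made up for illustration):

```python
import gzip


def trailer_isize(data: bytes) -> int:
    """Return the size field stored in the last 4 bytes of a gzip stream."""
    return int.from_bytes(data[-4:], "little")


# A single gzip member holding 1000 bytes of payload.
single = gzip.compress(b"x" * 1000)

# The same member followed by an empty member, similar in spirit to
# a BGZF file that ends with an empty block.
multi = single + gzip.compress(b"")

print(trailer_isize(single))        # size recorded for the only member
print(trailer_isize(multi))         # size recorded for the last (empty) member
print(len(gzip.decompress(multi)))  # real uncompressed size of all members
```

The trailer of `multi` reports the empty last member, even though the whole stream decompresses to the full payload.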

So we should have a default to fall back on: either the actual (on-disk) size, or the actual size times some constant factor.
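
A rough sketch of what that fallback could look like. The function name, the `factor` parameter and its default of 4.0 are all placeholders, not anything that exists in the project today, and treating a stored size of 0 as "not set" is an assumption based on the 1.bam output above:

```python
import os

GZIP_MAGIC = b"\x1f\x8b"
# Smallest possible gzip file: 10-byte header plus 8-byte trailer.
MIN_GZIP_SIZE = 18
# Hypothetical default multiplier; the right value would need tuning.
DEFAULT_FACTOR = 4.0


def uncompressed_input_size(path, factor=DEFAULT_FACTOR):
    """Estimate the uncompressed size of the file at `path`.

    - Not gzip: return the actual on-disk size.
    - gzip with a non-zero trailer size: return the trailer size.
    - gzip with a zero trailer size: return actual size * factor.
    """
    actual = os.path.getsize(path)
    if actual < MIN_GZIP_SIZE:
        return actual
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            return actual
        f.seek(-4, os.SEEK_END)
        stored = int.from_bytes(f.read(4), "little")
    if stored == 0:
        # Trailer is unusable (e.g. multi-member gzip ending in an
        # empty member), so fall back to a scaled on-disk size.
        return int(actual * factor)
    return stored
```

This does not try to detect sizes that wrapped at 4 GiB or multi-member files whose last member is non-empty; both would still report a value that is too small.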
