write does not respect codec #76

Open
lvarriano opened this issue Mar 14, 2024 · 1 comment
Labels
compression (Waveform Compression), duplicate (This issue or pull request already exists)

Comments

@lvarriano
Contributor

lvarriano commented Mar 14, 2024

If I read an encoded waveform and write it to a new file, it is not written as encoded. (It is written as a gzipped array, but only because LH5Store.write() now defaults to compression.) However, it still carries the attributes associated with encoding ('codec', etc.). This is not the behavior a user would expect: reading a data set and then writing it back to file should not change how the data is stored.

This happens because read() decodes the waveform and stores it in an ArrayOfEqualSizedArrays. When write() encounters this object, it does not encode it, because it is not an ArrayOfEncodedEqualSizedArrays.

import lgdo
import h5py
import numpy as np

store = lgdo.lh5.LH5Store()

input_file = "/home/lv/Documents/uw/l200/l200-p06-r000-phy-20230619T034203Z-tier_raw.lh5"
output_file = "output.lh5"

ch_list = lgdo.lh5.ls(input_file)[2:] # skip FCConfig and OrcaHeader

# copy data
for ch in ch_list:
    chobj, _ = store.read(f'{ch}/raw/', input_file)
    store.write(chobj, 'raw', output_file, f'{ch}/')

ch = 'ch1027200'

print('load input file with LH5Store')
chobj, _ = store.read(ch+'/raw/', input_file)
print(chobj['waveform_windowed']['values'].attrs)
print(chobj['waveform_windowed']['values'].nda.shape)
print(np.prod(chobj['waveform_windowed']['values'].nda.shape))

print('\nload input file with h5py')
with h5py.File(input_file, mode='r') as f:
    print(f[ch]['raw']['waveform_windowed']['values'].attrs.keys())
    print(f[ch]['raw']['waveform_windowed']['values'].keys())
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data'].keys())
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data']['flattened_data'].shape)
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data']['flattened_data'].compression)

print('\nload output file with LH5Store')
chobj, _ = store.read(ch+'/raw/', output_file)
print(chobj['waveform_windowed']['values'].attrs)
print(chobj['waveform_windowed']['values'].nda.shape)
print(np.prod(chobj['waveform_windowed']['values'].nda.shape))

print('\nload output file with h5py')
with h5py.File(output_file, mode='r') as f:
    print(f[ch]['raw']['waveform_windowed'].keys())
    print(f[ch]['raw']['waveform_windowed']['values'].attrs.keys())
    print(f[ch]['raw']['waveform_windowed']['values'].shape)
    print(f[ch]['raw']['waveform_windowed']['values'].compression)

gives

load input file with LH5Store
{'codec': 'radware_sigcompress', 'codec_shift': -32768.0, 'datatype': 'array_of_equalsized_arrays<1,1>{real}'}
(3034, 1400)
4247600

load input file with h5py
<KeysViewHDF5 ['codec', 'codec_shift', 'datatype']>
<KeysViewHDF5 ['decoded_size', 'encoded_data']>
<KeysViewHDF5 ['cumulative_length', 'flattened_data']>
(2427360,)
None

load output file with LH5Store
{'codec': 'radware_sigcompress', 'codec_shift': -32768.0, 'datatype': 'array_of_equalsized_arrays<1,1>{real}'}
(3034, 1400)
4247600

load output file with h5py
<KeysViewHDF5 ['dt', 't0', 'values']>
<KeysViewHDF5 ['codec', 'codec_shift', 'datatype']>
(3034, 1400)
gzip
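
The mismatch in the output above can be stated as a simple consistency rule: a node that advertises a 'codec' attribute should be a group holding 'encoded_data' and 'decoded_size', not a plain dataset. Below is a minimal sketch of such a check. The function name `has_stale_codec_attrs` and its arguments are hypothetical and not part of lgdo; it works on plain Python containers so that it runs without the data files.

```python
def has_stale_codec_attrs(attrs, child_names=None):
    """Return True if a node advertises a codec but holds no encoded payload.

    attrs       -- mapping of the node's HDF5 attributes
    child_names -- names of the node's children if it is a group,
                   or None if it is a plain dataset
    (Hypothetical helper for illustration; not an lgdo function.)
    """
    advertises_codec = "codec" in attrs
    holds_encoded_payload = (
        child_names is not None
        and {"encoded_data", "decoded_size"} <= set(child_names)
    )
    return advertises_codec and not holds_encoded_payload


# Attributes as printed for 'waveform_windowed/values' in both files.
attrs = {
    "codec": "radware_sigcompress",
    "codec_shift": -32768.0,
    "datatype": "array_of_equalsized_arrays<1,1>{real}",
}

# Input file: 'values' is a group with the encoded payload.
input_is_stale = has_stale_codec_attrs(attrs, ["decoded_size", "encoded_data"])

# Output file: 'values' is a plain gzipped dataset, so it has no children.
output_is_stale = has_stale_codec_attrs(attrs, None)

print(input_is_stale, output_is_stale)  # prints: False True
```

With h5py, one could presumably pass `node.attrs` together with `node.keys()` when `node` is an `h5py.Group`, and `None` otherwise. This only detects the inconsistency; it does not decide whether write() should re-encode the data or drop the attributes.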
@gipert
Member

gipert commented Mar 14, 2024

Yes, we should discuss what the behavior should be. I was tracking this in #37.

@gipert added the duplicate and compression labels Mar 14, 2024