Advanced Topic: fpzip and kempressed Encodings
In your Neuroglancer info file, set the "encoding" field for a given mip level to "fpzip" to use the raw fpzip compression algorithm, or to "kempressed" to use Kempression. Unfortunately, these codecs are currently supported only by CloudVolume. Official Neuroglancer support is forthcoming, but you can visualize fpzip and kempressed encodings using this Neuroglancer branch: https://github.com/william-silversmith/neuroglancer/tree/wms_fpzip
{
"type": "image",
"data_type": "float32",
"num_channels": 3,
"scales": [{
"chunk_sizes": [[ 256, 256, 16 ]],
"encoding": "fpzip",
"key": "4_4_40",
"resolution": [ 4, 4, 40 ],
"size": [ 80000, 60000, 1890 ],
"voxel_offset": [ 0, 0, 0 ]
}]
}
from cloudvolume import CloudVolume
info = CloudVolume.create_new_info(
num_channels = 3,
layer_type = 'image',
data_type = 'float32',
encoding = 'kempressed', # or fpzip
resolution = [4, 4, 40], # Voxel scaling, units are in nanometers
voxel_offset = [0, 0, 0], # x,y,z offset in voxels from the origin
chunk_size = [ 128, 128, 64 ], # units are voxels
volume_size = [ 250000, 250000, 25000 ], # e.g. a cubic millimeter dataset
)
vol = CloudVolume(..., info=info)
vol.commit_info()
In some connectomics segmentation pipelines, an important intermediate step is to generate voxel pair affinities in the X, Y, and Z dimensions. These affinities are represented as float32s, so compared with the original image, which is often a single channel of uint8s, the three-channel affinities are 12x as large (3 channels x 4 bytes versus 1 byte per voxel). Unfortunately, we have found that they do not compress sufficiently with gzip.
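The 12x figure is simple bytes-per-voxel arithmetic. The sketch below assumes one uint8 image channel and three float32 affinity channels (one each for X, Y, and Z); the variable names are illustrative only.

```python
# Bytes needed per voxel for the source image versus the affinity map.
# Assumption: 1 uint8 image channel, 3 float32 affinity channels (X, Y, Z).
UINT8_BYTES = 1
FLOAT32_BYTES = 4

image_bytes_per_voxel = 1 * UINT8_BYTES
affinity_bytes_per_voxel = 3 * FLOAT32_BYTES

size_ratio = affinity_bytes_per_voxel // image_bytes_per_voxel
print(f"affinities are {size_ratio}x the size of the image")
```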
Into this void steps fpzip. It's a fast lossless compression algorithm for multi-dimensional floating point data developed by Peter Lindstrom et al. at LLNL.
As non-FIBSEM connectomics datasets are highly anisotropic (often at ratios between 5:1 and 10:1 for the Z axis), Nico Kemnitz found that reorganizing the data from XYZC to XYCZ groups more similar data near each other. He also found that because our data all lie between 0 and 1, adding 2.0f to every value sets the exponents of all the floating point data to the same value, at the cost of a machine epsilon of precision.
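The effect of the +2.0 shift on the float32 exponent can be checked with the standard library alone. This is a minimal sketch, not CloudVolume code: the helper names and sample values are made up for illustration, and `struct` is used to round Python floats to IEEE 754 float32.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest IEEE 754 float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def float32_exponent(value):
    """Return the 8-bit biased exponent field of value stored as a float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return (bits >> 23) & 0xFF

# Hypothetical affinity values in [0, 1].
affinities = [to_float32(a) for a in (0.0, 0.001, 0.25, 0.5, 0.999, 1.0)]

raw_exponents = {float32_exponent(a) for a in affinities}
shifted_exponents = {float32_exponent(a + 2.0) for a in affinities}

# Largest error introduced by storing a + 2.0 as float32 and subtracting 2.0 again.
roundtrip_error = max(abs(to_float32(a + 2.0) - 2.0 - a) for a in affinities)

print("exponents before shift:", sorted(raw_exponents))
print("exponents after shift: ", sorted(shifted_exponents))
print("max round-trip error:  ", roundtrip_error)
```

Values in [0, 1] span many exponents, while every value in [2, 3] shares one. The round-trip error is bounded by 2^-23, which is float32 machine epsilon.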
"Kempressed" data consist of these two manipulations plus fpzip compression.
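The two manipulations can be sketched on a tiny nested-list volume. This is an illustration under stated assumptions, not the CloudVolume implementation: the function names are hypothetical, plain Python lists stand in for arrays, and the fpzip compression step is omitted entirely.

```python
def swap_last_two_axes(volume):
    """Turn volume[x][y][a][b] into result[x][y][b][a]."""
    return [[[list(line) for line in zip(*block)] for block in plane]
            for plane in volume]

def add_offset(volume, offset):
    """Add offset to every value of a 4D nested list."""
    return [[[[v + offset for v in row] for row in block] for block in plane]
            for plane in volume]

def kempress_layout(volume_xyzc):
    """Hypothetical helper: shift values by +2.0, then reorder XYZC -> XYCZ."""
    return swap_last_two_axes(add_offset(volume_xyzc, 2.0))

def unkempress_layout(volume_xycz):
    """Hypothetical inverse: reorder XYCZ -> XYZC, then shift values by -2.0."""
    return add_offset(swap_last_two_axes(volume_xycz), -2.0)

# A 1 x 1 x 2 x 3 volume (x, y, z, c) with exactly representable values.
volume = [[[[0.0, 0.25, 0.5],
            [0.75, 1.0, 0.125]]]]

prepared = kempress_layout(volume)
restored = unkempress_layout(prepared)
print(prepared)
print(restored == volume)
```

After the reorder, each innermost list holds one channel's values along Z, so values from the same affinity channel sit next to each other before they would be handed to the compressor.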
The fpzip codec uses the C++ code written by the fpzip authors. We added a Cython interface (fpzip.pyx) and a Python extension compilation toolchain to enable its use in CloudVolume. Unfortunately, C and C++ extensions are not very well supported by the Python ecosystem, so our extension requires installing numpy before installing cloud-volume. If numpy is not already installed, fpzip compilation will be skipped, because the numpy header files are required to compile our wrapper.
The following data were compiled by Kemnitz over 100 runs on a 256x256x16x3 connectomics dataset (1.17 GiB) imaged at 4x4x40 nm.
Manipulation | Codec | Encoding (s) | Compression (s) | Decompression (s) | Decoding (s) | Compressed Size (MiB) | Ratio (Compressed / Original) |
---|---|---|---|---|---|---|---|
None | gzip -6 | 0 | 91.1 | 12.31 | 0 | 779.57 | 64.96% |
+2.0 | gzip -6 | 0.29 | 82.6 | 14.97 | 0.29 | 674 | 56.17% |
+2.0 & XYAZ | gzip -6 | 0.51 | 81.05 | 12.09 | 0.51 | 674.05 | 56.17% |
None | zstd -14 | 0 | 112.02 | 5.04 | 0 | 709.96 | 59.16% |
+2.0 | zstd -14 | 0.29 | 117.4 | 4.73 | 0.29 | 605.35 | 50.45% |
+2.0 & XYAZ | zstd -14 | 0.51 | 114.4 | 5.12 | 0.51 | 603.75 | 50.31% |
None | fpzip | 0 | 19.13 | 29.19 | 0 | 561.49 | 46.79% |
+2.0 | fpzip | 0.29 | 16.32 | 21.21 | 0.29 | 458.56 | 38.21% |
+2.0 & XYAZ | fpzip | 0.51 | 14.3 | 18.67 | 0.51 | 395.04 | 32.92% |
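As a consistency check on the last two columns, dividing each compressed size by its ratio recovers the uncompressed size. The sketch below copies the figures from the table; reading the 1.17 GiB total as 100 float32 volumes of 256x256x16x3 voxels is an assumption, though the arithmetic agrees with it.

```python
# (manipulation, codec, compressed size in MiB, ratio in percent), from the table.
rows = [
    ("None", "gzip -6", 779.57, 64.96),
    ("+2.0", "gzip -6", 674.00, 56.17),
    ("+2.0 & XYAZ", "gzip -6", 674.05, 56.17),
    ("None", "zstd -14", 709.96, 59.16),
    ("+2.0", "zstd -14", 605.35, 50.45),
    ("+2.0 & XYAZ", "zstd -14", 603.75, 50.31),
    ("None", "fpzip", 561.49, 46.79),
    ("+2.0", "fpzip", 458.56, 38.21),
    ("+2.0 & XYAZ", "fpzip", 395.04, 32.92),
]

# Uncompressed size implied by each row.
implied_original_mib = [size / (ratio / 100.0) for _, _, size, ratio in rows]

# Assumption: 100 float32 volumes of 256 x 256 x 16 voxels with 3 channels.
volume_mib = 256 * 256 * 16 * 3 * 4 / 2**20
assumed_total_mib = 100 * volume_mib

for (manipulation, codec, _, _), implied in zip(rows, implied_original_mib):
    print(f"{manipulation:12s} {codec:9s} implies {implied:8.2f} MiB uncompressed")
print(f"assumed total: {assumed_total_mib:.2f} MiB = {assumed_total_mib / 1024:.2f} GiB")
```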