[Feature request] onnx model compression #284

jozefchutka · 2023-09-07T11:32:21Z

Thanks for releasing models with reduced size https://twitter.com/xenovacom/status/1698742891118493905 .

I was thinking of reducing the size further with a compression algorithm like brotli. In my test, the current whisper-base.en (~51MB) can be reduced to ~26MB using:

brotli decoder_model_merged_quantized.onnx -o decoder_model_merged_quantized.onnx.br -Z -f

However, huggingface_hub doesn't seem to support this at the moment, see huggingface/huggingface_hub#1446.

So my idea / feature request is whether, instead of compressing on the fly, it would be possible to:

compress in advance and commit the compressed *.onnx.br files together with the original *.onnx ones
at runtime (in JS), check whether brotli is supported by the browser and, if so, request *.onnx.br
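
The two steps above could be sketched roughly as follows. This is only an illustration, not how transformers.js works: `candidateUrls` and `fetchFirstAvailable` are hypothetical names, and since I am not aware of a direct browser API for detecting brotli support, the sketch simply tries the `.br` file first and falls back to the original file if the request or the body decoding fails.

```javascript
// Hypothetical helper: URLs to try, most preferred first.
function candidateUrls(url, preferBrotli = true) {
  return preferBrotli ? [`${url}.br`, url] : [url];
}

// Hypothetical helper: return the body of the first URL that loads fully.
// The body is read inside the try block because a content-decoding failure
// only surfaces while the body is consumed, not when fetch() resolves.
async function fetchFirstAvailable(urls, fetchImpl = fetch) {
  let lastError;
  for (const url of urls) {
    try {
      const response = await fetchImpl(url);
      if (response.ok) {
        return await response.arrayBuffer();
      }
      lastError = new Error(`HTTP ${response.status} for ${url}`);
    } catch (err) {
      lastError = err; // network error or failed decoding, try the next URL
    }
  }
  throw lastError ?? new Error("no candidate URLs given");
}
```

Usage would look like `await fetchFirstAvailable(candidateUrls("decoder_model_merged_quantized.onnx"))`. It assumes the server sends the `.br` file with the headers described below, so that the browser decodes it transparently.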

For this to be transparent to fetch() in the browser, such .br files should be served with extra HTTP headers:

Header set Content-Encoding br
Header append Vary Accept-Encoding

And preferably also with additional information about the original file size (so JS fetch() can recognise and report progress properly), e.g.:

Header set x-file-size 123456789
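
To illustrate why the extra header helps: with `Content-Encoding: br`, fetch() yields decoded bytes while `content-length` describes the compressed size, so progress computed from `content-length` would be off. A minimal sketch of reading the body with progress, assuming the server sends the `x-file-size` header proposed above (`resolveTotalBytes` and `readWithProgress` are hypothetical names):

```javascript
// Prefer the proposed x-file-size header (uncompressed size) and fall back
// to content-length. `headers` is anything with a get(name) method.
function resolveTotalBytes(headers) {
  const raw = headers.get("x-file-size") ?? headers.get("content-length");
  const total = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(total) && total > 0 ? total : null;
}

// Read the whole response body, reporting { loaded, total } after each chunk.
// `total` is null when neither header is usable.
async function readWithProgress(response, onProgress) {
  const total = resolveTotalBytes(response.headers);
  const reader = response.body.getReader();
  const chunks = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.byteLength;
    onProgress({ loaded, total });
  }
  // Concatenate the chunks into a single buffer.
  const out = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

For cross-origin requests the custom header would also need to be listed in `Access-Control-Expose-Headers`, otherwise JS cannot read it.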

xenova · 2023-09-07T21:50:01Z

I think adding something like this via an experimental opt-in feature would be a good idea. In fact, this will be beneficial for all other files. For example, .save_pretrained actually pretty-prints the JSON files, which unnecessarily increases tokenizer and config file sizes. Removing redundant whitespace and/or compression sounds like a good way to improve the user experience.
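
As a rough illustration of the whitespace point, pretty-printed JSON can be minified by parsing and re-serializing it. `minifyJson` is a hypothetical helper name, and the round trip is assumed to be acceptable for tokenizer and config files (it can change how some numbers are written, e.g. `1.0` becomes `1`):

```javascript
// Re-serialize JSON without indentation or spaces between tokens.
function minifyJson(text) {
  return JSON.stringify(JSON.parse(text));
}

const pretty = '{\n  "model_type": "whisper",\n  "layers": [\n    1,\n    2\n  ]\n}';
const minified = minifyJson(pretty);
console.log(pretty.length, "->", minified.length, "characters");
```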

Surely there's a way to skip the requirement of serving extra HTTP headers? If so, I can easily update some of the models and do some testing. For example, the Compression Streams API?

jozefchutka · 2023-09-08T06:03:42Z

Here are some ideas regarding Content-Encoding headers:

research huggingface_hub options to define extra headers based on file extension, or some kind of .htaccess file hosted/committed in the same directory as the other hosted files
research a custom service worker, and check whether an explicitly added Content-Encoding header would trigger some kind of transparent decoding in the browser
use gzip (instead of brotli) together with DecompressionStream ... in my test a ~50MB .onnx compressed to a 29MB .onnx.gz with max compression. However, it is worth keeping an eye on browser memory use/management, especially when dealing with huge files
use brotli together with a 3rd-party decompression library (brotli is not available via DecompressionStream, see "Include brotli" whatwg/compression#34), which however brings more dependencies and more content to download
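
The gzip + DecompressionStream option could look roughly like this. It is a sketch under assumptions: the `.onnx.gz` file is served as a plain binary file without a `Content-Encoding` header (otherwise the browser would already decode it), and `gunzip` / `fetchGzippedModel` are hypothetical names.

```javascript
// Pipe a ReadableStream of gzip bytes through the built-in decompressor.
function gunzip(stream) {
  return stream.pipeThrough(new DecompressionStream("gzip"));
}

// Fetch a pre-compressed file and return the decompressed bytes.
// Note: this buffers the whole decompressed file in memory.
async function fetchGzippedModel(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  const buffer = await new Response(gunzip(response.body)).arrayBuffer();
  return new Uint8Array(buffer);
}
```

Because everything ends up in a single buffer, the memory concern mentioned above still applies for huge files.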

Some ideas on x-file-size header:

research huggingface_hub options
when DecompressionStream is involved, such an extra header is actually not needed, as JS would receive a content-length matching the fetch progress numbers
maintain a versioned .json with this information for each file
guess it during fetch progress using some multiplier
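
The last two ideas can be combined in a small sketch: look the size up in a manifest when one is available, otherwise estimate it from the compressed size. Everything here is hypothetical, including the function name, the manifest shape and the default ratio of 2, which is only loosely based on the ~51MB to ~26MB figure mentioned earlier.

```javascript
// Hypothetical manifest committed next to the model files, e.g. sizes.json:
//   { "decoder_model_merged_quantized.onnx.br": 51000000 }
// Returns the expected uncompressed size and whether it is exact or a guess.
function expectedTotalBytes(fileName, compressedBytes, manifest = {}, ratio = 2) {
  const known = manifest[fileName];
  if (Number.isInteger(known) && known > 0) {
    return { total: known, exact: true };
  }
  // No manifest entry: guess using an assumed compression ratio.
  return { total: Math.round(compressedBytes * ratio), exact: false };
}
```

A progress bar driven by the guessed value should clamp at 100%, since the real ratio will differ per file.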

In my project's CDN I am using an .htaccess file with some hardcoded data:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP:Accept-encoding} br
RewriteCond %{REQUEST_FILENAME}\.br -s
RewriteRule ^(.*) $1\.br [QSA]

RewriteRule \.js\.br$ - [T=text/javascript,E=no-gzip,E=no-brotli]
RewriteRule \.wasm\.br$ - [T=application/wasm,E=no-gzip,E=no-brotli]

<FilesMatch "(\.br)$">
	Header set Content-Encoding br
	Header append Vary Accept-Encoding
</FilesMatch>

<FilesMatch "(ffmpeg-gpl-simd-wv.js\.br)$">
	Header set x-content-length 135493
</FilesMatch>
<FilesMatch "(ffmpeg-gpl.wasm\.br)$">
	Header set x-content-length 30574318
</FilesMatch>

jozefchutka added the enhancement New feature or request label Sep 7, 2023
