[Feature request] onnx model compression #284

Open
jozefchutka opened this issue Sep 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@jozefchutka

Thanks for releasing models with reduced size https://twitter.com/xenovacom/status/1698742891118493905 .

I was thinking of reducing the size further with a compression algorithm like brotli. In my test, the current whisper-base.en (~51MB) can be reduced to ~26MB using:

brotli decoder_model_merged_quantized.onnx -o decoder_model_merged_quantized.onnx.br -Z -f

However, huggingface_hub doesn't seem to be capable of this at the moment (huggingface/huggingface_hub#1446).

So my idea / feature request is whether, instead of compressing on the fly, it would be possible to:

  1. compress in advance and commit the compressed *.onnx.br files together with the original *.onnx ones
  2. at runtime, check in JS whether the browser supports brotli, and if so request *.onnx.br

For this to be transparent to the browser's fetch(), such .br files should be served with extra HTTP headers:

Header set Content-Encoding br
Header append Vary Accept-Encoding

And preferably with additional information about the original file size (so that JS fetch() can recognise and report progress properly), i.e.

Header set x-file-size 123456789
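
A minimal sketch of how client code might pick the total size for progress reporting, assuming the server sends the x-file-size header proposed above. The helper name resolveTotalBytes is hypothetical and not part of any library. When a response carries Content-Encoding, Content-Length describes the encoded bytes on the wire, so the sketch only falls back to it for unencoded responses.

```javascript
// Hypothetical helper: returns the uncompressed size in bytes, or null if unknown.
function resolveTotalBytes(headers) {
  // Accept only plain non-negative integers; anything else counts as unknown.
  const parse = (value) => {
    if (value === null || !/^\d+$/.test(value.trim())) return null;
    return Number(value.trim());
  };

  // Prefer the explicit header (an assumption: the server must be configured to send it).
  const explicit = parse(headers.get("x-file-size"));
  if (explicit !== null) return explicit;

  // Content-Length counts encoded bytes, so trust it only without Content-Encoding.
  if (headers.get("content-encoding") === null) {
    return parse(headers.get("content-length"));
  }
  return null;
}
```

With a fetch() response this would be called as resolveTotalBytes(response.headers); a null result means progress can only be shown as bytes received, not as a percentage.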
@jozefchutka jozefchutka added the enhancement New feature or request label Sep 7, 2023
@xenova
Collaborator

xenova commented Sep 7, 2023

I think adding something like this via an experimental opt-in feature would be a good idea. In fact, this would be beneficial for all other files too. For example, .save_pretrained actually pretty-prints the JSON files, which unnecessarily increases tokenizer and config file sizes. Removing redundant whitespace and/or compression sounds like a good way to improve the user experience.
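
As a rough illustration of the whitespace point, a pretty-printed JSON file can be re-serialized without indentation. This is only a sketch: minifyJson is a hypothetical name, the sample object is made up, and a parse/stringify round trip can alter numbers that do not fit a double, so it would need checking against the real tokenizer and config files.

```javascript
// Re-serialize JSON text without indentation or newlines.
function minifyJson(text) {
  return JSON.stringify(JSON.parse(text));
}

// Made-up sample standing in for a pretty-printed config file.
const pretty = JSON.stringify({ example_key: "value", nested: { list: [1, 2, 3] } }, null, 2);
const compact = minifyJson(pretty);

// Compare character counts before and after.
console.log(pretty.length, "->", compact.length);
```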

Surely there's a way to skip the requirement of serving extra HTTP headers? If so, I can easily update some of the models and do some testing. For example, the Compression Streams API?

@jozefchutka
Author

jozefchutka commented Sep 8, 2023

Here are some ideas regarding Content-Encoding headers:

  • research huggingface_hub options for defining extra headers based on file extension, or some kind of .htaccess file hosted/committed in the same directory as the other hosted files
  • research a custom service worker, and check whether an explicitly added Content-Encoding header would trigger some kind of transparent decoding on the client side
  • use gzip (instead of brotli) together with DecompressionStream; I tested a ~50MB .onnx compressing to a 29MB .onnx.gz with max compression. However, it is worth keeping an eye on browser memory use/management, especially when dealing with huge files
  • use brotli together with a third-party decompression library (brotli is not available via DecompressionStream, see "Include brotli" whatwg/compression#34), which however adds dependencies and more content to download
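
The gzip plus DecompressionStream option could look roughly like the sketch below. It assumes the file is stored as a plain .gz and served without Content-Encoding, so the browser does not decode it on its own. The function name fetchGzipped and the injectable fetchImpl parameter are hypothetical, added only so the sketch can run without a network.

```javascript
// Sketch: fetch a gzip-compressed file and return the decompressed bytes.
async function fetchGzipped(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok || response.body === null) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }

  // Decompress while streaming; no extra HTTP headers are required for this path.
  const decompressed = response.body.pipeThrough(new DecompressionStream("gzip"));

  // Note: this buffers the whole decompressed file in memory.
  return new Uint8Array(await new Response(decompressed).arrayBuffer());
}
```

Because the result is collected into a single buffer, the memory concern mentioned above still applies for large models.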

Some ideas on x-file-size header:

  • research huggingface_hub options
  • when DecompressionStream is involved, such an extra header is actually not needed, as JS would receive a content-length matching the fetch progress numbers
  • maintain a versioned .json with this information for each file
  • guess it during fetch progress using some multiplier
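
To illustrate the DecompressionStream point: if progress is counted on the compressed bytes before they are decompressed, the numbers line up with Content-Length and no extra header is needed. This is a sketch under the assumption that the .gz file is served without Content-Encoding; decompressWithProgress and the onProgress callback are hypothetical names.

```javascript
// Sketch: decompress a gzip response while reporting progress on compressed bytes.
async function decompressWithProgress(response, onProgress) {
  // Total is the size of the compressed file, or null when the header is absent.
  const total = Number(response.headers.get("content-length")) || null;
  let received = 0;

  // Pass-through stream that counts bytes before they reach the decompressor.
  const counter = new TransformStream({
    transform(chunk, controller) {
      received += chunk.byteLength;
      onProgress(received, total);
      controller.enqueue(chunk);
    },
  });

  const stream = response.body
    .pipeThrough(counter)
    .pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```

A caller could pass something like (received, total) => console.log(received, total) as onProgress and compute a percentage whenever total is not null.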

On my project's CDN I am using .htaccess with some hardcoded data:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP:Accept-encoding} br
RewriteCond %{REQUEST_FILENAME}\.br -s
RewriteRule ^(.*) $1\.br [QSA]

RewriteRule \.js\.br$ - [T=text/javascript,E=no-gzip,E=no-brotli]
RewriteRule \.wasm\.br$ - [T=application/wasm,E=no-gzip,E=no-brotli]

<FilesMatch "(\.br)$">
	Header set Content-Encoding br
	Header append Vary Accept-Encoding
</FilesMatch>

<FilesMatch "(ffmpeg-gpl-simd-wv.js\.br)$">
	Header set x-content-length 135493
</FilesMatch>
<FilesMatch "(ffmpeg-gpl.wasm\.br)$">
	Header set x-content-length 30574318
</FilesMatch>
