Are the CoreML models downloadable? #2306

FigBug · 2024-07-18T17:23:44Z

FigBug
Jul 18, 2024

The script download-coreml-model.sh is no longer functional, but can I download them manually somehow? I really struggling to convert them myself. I keep getting the error:

Traceback (most recent call last):
  File "/Users/rrabien/dev.github/ncontrol/modules/whisper.cpp/models/convert-whisper-to-coreml.py", line 2, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'
coremlc: error: Model does not exist at models/coreml-encoder-base.en.mlpackage -- file:///Users/rrabien/dev.github/ncontrol/modules/whisper.cpp/
mv: rename models/coreml-encoder-base.en.mlmodelc to models/ggml-base.en-encoder.mlmodelc: No such file or directory

mrfragger · 2024-07-30T16:36:27Z

mrfragger
Jul 30, 2024

No need to use coreml models anymore especially with -fa (flash attention) which uses your GPU. I used to do large-v2 coreml and medium corelml models and the large one would hog my computer so couldn't use it for anything else on a Mac M1 with 8GB RAM.

Now for English just use distil-large-v3 model which uses 2GB of RAM instead of 3GB RAM for coreml. And core large-v2 I got 2.5x realtime speed with flash attention (without it 1.4x realtime speed). But with distil-large-v3 get 6x to 8.3 realtime speed English only though. For multilingual use the large-v2-q5_0 model with flash attention 1.8x realtime speed..haven't completed an audiobook yet). I deleted Xcode app and that gave me 6GB of free space since no longer need to complile coreml models.

well...this speed will have to suffice for multilingual at least for me.

whisper.cpp took 00h:36m:29s
Transcription took 2189 seconds

Total duration of audiobook is 4052 seconds
Total duration of audiobook is 01h:07m:32s

Whisper large-v2-q5_0 model transcribed at 1.85x realtime speed
Set language: en | Translated:
Max length: 120 | Print Colors:
No Graphics: | Flash Attention: -fa

tried without flash attention and it's a tad slower

whisper.cpp took 00h:43m:10s
Transcription took 2590 seconds

Total duration of audiobook is 01h:07m:32s
Total duration of audiobook is 4052 seconds

Whisper large-v2-q5_0 model transcribed at 1.56x realtime speed
Set language: en | Translated:
Max length: 120 | Print Colors:
No Graphics: | Flash Attention:

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Are the CoreML models downloadable? #2306

{{title}}

Replies: 1 comment

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Are the CoreML models downloadable? #2306

FigBug Jul 18, 2024

Replies: 1 comment

mrfragger Jul 30, 2024

FigBug
Jul 18, 2024

mrfragger
Jul 30, 2024