No need to use CoreML models anymore, especially with `-fa` (flash attention), which uses your GPU. I used to run the large-v2 and medium CoreML models, and the large one would hog my Mac M1 with 8GB RAM so I couldn't use the machine for anything else. Now for English I just use the distil-large-v3 model, which uses 2GB of RAM instead of the 3GB the CoreML model needed. With large-v2 I got 2.5x realtime speed with flash attention (1.4x realtime without it), but with distil-large-v3 I get 6x to 8.3x realtime speed (English only, though). For multilingual, use the large-v2-q5_0 model with flash attention: about 1.8x realtime speed (haven't completed an audiobook yet). I also deleted the Xcode app, which freed up 6GB of disk space since I no longer need to compile CoreML models.

Well, this speed will have to suffice for multilingual, at least for me:

```
whisper.cpp took 00h:36m:29s
Total duration of audiobook is 4052 seconds
Whisper large-v2-q5_0 model transcribed at 1.85x realtime speed
```

Tried without flash attention and it's a tad slower:

```
whisper.cpp took 00h:43m:10s
Total duration of audiobook is 01h:07m:32s
Whisper large-v2-q5_0 model transcribed at 1.56x realtime speed
```
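For anyone wanting to reproduce this, the invocation looks roughly like the sketch below. The binary name and model file paths are assumptions that depend on your build (older whisper.cpp builds name the binary `./main` instead of `whisper-cli`), but `-fa`, `-m`, `-f`, and `-l` are real whisper.cpp flags:

```sh
# English-only: distil-large-v3 with flash attention (-fa)
./build/bin/whisper-cli -fa \
  -m models/ggml-distil-large-v3.bin \
  -f audiobook.wav

# Multilingual: quantized large-v2 with flash attention and language auto-detection
./build/bin/whisper-cli -fa \
  -m models/ggml-large-v2-q5_0.bin \
  -f audiobook.wav -l auto
```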
The script `download-coreml-model.sh` is no longer functional. Can I download the models manually somehow? I'm really struggling to convert them myself; I keep getting the error:
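For reference, my manual attempt looked roughly like this. The URL pattern is a guess based on what the script appears to fetch (pre-converted encoders hosted under ggerganov/whisper.cpp on Hugging Face), so the exact path may be wrong; if the files were never uploaded or have been removed, the fallback is generating them locally with `models/generate-coreml-model.sh`:

```sh
# Manual CoreML encoder download -- the URL pattern below is an assumption
# based on what download-coreml-model.sh seems to use; verify the file
# actually exists on Hugging Face before relying on it.
MODEL=base.en   # hypothetical example; substitute your model size
curl -L --fail -o "ggml-${MODEL}-encoder.mlmodelc.zip" \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${MODEL}-encoder.mlmodelc.zip"
unzip "ggml-${MODEL}-encoder.mlmodelc.zip" -d models/
```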