
Accelerate vDSP Mel Spectrogram support, Sample Accrual, macOS support, Add Tokenizer - WIP Branch #2

Open · wants to merge 47 commits into master

Conversation


@vade commented Jan 2, 2023

Hello

This is a WIP port that attempts to replace the Rust Mel spectrogram implementation with native vDSP / Accelerate.

I've opened a PR mostly as a WIP to have a place to discuss the work done!

Status: Incomplete, but close. Need some community help. We solved the repeating token problem, but aren't getting sensible output because our Mel spectrogram isn't matching Torch's exactly.

Work to date:

  • I've added a macOS SwiftUI implementation. I've updated the main method of the Whisper implementation to take a URL to an asset. The new decode method accrues a segment's worth of samples for Whisper transcription, runs it, then continues to accrue samples.

  • I've created a Log Mel spectrogram implementation with vDSP, and numerically checked it against Whisper's audio loading and normalization code (a minimal vDSP sketch follows this list). This code is close, but different enough to be causing incorrect output in Whisper. I have verified that we get correct output if we import a correct Log Mel as generated natively by Python.

  • I've updated the CoreML export script to output flexible shapes on the decoder's token input. It is my understanding that we need to pass an 'accrued' number of tokens: i.e. we start with the SOT token, predict on our segment of audio, get a new token, append it to a running token list, and run the decoder again with a tensor of all tokens so far.

  • I've created a tokenizer based on the GPT2Tokenizer implementation from Hugging Face's swift-transformers repo, and generously borrowed some code from that repo to help. I've implemented a very simple greedy top-1 token strategy (the decode loop is sketched after the Overview list below).
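
To make the Mel discussion concrete, here is a minimal sketch of the vDSP stage referenced above, covering only the filterbank projection, log, clamp, and rescale steps (the STFT itself is elided). This is not this branch's exact code: the function and parameter names are illustrative, and it assumes Whisper's 80-bin filterbank has already been loaded and transposed to freqBins x melBins.

```swift
import Accelerate

/// Illustrative sketch: turn STFT power-spectrum frames into Whisper-style
/// log-Mel features with vDSP / vForce. `powerFrames` is frameCount x freqBins
/// (row-major) of |STFT|^2 values; `melFilterbank` is freqBins x melBins
/// (row-major), e.g. Whisper's mel_filters matrix transposed to fit this multiply.
func logMelFrames(powerFrames: [Float], frameCount: Int,
                  melFilterbank: [Float], freqBins: Int, melBins: Int) -> [Float] {
    // Mel projection: (frameCount x freqBins) * (freqBins x melBins).
    var mel = [Float](repeating: 0, count: frameCount * melBins)
    vDSP_mmul(powerFrames, 1, melFilterbank, 1, &mel, 1,
              vDSP_Length(frameCount), vDSP_Length(melBins), vDSP_Length(freqBins))

    // log10(max(mel, 1e-10)), matching Whisper's floor before the log.
    let floored = vDSP.threshold(mel, to: 1e-10, with: .clampToThreshold)
    var logMel = vForce.log10(floored)

    // Clamp to (max - 8), then rescale into roughly [-1, 1] via (x + 4) / 4.
    let maxValue = vDSP.maximum(logMel)
    logMel = vDSP.clip(logMel, to: (maxValue - 8)...maxValue)
    return vDSP.divide(vDSP.add(4, logMel), 4)
}
```

Since Whisper applies the 1e-10 floor, the (max − 8) clamp, and the (x + 4) / 4 rescale in exactly that order, it's probably worth checking the STFT windowing and reflection padding first when chasing the remaining numerical differences against Torch.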

Overview:

Right now, this repo:

  • Creates an AVAssetReader with an audio mix output that forces single-channel, 16 kHz SInt16 output (reader settings are sketched after this list).
  • Decodes sample buffers and accrues the decoded samples until we hit the number of samples Whisper expects in a segment.
  • Creates a Log Mel spectrogram from the audio (or attempts to do so correctly).
  • Encodes it to audio features using the Whisper encoder model.
  • Primes with an SOT token, and sends that single token plus the audio features to the Whisper decoder.
  • This produces our first predicted logits, which we use to find the top-1 probable token, and save it. We keep iterating this way, sending the growing array of Int tokens to the decoder along with the same audio segment, until we hit an EOT token (see the decode-loop sketch after this list).
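
For reference, a sketch of the kind of reader configuration described in the first bullet above: an AVAssetReaderAudioMixOutput forcing mono, 16 kHz, 16-bit signed integer linear PCM. The function name and the absence of error handling are placeholders rather than this branch's code.

```swift
import AVFoundation

/// Sketch of the audio read path: force mono 16 kHz SInt16 PCM from any asset.
func makeAudioReader(for url: URL) throws -> (AVAssetReader, AVAssetReaderAudioMixOutput) {
    let asset = AVURLAsset(url: url)
    let reader = try AVAssetReader(asset: asset)
    let audioTracks = asset.tracks(withMediaType: .audio)
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 16_000,
        AVNumberOfChannelsKey: 1,
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let output = AVAssetReaderAudioMixOutput(audioTracks: audioTracks, audioSettings: settings)
    reader.add(output)
    return (reader, output)
}
```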
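And a rough sketch of the SOT-to-EOT greedy loop from the last two bullets, assuming a CoreML decoder with a flexible token shape. The feature names ("token_data", "audio_data", "token_logits"), the token IDs (50257 for SOT and 50256 for EOT, which hold for the English-only GPT-2 vocabulary), and the safety cap are assumptions, not necessarily what this branch uses.

```swift
import CoreML

/// Sketch of greedy top-1 decoding: start from SOT, re-run the decoder with the
/// whole accrued token sequence each step, stop at EOT or a safety cap.
func greedyDecode(decoder: MLModel,
                  audioFeatures: MLMultiArray,
                  sotToken: Int = 50257,
                  eotToken: Int = 50256,
                  maxTokens: Int = 224) throws -> [Int] {
    var tokens = [sotToken]
    while tokens.count < maxTokens {
        // Pack the accrued tokens into a [1, n] multi-array (flexible shape).
        let tokenArray = try MLMultiArray(shape: [1, NSNumber(value: tokens.count)], dataType: .int32)
        for (i, t) in tokens.enumerated() { tokenArray[i] = NSNumber(value: t) }

        let input = try MLDictionaryFeatureProvider(dictionary: [
            "token_data": MLFeatureValue(multiArray: tokenArray),
            "audio_data": MLFeatureValue(multiArray: audioFeatures)
        ])
        let output = try decoder.prediction(from: input)
        guard let logits = output.featureValue(for: "token_logits")?.multiArrayValue else { break }

        // Greedy top-1 over the last position's vocabulary logits.
        let vocabSize = logits.shape.last!.intValue
        let lastOffset = logits.count - vocabSize
        var bestToken = 0
        var bestScore = -Float.greatestFiniteMagnitude
        for v in 0..<vocabSize {
            let score = logits[lastOffset + v].floatValue
            if score > bestScore { bestScore = score; bestToken = v }
        }
        if bestToken == eotToken { break }
        tokens.append(bestToken)
    }
    return tokens
}
```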

Work left to do:

  • Triple-check the Log Mel code. You can use this iPython notebook to generate test data from your own audio the same way Whisper does, to verify things look correct numerically (a small comparison sketch follows this list). I believe this is correct, and pretty fast. I based it on Apple's Mel code examples. It could use some minor clean-up.

  • Implement a better tokenizer. The Whisper tokenizer leverages some 'special tokens', and my presumption is they are ignored in the BPE decode pass, sort of like 'additional logic', but I'm not entirely sure what needs to happen to properly implement them in Swift (see the special-token sketch after this list).

  • Fix token repetition from the decoder. I think the issue has to do with not having the equivalent of the "additional_special_tokens" handling in the custom tokenizer. It seems like we need to init the model with multiple tokens? The special-token sketch below touches on this.

  • Timestamps and additional quality-of-life work.
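
One way to do the numerical triple-check mentioned above is to dump the notebook's log-Mel output as raw little-endian Float32 and compare it element-wise against the vDSP result in Swift. The file format, helper name, and tolerance choice are assumptions, not part of this branch.

```swift
import Foundation

/// Sketch: compare a vDSP-computed log-Mel against a reference dumped from the
/// notebook as raw little-endian Float32, returning the worst element-wise error.
func maxAbsoluteDifference(computed: [Float], referenceURL: URL) throws -> Float {
    let data = try Data(contentsOf: referenceURL)
    let reference = data.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
    precondition(reference.count == computed.count, "shape mismatch")
    return zip(computed, reference).map { abs($0 - $1) }.max() ?? 0
}
```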
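On the special-token and repetition items: I'm not certain this is the missing piece, but one common approach is to mask special tokens out of the logits before the greedy argmax and to strip them before the BPE decode pass. A rough sketch, assuming the English-only vocabulary where IDs at or above 50257 are special:

```swift
/// Assumed boundary for the English-only (GPT-2-based) vocabulary; the
/// multilingual model uses different IDs.
let specialTokenThreshold = 50257

/// Keep special tokens from winning the greedy argmax.
/// EOT (50256) sits below the threshold and stays eligible.
func suppressSpecialTokens(logits: inout [Float]) {
    for id in specialTokenThreshold..<logits.count {
        logits[id] = -.greatestFiniteMagnitude
    }
}

/// Drop SOT / language / task / timestamp tokens before BPE decoding.
func stripSpecialTokens(_ tokens: [Int]) -> [Int] {
    tokens.filter { $0 < specialTokenThreshold }
}
```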

Thanks for any insight! Happy new year!

vade added 28 commits January 2, 2023 12:42
…ls dont appear to want to run on iOS though, too large?
…matrix order bug thanks to actually commenting our code.
…token decoding loop. Not working yet, but we are super close.
…s than Whisper / FFMPEG sint16 + normalization. Trying to figure out numerical compatibilities here.
@vade changed the title from "Accelerate vDSP Mel Spectrogram support - WIP Branch" to "Accelerate vDSP Mel Spectrogram support, Sample Accrual, macOS support, Add Tokenizer - WIP Branch" on Jan 8, 2023
@sahilshah

Hey, were you able to get this across the finish line? If not, what did you do about getting a fast Mel spectrogram for the Whisper model calls?
