Whisper speaker diarization returns NaN for timestamps #1077
I managed to figure out that property … Follow-up question: where would …
Thanks for debugging! I've identified the problem, which #1082 will fix. I'll also open a follow-up PR to improve the unit tests for the model.
Thanks for your quick reply. A quick workaround I discovered earlier would be to downgrade to …

I have another small question related to this model. How would one track the progress of the actual speaker diarization? The example code provided mentions a `progress_callback`:

```js
this.segmentation_processor ??= AutoProcessor.from_pretrained(
  this.segmentation_model_id,
  {
    progress_callback,
  }
);
```

I thought that adding a …
Closed in 14bf689 👍

To respond to your question on progress: the model processes the audio all at once, and the post-processing acts on the model output, so there's technically no way to do it within the modelling code. However, you should be able to achieve something similar by splitting the audio into chunks yourself, running the model on each chunk, and reporting progress as each chunk completes.

Note that you would need to add some form of speaker identification model to the pipeline to ensure that the same speaker label is assigned across chunks (since there won't be any consistency across chunks otherwise).
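The chunked approach described above can be sketched as follows. This is a minimal illustration, not library code: `runDiarization` is a hypothetical stand-in for the real per-chunk model call, and the function names are invented for this example.

```js
// Split a Float32Array of audio samples into fixed-length chunks.
function chunkAudio(audio, chunkLength) {
  const chunks = [];
  for (let i = 0; i < audio.length; i += chunkLength) {
    chunks.push(audio.slice(i, i + chunkLength));
  }
  return chunks;
}

// Run a (hypothetical) per-chunk diarization call, reporting progress
// as the fraction of chunks completed so far (0..1).
async function diarizeWithProgress(audio, chunkLength, runDiarization, onProgress) {
  const chunks = chunkAudio(audio, chunkLength);
  const results = [];
  for (let i = 0; i < chunks.length; i++) {
    results.push(await runDiarization(chunks[i]));
    onProgress((i + 1) / chunks.length);
  }
  return results;
}
```

Because each chunk is diarized independently, the speaker labels in `results` are only consistent within a chunk, which is why a speaker identification step would be needed to merge them.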
Thanks for the insight, it is much appreciated! I have decided to reduce complexity by not implementing speaker identification. As of now, I process the first 10 seconds of the audio and measure the elapsed time. I then multiply this number by the number of chunks to get a very rough estimated time to completion, and show a loading bar that progresses over this estimated time. This should be enough to improve the user experience.
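The rough estimate described above amounts to the following arithmetic (a sketch with illustrative names, not code from the project): time one sample chunk, extrapolate to all chunks, and drive the loading bar from wall-clock time against that estimate.

```js
// Extrapolate total processing time from one timed sample chunk:
// elapsed time for one chunk times the total number of chunks.
function estimateTotalMs(sampleChunkElapsedMs, totalChunks) {
  return sampleChunkElapsedMs * totalChunks;
}

// Loading-bar position as a fraction of the estimated total,
// capped at 1 so the bar never overshoots when the estimate is low.
function progressFraction(elapsedMs, estimatedTotalMs) {
  return Math.min(elapsedMs / estimatedTotalMs, 1);
}
```

For example, if the 10-second sample takes 2 s to process and the audio spans 5 chunks, the estimate is 10 s of total processing time.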
System Info
Environment/Platform
Description
I have cloned the example repo and made some changes to the `transformers` dependencies, as I immediately got an error upon installing dependencies. I have replaced `"@xenova/transformers": "github:xenova/transformers.js#v3"` with `"@huggingface/transformers": "^3.1.0"`, and also replaced the import in `worker.js`.

After loading the model and attempting to transcribe the example video, a table gets logged where each `start` and `end` segment is `NaN`.

Reproduction
Clone my repository, install dependencies, and run the dev server. Use the following steps: …