Weird hallucination #1538
-
Because Whisper was trained on a weakly labeled dataset, this issue is inevitable.
-
I don't understand your answer.
-
Whisper predicts each next token from both the preceding text context and the encoded audio. At every step it picks a token from a probability distribution over the vocabulary, computed from the model's logits. When the training data contains non-spoken elements such as links or copyright notices, the model can learn to emit them as well. This becomes a significant problem when the audio is quiet or unclear: the probabilities of several candidate tokens end up close together, so the choice between them is close to random. Because these non-spoken elements often appeared in training examples where the audio was absent or indistinct, the model may associate them with such quiet segments. Techniques like Reinforcement Learning from Human Feedback (RLHF) could potentially address these problems, but it remains unclear to what extent OpenAI has applied them to Whisper, or whether it has released more advanced versions of the model.
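To make the "near-random choices" point concrete, here is a minimal Python sketch. It is not Whisper's actual decoder; it only turns made-up logits into probabilities with softmax and measures how flat the result is. The logit values and the `clear_audio` / `quiet_audio` names are hypothetical, chosen to contrast one dominant token with several nearly tied ones.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats: higher means a flatter distribution and a less certain choice."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits over a toy 4-token vocabulary (values invented for illustration).
clear_audio = [8.0, 2.0, 1.5, 1.0]    # one token clearly dominates
quiet_audio = [2.1, 2.0, 1.9, 2.05]   # several tokens are nearly tied

if __name__ == "__main__":
    for name, logits in [("clear", clear_audio), ("quiet", quiet_audio)]:
        probs = sorted(softmax(logits), reverse=True)
        margin = probs[0] - probs[1]
        print(f"{name}: top={probs[0]:.3f} margin={margin:.3f} entropy={entropy(probs):.3f}")
```

With the quiet logits the top token gets well under a third of the probability mass, so sampling at a non-zero temperature, or a slight change in the input audio, can easily tip the decoder toward a different token.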
-
In other words, OpenAI used a lot of garbage to train the model.
-
I'm currently using large-v3 and experiencing severe hallucinations.
-
I transcribed a radio show that I recorded from a radio station's stream in 2015. The transcript ended with:
[00:54:31.000 --> 00:54:36.000] www.mooji.org
[00:54:37.000 --> 00:54:41.000] Copyright © 2020 Mooji Media Ltd. All Rights Reserved.
[00:54:41.000 --> 00:54:44.000] No part of this recording may be reproduced
[00:54:44.000 --> 00:54:47.000] without Mooji Media Ltd.'s express consent.
[00:54:48.000 --> 00:54:51.000] www.mooji.org
[00:54:51.000 --> 00:54:54.000] Copyright © 2020 Mooji Media Ltd. All Rights Reserved.
[00:54:54.000 --> 00:54:57.000] No part of this recording may be reproduced
[00:54:57.000 --> 00:55:00.000] without Mooji Media Ltd.'s express consent.
None of this is actually in the recording. Could Whisper have "heard" a copyright symbol, or does it sometimes append this text when it comes across the word? I don't mind; I just think it's odd. (I downloaded Whisper on November 15.)
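One way to catch output like this after the fact is a post-processing pass over the timestamped segments. The sketch below is a hypothetical filter, not part of Whisper: it flags segments that repeat earlier text or match a few boilerplate patterns (URLs, copyright notices). It also computes a zlib compression ratio, the same kind of measure openai-whisper uses during decoding (through its `compression_ratio_threshold` parameter) to decide when to retry a segment at a higher temperature. The regex patterns and function names are assumptions and would need tuning on real transcripts.

```python
import re
import zlib

# Matches lines like "[00:54:31.000 --> 00:54:36.000] www.mooji.org" (hour field optional).
SEGMENT_RE = re.compile(
    r"\[((?:\d{2}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2}:)?\d{2}:\d{2}\.\d{3})\]\s*(.*)"
)

# Hypothetical patterns for text that is rarely spoken aloud.
SUSPICIOUS_RE = re.compile(r"(www\.|https?://|©|copyright|all rights reserved)", re.IGNORECASE)

SAMPLE = """[00:54:31.000 --> 00:54:36.000] www.mooji.org
[00:54:37.000 --> 00:54:41.000] Copyright © 2020 Mooji Media Ltd. All Rights Reserved.
[00:54:41.000 --> 00:54:44.000] No part of this recording may be reproduced
[00:54:44.000 --> 00:54:47.000] without Mooji Media Ltd.'s express consent.
[00:54:48.000 --> 00:54:51.000] www.mooji.org
[00:54:51.000 --> 00:54:54.000] Copyright © 2020 Mooji Media Ltd. All Rights Reserved.
[00:54:54.000 --> 00:54:57.000] No part of this recording may be reproduced
[00:54:57.000 --> 00:55:00.000] without Mooji Media Ltd.'s express consent."""


def parse_segments(transcript):
    """Return (start, end, text) tuples from timestamped transcript lines."""
    segments = []
    for line in transcript.splitlines():
        m = SEGMENT_RE.match(line.strip())
        if m:
            segments.append((m.group(1), m.group(2), m.group(3).strip()))
    return segments


def flag_suspicious(segments):
    """Return (start, text, reason) for segments that repeat earlier text or look like boilerplate."""
    seen = set()
    flagged = []
    for start, _end, text in segments:
        key = text.lower()
        if key in seen:
            flagged.append((start, text, "repeated"))
        elif SUSPICIOUS_RE.search(text):
            flagged.append((start, text, "boilerplate pattern"))
        seen.add(key)
    return flagged


def compression_ratio(text):
    """Raw size divided by zlib-compressed size; repetitive text compresses well and scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    for start, text, reason in flag_suspicious(parse_segments(SAMPLE)):
        print(f"{start}  {reason:<20} {text}")
    print("compression ratio:", round(compression_ratio(SAMPLE), 2))
```

On the transcript above, this flags the URL and copyright lines on first sight and every line of the second, repeated block.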