-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix:handle empty string transcriptions #150
Conversation
WalkthroughThe pull request modifies the Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- ovos_dinkum_listener/service.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ovos_dinkum_listener/service.py (2)
662-672
: Improved handling of empty transcriptionsThe changes to the
_stt_text
method enhance the handling of transcriptions:
- Empty strings are now filtered out from the transcripts.
- The method now checks if there are any valid utterances before emitting the message.
- If no valid utterances are found, it checks the listening mode before emitting an "unknown" message.
These changes improve the robustness of the speech recognition process by avoiding the processing of empty or invalid transcriptions.
662-672
: Consider impact on other componentsWhile the changes to the
_stt_text
method are improvements, it's important to ensure that other parts of the system that may depend on this method's behavior are aware of these changes. Specifically:
- Components expecting empty string utterances might need to be updated.
- The new behavior of not emitting a message for empty utterances in continuous listening mode could affect downstream processes.
To ensure these changes don't introduce unexpected behavior, please run the following verification:
This will help identify any areas of the codebase that might be affected by these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
ovos_dinkum_listener/service.py (2)
662-673
: Improved handling of STT resultsThe changes to the
_stt_text
method enhance the handling of speech-to-text results:
- Empty utterances are now filtered out using a list comprehension.
- The method now checks if there are any valid utterances before emitting the "recognizer_loop:utterance" message.
- If there are no valid utterances, it handles the case differently based on the listening mode.
These changes improve the robustness of the voice input processing.
However, there's a minor optimization that could be made:
- utts = [u[0] for u in transcripts if u[0].strip()] + utts = [u[0].strip() for u in transcripts if u[0].strip()]This change ensures that leading and trailing whitespace is removed from all utterances, not just used for filtering.
Line range hint
1-673
: General suggestions for code improvementWhile not directly related to the changes in this PR, here are some suggestions to improve the overall code quality and maintainability:
Consider breaking down the
OVOSDinkumVoiceService
class into smaller, more focused classes. This class seems to have many responsibilities and is quite large.The
_pet_the_dog
method name is not very descriptive. Consider renaming it to something like_update_watchdog_timer
.Some methods, like
_save_ww
,_save_stt
, and_save_recording
, share similar logic. Consider creating a generic_save_audio
method to reduce code duplication.The error handling in methods like
_hotword_audio
and_stt_audio
could be more specific. Instead of catching all exceptions, catch and handle specific exceptions that you expect might occur.Consider using type hints more consistently throughout the code to improve readability and catch potential type-related errors early.
The
reload_configuration
method is quite long and complex. Consider breaking it down into smaller, more focused methods.These suggestions are not critical for the current changes but could be considered for future refactoring efforts to improve the overall code quality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- ovos_dinkum_listener/service.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
ovos_dinkum_listener/service.py (1)
Line range hint
1-673
: Summary of changes and suggestionsThe changes made to the
_stt_text
method in this PR improve the handling of speech-to-text results by filtering out empty utterances and handling different scenarios based on the listening mode. These changes enhance the robustness of the voice input processing.While not directly related to this PR, some general suggestions for future code improvements have been provided, including breaking down large classes, improving method names, reducing code duplication, and enhancing error handling.
Overall, the changes in this PR are focused and beneficial, improving the functionality of the OVOS Dinkum Listener service.
# "Thank you" # this one can also be valid!! | ||
] | ||
hallucinations = self.config.get("hallucination_list", default_hallucinations) \ | ||
if self.config.get("filter_hallucinations", True) else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@j1nx @goldyfruit @builderjer what do you think about this? is it a good thing to enable by default or should i make it False unless changed by user?
the list above was made from just saying wake word and not asking anything afterwards, i sometimes also get a "please subscribe" but far less common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes IMO this should be enabled by default.
Maybe adding "Did you say something?" or "Not sure I heard you" could be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the filtering of hallucinations
a bus message is emitted that a skill could listen for an speak those notifications if desired "recognizer_loop:speech.recognition.unknown"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh! Yeah works for me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
ovos_dinkum_listener/service.py (3)
662-680
: LGTM! Good addition for handling common hallucinations.The
__normtranscripts
method is a valuable addition that addresses the issue of empty string transcriptions and common hallucinations. It effectively filters out misrecognized phrases, improving the overall accuracy of the speech recognition process.A minor suggestion for improvement:
Consider using a set instead of a list for the
hallucinations
variable to improve lookup performance, especially if the list of hallucinations grows larger:hallucinations = set(self.config.get("hallucination_list", default_hallucinations)) \ if self.config.get("filter_hallucinations", True) else set()This change would make the filtering process more efficient, particularly for larger lists of hallucinations.
681-693
: Approved: Improved handling of empty transcriptions.The changes in the
_stt_text
method effectively address the PR objectives by improving the handling of empty utterances. The method now uses the__normtranscripts
function to filter out hallucinations and handles empty transcriptions differently based on the listening mode.For improved clarity, consider adding a comment explaining the difference in handling empty transcriptions in continuous listening mode:
else: LOG.debug("Ignoring empty transcription in continuous listening mode") # In continuous mode, empty transcriptions are expected and don't indicate an errorThis comment would help future developers understand why empty transcriptions are handled differently in continuous mode.
Line range hint
694-724
: Approved: Improved filename generation for saved STT audio.The changes in the
_save_stt
method significantly improve the reliability and flexibility of saving STT audio files. The use of a template-based filename generator allows for easy customization, and the error handling for missing transcriptions prevents potential crashes.For consistency with the rest of the codebase, consider using f-strings instead of the older
.format()
method:return f"file://{wav_path.absolute()}"This change would make the code more consistent with modern Python practices and improve readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- ovos_dinkum_listener/service.py (1 hunks)
- ovos_dinkum_listener/voice_loop/voice_loop.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ovos_dinkum_listener/voice_loop/voice_loop.py (1)
784-787
: Improved logging for transcription results.The addition of logging for raw transcription results is a valuable enhancement. It will help in identifying and debugging issues related to empty string transcriptions, which aligns well with the PR objectives.
ovos_dinkum_listener/service.py (1)
Line range hint
662-724
: Overall: Excellent improvements addressing the PR objectives.The changes in this file effectively address the issue of handling empty string transcriptions and improve the overall robustness of the speech recognition process. The new
__normtranscripts
method, along with the modifications to_stt_text
and_save_stt
, work together to enhance the system's ability to handle various edge cases in speech recognition.Key improvements:
- Filtering of common hallucinations and empty transcriptions.
- Differentiated handling of empty transcriptions based on listening mode.
- More reliable and flexible STT audio file saving process.
These changes align well with the PR objectives and should significantly improve the user experience by preventing unnecessary fallback actions when no input is detected.
closes #147
Summary by CodeRabbit
New Features
Bug Fixes