
fix:handle empty string transcriptions #150

Merged
merged 3 commits into from
Oct 21, 2024


JarbasAl
Member

@JarbasAl JarbasAl commented Oct 21, 2024

closes #147

Summary by CodeRabbit

  • New Features

    • Enhanced speech-to-text processing with configurable filtering of common misrecognized phrases.
    • Improved logging to provide clearer insights into raw transcription results.
  • Bug Fixes

    • Refined handling of transcription processes to ensure accurate filename generation.
    • Streamlined logging output for improved debugging and monitoring of the transcription process.

Contributor

coderabbitai bot commented Oct 21, 2024

Walkthrough

The pull request modifies the OVOSDinkumVoiceService class in ovos_dinkum_listener/service.py. Key updates include the refactoring of the _stt_text method, which now calls a new __normtranscripts method to filter out common misrecognized phrases from transcripts. The handling of empty utterances has been refined, emitting a "speech.recognition.unknown" message when appropriate. Additionally, the _save_stt method has been updated for better audio filename generation. The DinkumVoiceLoop class in voice_loop.py has enhanced logging for transcription results.

Changes

File | Change summary
ovos_dinkum_listener/service.py | Refactored _stt_text to call __normtranscripts for filtering hallucinations; improved handling of empty utterances and updated _save_stt.
ovos_dinkum_listener/voice_loop/voice_loop.py | Enhanced logging in DinkumVoiceLoop for raw transcription results; updated _after_cmd logging.

Assessment against linked issues

Objective Addressed Explanation
Handle empty utterances as "nevermind"
Improve handling of speech-to-text results

🐰 In the meadow where voices play,
A change was made to brighten the day.
Empty words now take a bow,
"Nevermind" they say, and we know how!
With clearer speech, we hop along,
In the world of sound, we sing our song! 🎶



@github-actions github-actions bot added fix and removed fix labels Oct 21, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3534215 and 8000c79.

📒 Files selected for processing (1)
  • ovos_dinkum_listener/service.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ovos_dinkum_listener/service.py (2)

662-672: Improved handling of empty transcriptions

The changes to the _stt_text method enhance the handling of transcriptions:

  1. Empty strings are now filtered out from the transcripts.
  2. The method now checks if there are any valid utterances before emitting the message.
  3. If no valid utterances are found, it checks the listening mode before emitting an "unknown" message.

These changes improve the robustness of the speech recognition process by avoiding the processing of empty or invalid transcriptions.
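The three steps above can be sketched as a standalone function. This is a minimal sketch, not the actual `_stt_text` code: the `handle_stt_result` name, the `emit` callback, the `continuous` flag and the payload shape are assumptions; only the two message types come from this PR discussion.

```python
from typing import Callable, List, Tuple

# Each transcript is assumed to be a (text, confidence) pair, matching u[0] in the review snippets.
Transcript = Tuple[str, float]


def handle_stt_result(transcripts: List[Transcript],
                      continuous: bool,
                      emit: Callable[[str, dict], None]) -> None:
    """Hypothetical sketch of the _stt_text flow described in this review."""
    # 1. Drop empty or whitespace-only transcriptions.
    utts = [text for text, _conf in transcripts if text.strip()]
    if utts:
        # 2. Only forward utterances when something was actually recognized.
        emit("recognizer_loop:utterance", {"utterances": utts})
    elif not continuous:
        # 3. Outside continuous listening, report that nothing was understood.
        emit("recognizer_loop:speech.recognition.unknown", {})
    # In continuous mode an empty result is expected, so nothing is emitted.
```

Passing `emit` in as a callback keeps the sketch testable without a messagebus; in the service itself the equivalent call would go through the bus connection.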


662-672: Consider impact on other components

While the changes to the _stt_text method are improvements, it's important to ensure that other parts of the system that may depend on this method's behavior are aware of these changes. Specifically:

  1. Components expecting empty string utterances might need to be updated.
  2. The new behavior of not emitting a message for empty utterances in continuous listening mode could affect downstream processes.

To ensure these changes don't introduce unexpected behavior, please run the following verification:

This will help identify any areas of the codebase that might be affected by these changes.

ovos_dinkum_listener/service.py (outdated, resolved)
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
ovos_dinkum_listener/service.py (2)

662-673: Improved handling of STT results

The changes to the _stt_text method enhance the handling of speech-to-text results:

  1. Empty utterances are now filtered out using a list comprehension.
  2. The method now checks if there are any valid utterances before emitting the "recognizer_loop:utterance" message.
  3. If there are no valid utterances, it handles the case differently based on the listening mode.

These changes improve the robustness of the voice input processing.

However, there's a minor improvement that could be made:

-        utts = [u[0] for u in transcripts if u[0].strip()]
+        utts = [u[0].strip() for u in transcripts if u[0].strip()]

This change ensures that leading and trailing whitespace is removed from all utterances, not just used for filtering.
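A quick illustration of the difference, using made-up transcripts in the assumed `(text, confidence)` shape. The third variant is an extra option, not part of the review suggestion: it uses an assignment expression so `strip()` runs only once per item.

```python
from typing import List, Tuple

# Made-up STT output in the assumed (text, confidence) shape.
transcripts: List[Tuple[str, float]] = [
    ("  turn on the lights ", 0.92),
    ("   ", 0.40),
    ("", 0.0),
]

# Current code: strip() is only used as a filter, so padding survives.
filtered_only = [u[0] for u in transcripts if u[0].strip()]

# Suggested change: the surviving text is stripped too.
filtered_and_stripped = [u[0].strip() for u in transcripts if u[0].strip()]

# Equivalent single-pass variant (Python 3.8+) that calls strip() once per item.
single_pass = [s for text, _conf in transcripts if (s := text.strip())]

print(filtered_only)          # ['  turn on the lights ']
print(filtered_and_stripped)  # ['turn on the lights']
print(single_pass)            # ['turn on the lights']
```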


Line range hint 1-673: General suggestions for code improvement

While not directly related to the changes in this PR, here are some suggestions to improve the overall code quality and maintainability:

  1. Consider breaking down the OVOSDinkumVoiceService class into smaller, more focused classes. This class seems to have many responsibilities and is quite large.

  2. The _pet_the_dog method name is not very descriptive. Consider renaming it to something like _update_watchdog_timer.

  3. Some methods, like _save_ww, _save_stt, and _save_recording, share similar logic. Consider creating a generic _save_audio method to reduce code duplication.

  4. The error handling in methods like _hotword_audio and _stt_audio could be more specific. Instead of catching all exceptions, catch and handle specific exceptions that you expect might occur.

  5. Consider using type hints more consistently throughout the code to improve readability and catch potential type-related errors early.

  6. The reload_configuration method is quite long and complex. Consider breaking it down into smaller, more focused methods.

These suggestions are not critical for the current changes but could be considered for future refactoring efforts to improve the overall code quality.
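Suggestion 3 could look roughly like the helper below. This is a hypothetical sketch using the standard-library `wave` module: the `save_audio` name, its parameters and the 16 kHz / 16-bit / mono defaults are assumptions, and the real `_save_ww`, `_save_stt` and `_save_recording` methods may each need extra metadata on top of it.

```python
import wave
from pathlib import Path


def save_audio(audio: bytes, directory: Path, filename: str,
               sample_rate: int = 16000, sample_width: int = 2,
               channels: int = 1) -> Path:
    """Hypothetical shared helper that the _save_* methods could delegate to."""
    # Create the target folder on demand so callers don't repeat this check.
    directory.mkdir(parents=True, exist_ok=True)
    wav_path = directory / f"{filename}.wav"
    # Write raw PCM frames with a proper WAV header.
    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio)
    return wav_path
```

Each `_save_*` method would then only decide the directory and filename, which is where they actually differ.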

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8000c79 and 45e37d1.

📒 Files selected for processing (1)
  • ovos_dinkum_listener/service.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
ovos_dinkum_listener/service.py (1)

Line range hint 1-673: Summary of changes and suggestions

The changes made to the _stt_text method in this PR improve the handling of speech-to-text results by filtering out empty utterances and handling different scenarios based on the listening mode. These changes enhance the robustness of the voice input processing.

While not directly related to this PR, some general suggestions for future code improvements have been provided, including breaking down large classes, improving method names, reducing code duplication, and enhancing error handling.

Overall, the changes in this PR are focused and beneficial, improving the functionality of the OVOS Dinkum Listener service.

    # "Thank you"  # this one can also be valid!!
]
hallucinations = self.config.get("hallucination_list", default_hallucinations) \
    if self.config.get("filter_hallucinations", True) else []
Member Author


@j1nx @goldyfruit @builderjer what do you think about this? Is it a good idea to enable it by default, or should I make it False unless the user changes it?

The list above was compiled by saying just the wake word and not asking anything afterwards. I sometimes also get "please subscribe", but far less often.

Member

@goldyfruit goldyfruit Oct 21, 2024


Yes, IMO this should be enabled by default.
Maybe adding "Did you say something?" or "Not sure I heard you" could be nice.

Copy link
Member Author


I mean the filtering of hallucinations.

A bus message, "recognizer_loop:speech.recognition.unknown", is emitted; a skill could listen for it and speak those notifications if desired.
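A minimal sketch of that idea. It uses a toy in-process bus (`ToyBus`, hypothetical) instead of the real OVOS messagebus client, and appends to a list instead of calling text-to-speech; a real skill would register its handler through its own bus API. The prompt text is goldyfruit's suggestion from this thread.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyBus:
    """Hypothetical in-process stand-in for the OVOS messagebus, for illustration only."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, msg_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[msg_type].append(handler)

    def emit(self, msg_type: str, data: dict) -> None:
        for handler in self._handlers[msg_type]:
            handler(data)


spoken: List[str] = []


def on_unknown(_data: dict) -> None:
    # A skill could speak a short prompt here instead of triggering fallback handling.
    spoken.append("Did you say something?")


bus = ToyBus()
bus.on("recognizer_loop:speech.recognition.unknown", on_unknown)
bus.emit("recognizer_loop:speech.recognition.unknown", {})
print(spoken)  # ['Did you say something?']
```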

Copy link
Member


Ohh! Yeah, works for me too.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
ovos_dinkum_listener/service.py (3)

662-680: LGTM! Good addition for handling common hallucinations.

The __normtranscripts method is a valuable addition that addresses the issue of empty string transcriptions and common hallucinations. It effectively filters out misrecognized phrases, improving the overall accuracy of the speech recognition process.

A minor suggestion for improvement:

Consider using a set instead of a list for the hallucinations variable to improve lookup performance, especially if the list of hallucinations grows larger:

hallucinations = set(self.config.get("hallucination_list", default_hallucinations)) \
    if self.config.get("filter_hallucinations", True) else set()

This change would make the filtering process more efficient, particularly for larger lists of hallucinations.
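Putting the config lookup from the PR together with the set suggestion gives something like the sketch below. `resolve_hallucinations`, `norm_transcripts` and the placeholder default list are hypothetical (the real default list lives in the PR and is not reproduced here); the config keys `filter_hallucinations` and `hallucination_list` come from the PR snippet. The sketch matches exactly after stripping whitespace, which may differ from how the real `__normtranscripts` compares text.

```python
from typing import Dict, List, Set, Tuple

# Illustrative placeholder only; not the PR's actual default list.
DEFAULT_HALLUCINATIONS = ["please subscribe"]


def resolve_hallucinations(config: Dict) -> Set[str]:
    """Same lookup as the PR snippet, but returning a set for O(1) membership tests."""
    if not config.get("filter_hallucinations", True):
        return set()
    return set(config.get("hallucination_list", DEFAULT_HALLUCINATIONS))


def norm_transcripts(transcripts: List[Tuple[str, float]],
                     hallucinations: Set[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for __normtranscripts: drop empty and known-bogus texts."""
    return [(text, conf) for text, conf in transcripts
            if text.strip() and text.strip() not in hallucinations]
```

Filtering is enabled when the key is absent, matching the default agreed on in the review thread.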


681-693: Approved: Improved handling of empty transcriptions.

The changes in the _stt_text method effectively address the PR objectives by improving the handling of empty utterances. The method now uses the __normtranscripts function to filter out hallucinations and handles empty transcriptions differently based on the listening mode.

For improved clarity, consider adding a comment explaining the difference in handling empty transcriptions in continuous listening mode:

else:
    LOG.debug("Ignoring empty transcription in continuous listening mode")
    # In continuous mode, empty transcriptions are expected and don't indicate an error

This comment would help future developers understand why empty transcriptions are handled differently in continuous mode.


Line range hint 694-724: Approved: Improved filename generation for saved STT audio.

The changes in the _save_stt method significantly improve the reliability and flexibility of saving STT audio files. The use of a template-based filename generator allows for easy customization, and the error handling for missing transcriptions prevents potential crashes.

For consistency with the rest of the codebase, consider using f-strings instead of the older .format() method:

return f"file://{wav_path.absolute()}"

This change would make the code more consistent with modern Python practices and improve readability.
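A rough sketch of a template-based filename builder with a fallback for missing transcriptions, ending with the f-string form suggested above. The `stt_wav_path` name, the `{utterance}`/`{timestamp}` placeholders, the default template and the sanitizing rules are all assumptions, not the PR's actual template.

```python
import time
from pathlib import Path
from typing import Optional


def stt_wav_path(directory: Path, transcription: Optional[str],
                 template: str = "{utterance}_{timestamp}") -> Path:
    """Hypothetical template-based filename builder for saved STT audio."""
    # Fall back to a fixed label when STT produced nothing, instead of crashing.
    utterance = (transcription or "").strip() or "unknown"
    # Keep the name filesystem-friendly: lowercase, spaces to underscores, drop other symbols.
    safe = "".join(c for c in utterance.lower().replace(" ", "_")
                   if c.isalnum() or c == "_")
    name = template.format(utterance=safe or "unknown", timestamp=int(time.time()))
    return directory / f"{name}.wav"


def as_file_uri(wav_path: Path) -> str:
    # The f-string form suggested in the review.
    return f"file://{wav_path.absolute()}"
```

pathlib's `Path.as_uri()` is a standard-library alternative that also percent-encodes unusual characters; note that it raises `ValueError` for relative paths.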

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 45e37d1 and 51bf8ef.

📒 Files selected for processing (2)
  • ovos_dinkum_listener/service.py (1 hunks)
  • ovos_dinkum_listener/voice_loop/voice_loop.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ovos_dinkum_listener/voice_loop/voice_loop.py (1)

784-787: Improved logging for transcription results.

The addition of logging for raw transcription results is a valuable enhancement. It will help in identifying and debugging issues related to empty string transcriptions, which aligns well with the PR objectives.
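A small sketch of that kind of logging. The project logs through its own `LOG` object; here the standard `logging` module stands in, and the `log_raw_transcripts` name, logger name and messages are hypothetical. Using `%r` is the useful detail: it makes an empty string (`''`) visibly different from whitespace (`'   '`) in the log.

```python
import logging
from typing import List, Tuple

LOG = logging.getLogger("dinkum.voice_loop")  # hypothetical logger name


def log_raw_transcripts(transcripts: List[Tuple[str, float]]) -> None:
    """Log raw STT output before any filtering, so empty results are easy to spot."""
    # %r shows quotes, so '' and '   ' are distinguishable in the log line.
    LOG.debug("Raw transcription: %r", transcripts)
    if not any(text.strip() for text, _conf in transcripts):
        LOG.info("STT returned no usable text")
```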

ovos_dinkum_listener/service.py (1)

Line range hint 662-724: Overall: Excellent improvements addressing the PR objectives.

The changes in this file effectively address the issue of handling empty string transcriptions and improve the overall robustness of the speech recognition process. The new __normtranscripts method, along with the modifications to _stt_text and _save_stt, work together to enhance the system's ability to handle various edge cases in speech recognition.

Key improvements:

  1. Filtering of common hallucinations and empty transcriptions.
  2. Differentiated handling of empty transcriptions based on listening mode.
  3. More reliable and flexible STT audio file saving process.

These changes align well with the PR objectives and should significantly improve the user experience by preventing unnecessary fallback actions when no input is detected.

ovos_dinkum_listener/voice_loop/voice_loop.py (resolved)
@JarbasAl JarbasAl merged commit 1983698 into dev Oct 21, 2024
10 checks passed
@JarbasAl JarbasAl deleted the fix/empty_utts branch October 21, 2024 19:43
@coderabbitai coderabbitai bot mentioned this pull request Oct 24, 2024
Development

Successfully merging this pull request may close these issues.

Empty utterance error
2 participants