Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix:standardize_lang #568

Merged
merged 13 commits into from
Oct 16, 2024
Merged

fix:standardize_lang #568

merged 13 commits into from
Oct 16, 2024

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Oct 16, 2024

makes intents match regardless of upper or lowercase being used in lang code, and brings initial support for dynamic dialects in the pipeline plugins

companion PRs:

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Enhanced language handling across various services for improved consistency.
  • Bug Fixes

    • Corrected language code formatting in multiple test cases to ensure uniformity.
  • Chores

    • Updated dependency versions for improved stability and performance.

@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 96.96970% with 2 lines in your changes missing coverage. Please review.

Project coverage is 75.41%. Comparing base (23f0bab) to head (0f1b010).
Report is 39 commits behind head on dev.

Files with missing lines Patch % Lines
ovos_core/intent_services/converse_service.py 96.29% 1 Missing ⚠️
ovos_core/intent_services/stop_service.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #568      +/-   ##
==========================================
+ Coverage   75.33%   75.41%   +0.07%     
==========================================
  Files          15       15              
  Lines        3094     1570    -1524     
==========================================
- Hits         2331     1184    -1147     
+ Misses        763      386     -377     
Flag Coverage Δ
end2end 55.73% <96.96%> (?)
unittests 52.92% <87.87%> (-22.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
Copy link

coderabbitai bot commented Oct 16, 2024

Walkthrough

The pull request introduces modifications to the IntentService, ConverseService, StopService, and FallbackService classes, focusing on improvements in language handling and the deprecation of direct property access. Key changes include the implementation of standardize_lang_tag for consistent language tag formatting and updates to dependency versions in the requirements files. Additionally, several test files have been adjusted to ensure consistent language code formatting in tests, while maintaining the overall structure and logic of the tests.

Changes

File(s) Change Summary
ovos_core/intent_services/__init__.py, ovos_core/intent_services/converse_service.py, ovos_core/intent_services/stop_service.py, ovos_core/intent_services/fallback_service.py Updated language handling with standardize_lang_tag, deprecated direct property access, and modified internal logic.
test/unittests/test_intent_service.py, test/end2end/session/test_blacklist.py, test/end2end/session/test_complete_failure.py, test/end2end/session/test_converse.py, test/end2end/session/test_fallback.py, test/end2end/session/test_get_response.py, test/end2end/session/test_sched.py, test/end2end/session/test_session.py, test/end2end/session/test_stop.py, test/end2end/session/test_transformers.py Adjusted tests to ensure consistent language code formatting and adherence to standardized conventions.
requirements/lgpl.txt, requirements/requirements.txt, requirements/plugins.txt Updated version constraints for multiple dependencies to newer versions while maintaining upper bounds.

Possibly related PRs

  • feat/pipeline_plugins_opm #527: The changes in ovos_core/intent_services/__init__.py and ovos_core/intent_services/converse_service.py are related to the modifications in the IntentService and ConverseService classes, which both involve updates to service initialization and import paths.
  • improve_typing #528: The updates in ovos_core/intent_services/__init__.py, adapt_service.py, commonqa_service.py, converse_service.py, and fallback_service.py reflect similar changes in handling the IntentMatch class, which is also addressed in the main PR.
  • plugin extras #541: The introduction of the requirements/plugins.txt file and the changes in setup.py to manage plugin dependencies are relevant as they align with the main PR's focus on enhancing the plugin-based architecture.
  • deprecate ready setting #554: The modifications in ovos_core/skill_manager.py and related files indicate a broader restructuring of service management, which complements the changes in service handling in the main PR.
  • fix:log_spam #566: The updates in ovos_core/intent_services/__init__.py regarding deprecation logging for properties are directly related to the changes in the main PR that also focus on deprecating direct property access in the IntentService.

Suggested labels

feature

Suggested reviewers

  • goldyfruit
  • j1nx
  • builderjer

Poem

In the meadow where bunnies play,
Language tags dance and sway.
With each hop, a new tune,
Standardized by the light of the moon.
Services bloom, old ways depart,
A fresh start for every heart! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3c89055 and 7ee49c3.

📒 Files selected for processing (4)
  • ovos_core/intent_services/fallback_service.py (2 hunks)
  • ovos_core/intent_services/stop_service.py (7 hunks)
  • requirements/plugins.txt (1 hunks)
  • requirements/requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ovos_core/intent_services/stop_service.py
  • requirements/requirements.txt
🧰 Additional context used
🔇 Additional comments (4)
requirements/plugins.txt (1)

2-2: LGTM! Minor version bump for ovos-utterance-plugin-cancel.

The update from >=0.2.0 to >=0.2.2 for ovos-utterance-plugin-cancel is a minor version bump, which typically indicates backward-compatible new features or bug fixes. This change aligns with semantic versioning principles and is consistent with the PR objectives of standardizing language handling and supporting dynamic dialects.

To ensure compatibility, please verify that this update has been tested with the current codebase. Consider running the following command to check for any potential issues:

ovos_core/intent_services/fallback_service.py (3)

25-25: LGTM: New import statement added correctly.

The import of standardize_lang_tag from ovos_utils.lang is correctly placed and necessary for the changes in the _fallback_range method.


174-177: Excellent improvement in language handling and type annotation.

The changes in the _fallback_range method are well-implemented:

  1. The addition of the return type annotation -> Optional[IntentMatch] improves code readability and enables better type checking.
  2. The use of standardize_lang_tag(lang) ensures consistent formatting of language tags, which is crucial for proper language handling across the system.

These changes align well with the PR objective of standardizing language inputs and improve the overall robustness of the fallback system.


Line range hint 1-218: Summary: Effective implementation of language standardization.

The changes made to FallbackService class are minimal yet impactful:

  1. The addition of standardize_lang_tag import and its usage in the _fallback_range method ensures consistent language tag formatting throughout the fallback process.
  2. The return type annotation on _fallback_range method improves code clarity and maintainability.

These modifications align well with the PR objectives, enhancing the robustness of language handling in the fallback system without disrupting the existing functionality. The changes are well-integrated into the existing codebase and maintain the overall architecture of the FallbackService class.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (10)
test/end2end/routing/test_sched.py (1)

Line range hint 14-101: Consider refactoring the test method for improved maintainability.

While not directly related to the current change, the test_no_session method is quite long and complex. Consider the following improvements for future refactoring:

  1. Split the method into smaller, more focused test cases.
  2. Replace the custom wait mechanism with a more robust solution, possibly using a library designed for asynchronous testing.
  3. Consider extracting the list of expected messages to a separate constant or configuration file for easier maintenance.

These suggestions are meant for future consideration and are not blocking issues for the current PR.

test/end2end/session/test_sched.py (1)

Inconsistent Language Code Usage Detected

The verification script identified multiple instances of the lowercase language code "en-us" across the codebase. These inconsistencies need to be addressed to ensure uniformity in language handling. The affected files include:

  • test/integrationtests/test_workshop.py
  • test/unittests/xformers.py
  • test/unittests/test_intent_service.py
  • test/integrationtests/test_selene_api.py
  • test/integrationtests/common_query/test_continuous_dialog.py
  • test/end2end/session/test_stop.py
  • test/end2end/session/test_session.py
  • test/end2end/session/test_transformers.py
  • test/end2end/session/test_complete_failure.py
  • test/end2end/session/test_converse.py
  • test/backwards_compat/test_ocp.py
  • test/end2end/routing/test_sched.py
  • test/end2end/session/test_blacklist.py
  • test/end2end/session/test_get_response.py
  • mycroft/deprecated/enclosure/mark1/__init__.py
  • mycroft/dialog/dialog.py
  • ovos_core/intent_services/stop_service.py

Please update all instances of "en-us" to the standardized "en-US" to maintain consistency across the project.

🔗 Analysis chain

Line range hint 1-240: Summary: Successful standardization of language codes

The changes in this file consistently update the language code from "en-us" to "en-US" across both test methods (test_no_session and test_explicit_session). This standardization affects:

  1. Default session language settings
  2. Assertions for message contexts
  3. Assertions for 'speak' message data

These modifications align with the PR objective of ensuring that intents match regardless of the case used in language codes. The changes maintain the overall structure and logic of the tests while enhancing the verification of standardized language codes.

To ensure that these changes are consistent across the entire codebase, please run the following script:

This script will help identify any inconsistencies in language code usage across the project.


To address the issue with the previous script, please run the updated verification script below:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of standardized language codes across the codebase

# Search for any remaining instances of lowercase 'us' in language codes
echo "Searching for lowercase 'us' in language codes:"
rg -i '\ben-us\b' --type python

# Search for the new standardized format to confirm its usage
echo "Confirming usage of standardized 'en-US' format:"
rg '\ben-US\b' --type python

# Check for any mixed case variations that might have been missed
echo "Checking for any mixed case variations:"
rg -i '\ben-[uU][sS]\b' --type python | grep -v 'en-US'

Length of output: 526


Script:

#!/bin/bash
# Description: Verify the usage of standardized language codes across the codebase

# Search for any remaining instances of lowercase 'us' in language codes
echo "Searching for lowercase 'us' in language codes:"
rg -i '\ben-us\b' --glob '*.py'

# Search for the new standardized format to confirm its usage
echo "Confirming usage of standardized 'en-US' format:"
rg '\ben-US\b' --glob '*.py'

# Check for any mixed case variations that might have been missed
echo "Checking for any mixed case variations:"
rg -i '\ben-[uU][sS]\b' --glob '*.py' | grep -v 'en-US'

Length of output: 14923

ovos_core/intent_services/__init__.py (8)

213-215: LGTM: Standardized language tag usage.

The use of standardize_lang_tag(get_message_lang()) ensures consistent language tag formatting. Consider extracting this to a separate line for improved readability:

-        lang = standardize_lang_tag(get_message_lang())
-        return [parser.__dict__
-                for parser in self._adapt_service.engines[lang].intent_parsers]
+        lang = standardize_lang_tag(get_message_lang())
+        intent_parsers = self._adapt_service.engines[lang].intent_parsers
+        return [parser.__dict__ for parser in intent_parsers]

This change would make the code slightly more readable and easier to debug if needed.


Line range hint 267-274: LGTM: Improved language handling in transformers.

The use of standardize_lang_tag and explicit setting of the language in the message context improves consistency. Consider adding error handling for the transformation process:

         utterances, message.context = self.utterance_plugins.transform(utterances, message.context)
         if original != utterances:
             message.data["utterances"] = utterances
             LOG.debug(f"utterances transformed: {original} -> {utterances}")
-        message.context = self.metadata_plugins.transform(message.context)
+        try:
+            message.context = self.metadata_plugins.transform(message.context)
+        except Exception as e:
+            LOG.error(f"Error during metadata transformation: {e}")
         return message

This change would make the method more robust by catching and logging any errors that might occur during the metadata transformation process.


Line range hint 285-300: LGTM: Consistent language standardization in disambiguation.

The use of standardize_lang_tag for the default language is appropriate. Consider adding more detailed logging for better debugging:

         for k in lang_keys:
             if k in message.context:
                 v = get_full_lang_code(message.context[k])
                 if v in valid_langs:
                     if v != default_lang:
-                        LOG.info(f"replaced {default_lang} with {k}: {v}")
+                        LOG.info(f"Language disambiguated: replaced {default_lang} with {k}: {v}")
                     return v
                 else:
-                    LOG.warning(f"ignoring {k}, {v} is not in enabled languages: {valid_langs}")
+                    LOG.warning(f"Ignoring {k}: {v} is not in enabled languages: {valid_langs}")

These changes provide more context in the log messages, which can be helpful for debugging language-related issues.


356-358: LGTM: Consistent language standardization in session validation.

The use of standardize_lang_tag(lang) at the beginning of the method ensures consistent language tag formatting. Consider extracting the standardized language to a named variable for improved readability:

-        lang = standardize_lang_tag(lang)
-        sess = SessionManager.get(message)
-        if sess.session_id == "default":
+        standardized_lang = standardize_lang_tag(lang)
+        sess = SessionManager.get(message)
+        if sess.session_id == "default":

This change makes it clearer that we're working with a standardized language tag throughout the method.


530-533: LGTM: Consistent language standardization in vocabulary registration.

The use of standardize_lang_tag(get_message_lang()) ensures consistent language tag formatting. Consider adding error handling for the vocabulary registration process:

         lang = standardize_lang_tag(get_message_lang())
-        self._adapt_service.register_vocabulary(entity_value, entity_type,
-                                               alias_of, regex_str, lang)
-        self.registered_vocab.append(message.data)
+        try:
+            self._adapt_service.register_vocabulary(entity_value, entity_type,
+                                                    alias_of, regex_str, lang)
+            self.registered_vocab.append(message.data)
+        except Exception as e:
+            LOG.error(f"Error registering vocabulary: {e}")

This change would make the method more robust by catching and logging any errors that might occur during the vocabulary registration process.


607-609: LGTM: Consistent language standardization in intent handling.

The use of standardize_lang_tag(get_message_lang()) ensures consistent language tag formatting. Consider adding more detailed logging for better debugging:

-        lang = standardize_lang_tag(get_message_lang())
+        lang = standardize_lang_tag(get_message_lang())
+        LOG.debug(f"Handling get_intent for utterance: '{utterance}' in language: {lang}")
         sess = SessionManager.get(message)

This change provides more context in the log messages, which can be helpful for debugging intent-related issues.


Line range hint 658-662: LGTM: Consistent language standardization in Adapt intent handling.

The use of standardize_lang_tag(get_message_lang()) ensures consistent language tag formatting. Consider adding error handling for the intent matching process:

         lang = standardize_lang_tag(get_message_lang())
-        intent = self._adapt_service.match_intent((utterance,), lang, message.serialize())
-        intent_data = intent.intent_data if intent else None
-        self.bus.emit(message.reply("intent.service.adapt.reply",
-                                    {"intent": intent_data}))
+        try:
+            intent = self._adapt_service.match_intent((utterance,), lang, message.serialize())
+            intent_data = intent.intent_data if intent else None
+            self.bus.emit(message.reply("intent.service.adapt.reply",
+                                        {"intent": intent_data}))
+        except Exception as e:
+            LOG.error(f"Error matching Adapt intent: {e}")
+            self.bus.emit(message.reply("intent.service.adapt.reply",
+                                        {"intent": None, "error": str(e)}))

This change would make the method more robust by catching and logging any errors that might occur during the Adapt intent matching process, and also inform the caller about the error.


Line range hint 89-157: LGTM: Consistent deprecation warnings for properties.

The deprecation warnings for properties and their setters are well-implemented. For consistency, consider using a constant for the deprecation version:

+DEPRECATION_VERSION = "1.0.0"

 @property
 def adapt_service(self):
     log_deprecation("direct access to self.adapt_service is deprecated, "
-                    "pipelines are in the progress of being replaced with plugins", "1.0.0")
+                    "pipelines are in the progress of being replaced with plugins", DEPRECATION_VERSION)
     return self._adapt_service

 # Apply similar changes to other properties and setters

This change would make it easier to update the deprecation version across all warnings in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f704fbf and ca7b0e0.

📒 Files selected for processing (16)
  • ovos_core/intent_services/init.py (8 hunks)
  • ovos_core/intent_services/converse_service.py (3 hunks)
  • ovos_core/intent_services/stop_service.py (6 hunks)
  • requirements/lgpl.txt (1 hunks)
  • requirements/requirements.txt (1 hunks)
  • test/end2end/routing/test_sched.py (1 hunks)
  • test/end2end/routing/test_session.py (1 hunks)
  • test/end2end/session/test_blacklist.py (3 hunks)
  • test/end2end/session/test_complete_failure.py (4 hunks)
  • test/end2end/session/test_converse.py (5 hunks)
  • test/end2end/session/test_fallback.py (3 hunks)
  • test/end2end/session/test_get_response.py (15 hunks)
  • test/end2end/session/test_sched.py (7 hunks)
  • test/end2end/session/test_session.py (5 hunks)
  • test/end2end/session/test_stop.py (2 hunks)
  • test/end2end/session/test_transformers.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (65)
requirements/lgpl.txt (1)

1-1: LGTM! Verify the new version's availability and contents.

The update to ovos_padatious>=0.1.3,<1.0.0 is consistent with the PR objectives and likely includes necessary changes for standardizing language handling. This minor version bump suggests new features or non-breaking improvements.

Please ensure that:

  1. Version 0.1.3 of ovos_padatious is available in the package repository.
  2. The changes in version 0.1.3 include the required updates for standardizing language code handling.

You can verify this by running:

requirements/requirements.txt (7)

6-6: LGTM: padacioso version update

The update to padacioso>=0.2.4,<1.0.0 is appropriate. This minor version bump likely includes improvements related to the standardization of language handling, which aligns with the PR objectives.


7-7: LGTM: ovos-adapt-parser version update

The update to ovos-adapt-parser>=0.1.3,<1.0.0 is appropriate. This patch version bump likely includes bug fixes or minor improvements related to the standardization of language handling, which aligns with the PR objectives.


8-8: LGTM: ovos_ocp_pipeline_plugin version update

The update to ovos_ocp_pipeline_plugin>=0.1.3,<1.0.0 is appropriate. This patch version bump likely includes changes related to the standardization of language handling. This update corresponds to the companion PR ovos-ocp-pipeline-plugin#12 mentioned in the PR objectives.


9-9: LGTM: ovos-common-query-pipeline-plugin version update

The update to ovos-common-query-pipeline-plugin>=0.1.4,<1.0.0 is appropriate. This minor version bump likely includes new features or improvements related to the standardization of language handling. This update corresponds to the companion PR ovos-common-query-pipeline-plugin#9 mentioned in the PR objectives.


11-11: LGTM: ovos-utils version update

The update to ovos-utils>=0.3.5,<1.0.0 is appropriate. This significant minor version bump indicates substantial new features or improvements, likely related to the standardization of language handling. This update corresponds to the companion PRs ovos-utils#281 and ovos-utils#283 mentioned in the PR objectives.


12-12: LGTM: ovos_bus_client version update

The update to ovos_bus_client>=0.1.4,<1.0.0 is appropriate. This patch version bump likely includes bug fixes or minor improvements related to the standardization of language handling. This update corresponds to the companion PR ovos-bus-client#128 mentioned in the PR objectives.


6-12: Summary: Dependency updates align with PR objectives

The updates to padacioso, ovos-adapt-parser, ovos_ocp_pipeline_plugin, ovos-common-query-pipeline-plugin, ovos-utils, and ovos_bus_client are all appropriate and align with the PR objectives. These updates contribute to the standardization of language handling across the OpenVoiceOS ecosystem while maintaining compatibility within the current major versions. The changes correspond to the companion PRs mentioned in the PR objectives, ensuring a coordinated update across multiple repositories.

test/end2end/routing/test_sched.py (1)

19-19: Approved: Language code standardization aligns with PR objectives.

The change from "en-us" to "en-US" for the default session language is consistent with the pull request's objective of standardizing language handling. This modification ensures that the test uses the correct format for language codes, which is crucial for maintaining consistency across the codebase.

test/end2end/session/test_transformers.py (3)

35-35: Approved: Language code standardization

The change from "en-us" to "en-US" standardizes the language code format, aligning with the BCP 47 language tag standard. This modification supports the PR's objective of ensuring intents match regardless of the case used in language codes.


108-108: Approved: Consistent language code standardization

The change from "en-us" to "en-US" in the test_meta method maintains consistency with the earlier change in the test_cancel method. This ensures uniform language code formatting across test methods, supporting the PR's goal of standardizing language handling.


Line range hint 1-168: Verify test execution with standardized language codes

The changes to this file are minimal and focused, updating the language code format from "en-us" to "en-US" in two test methods. These modifications align with the PR's objective of standardizing language inputs and should not affect the overall functionality of the tests.

To ensure the changes haven't inadvertently impacted test behavior:

This will help confirm that the tests still pass with the standardized language codes.

test/end2end/routing/test_session.py (1)

24-24: Approve: Standardized language code format

This change correctly updates the language code from "en-us" to "en-US", adhering to the ISO 639-1 and ISO 3166-1 alpha-2 standards. This modification aligns with the PR's objective of standardizing language inputs and ensures that the tests accurately reflect the expected format for language codes in the system.

The standardization is crucial for:

  1. Consistency across the codebase
  2. Accurate testing of language-dependent features
  3. Potential identification of case-sensitivity issues in language code handling

This change contributes to more robust and reliable tests, which is essential for maintaining the quality of the language processing capabilities in the OpenVoiceOS ecosystem.

test/end2end/session/test_sched.py (6)

19-19: LGTM: Language code standardization

The change from "en-us" to "en-US" aligns with the PR objective of standardizing language codes and follows the ISO 639-1 standard.


77-77: LGTM: Consistent language code in assertion

The assertion has been updated to use "en-US", maintaining consistency with the earlier change and ensuring that the test correctly validates the standardized language code.


97-97: LGTM: Standardized language code in 'speak' message assertion

The assertion for the 'speak' message has been updated to use "en-US", maintaining consistency with the standardized language code throughout the test case.


125-125: LGTM: Consistent language code in second 'speak' message assertion

The assertion for the second 'speak' message has been updated to use "en-US", ensuring consistency with the standardized language code throughout the test case.


134-134: LGTM: Standardized language code in test_explicit_session

The default session language code has been updated to "en-US" in the test_explicit_session method, maintaining consistency with the test_no_session method and adhering to the standardized language code format.


215-215: LGTM: Consistent language code in test_explicit_session 'speak' message assertions

The assertions for 'speak' messages in the test_explicit_session method have been updated to use "en-US", ensuring consistency with the standardized language code throughout both test methods.

Also applies to: 232-232

ovos_core/intent_services/stop_service.py (6)

13-13: LGTM: Import of standardize_lang_tag function

The addition of standardize_lang_tag from ovos_utils.lang aligns with the PR's objective to standardize language handling. This import will help ensure consistent formatting of language tags throughout the code.


131-131: LGTM: Consistent language tag standardization

The use of standardize_lang_tag(lang) in the match_stop_high method maintains consistency with the earlier changes and ensures that the language tag used for vocabulary matching is standardized.


182-182: LGTM: Consistent language tag standardization in match_stop_medium

The use of standardize_lang_tag(lang) in the match_stop_medium method maintains consistency with the earlier changes and ensures that the language tag used for vocabulary matching is standardized across all stop-related methods.


209-209: LGTM: Consistent language tag standardization in match_stop_low

The use of standardize_lang_tag(lang) in the match_stop_low method maintains consistency with the earlier changes and ensures that the language tag used for vocabulary matching is standardized across all stop-related methods.


Line range hint 1-286: Summary: Standardized language handling implemented consistently

The changes in this file successfully implement standardized language handling using the standardize_lang_tag function across all relevant methods. This aligns well with the PR objectives and should improve the robustness of language-related operations in the StopService class.

To ensure the changes work correctly and maintain backwards compatibility, please consider the following:

  1. Add comprehensive tests covering various language tag formats (e.g., 'en', 'en-US', 'pt-BR') to verify that all stop-related functionality works as expected.
  2. Verify that these changes don't introduce any breaking changes for existing implementations that might rely on the previous language tag format.
  3. Update the documentation to reflect the new standardized language handling, if applicable.

You can use the following script to identify potential areas that might need additional testing or documentation updates:

#!/bin/bash
# Description: Identify areas for potential testing and documentation updates

# Search for language-related functionality in the codebase
rg --type python 'lang.*=|language.*=' ovos_core/

# Search for existing language-related tests
rg --type python 'def test_.*lang' ovos_core/test/

# Search for language-related documentation
rg --type markdown 'language|lang' docs/

270-270: LGTM: Consistent language tag standardization in voc_match

The use of standardize_lang_tag(lang) in the voc_match method maintains consistency with the earlier changes and ensures that the language tag used for vocabulary matching is standardized across all methods that interact with the vocabulary cache.

To ensure the changes work correctly across different language codes and dialects, please add comprehensive tests that cover various language tag formats (e.g., 'en', 'en-US', 'pt-BR') and verify that the vocabulary matching works as expected for all cases. You can use the following script to check the current test coverage for language handling:

#!/bin/bash
# Description: Check test coverage for language handling in StopService

# Search for test files related to StopService
rg --type python 'class.*Test.*StopService' ovos_core/test/

# Search for language-related test cases within these files
rg --type python '(def test_.*lang|lang.*=)' ovos_core/test/ | grep -i stop
test/end2end/session/test_complete_failure.py (5)

22-22: Approved: Standardized language code format

The change from "en-us" to "en-US" for SessionManager.default_session.lang is correct and aligns with the PR objective of standardizing language handling. This format follows the ISO 639-1 (language code) and ISO 3166-1 (country code) standards, ensuring consistency across the codebase.


108-108: Approved: Consistent language code standardization

The change from "en-us" to "en-US" for SessionManager.default_session.lang in this test method is consistent with the previous change. This ensures that language code standardization is applied uniformly across different test scenarios.


147-147: Approved: Comprehensive language code standardization

The update of the valid_languages list to include "en-US" instead of "en-us" is consistent with the overall standardization effort. This change ensures that all references to language codes within the test suite follow the same format.


Line range hint 1-250: Summary of changes and recommendations

  1. The changes in this file consistently standardize language codes from "en-us" to "en-US", which aligns with the PR objectives and improves adherence to ISO standards.

  2. The standardization is applied comprehensively, including the valid_languages list and individual language assignments.

  3. A potential issue was identified in the test_complete_failure_lang_detect method, where the session language is reset at the end of the test. This might interfere with verifying the persistence of language changes.

Recommendation: Review the test logic, particularly the final language reset, to ensure it doesn't compromise the test's ability to verify language change persistence.

Overall, the changes improve code consistency and standardization, contributing positively to the project's language handling capabilities.


250-250: ⚠️ Potential issue

Consider removing or updating the final language reset

While the change from "en-us" to "en-US" is correct, there's a potential issue with the test logic:

  1. The test sets the session language to "pt-pt" on line 218.
  2. It then verifies this change on line 249.
  3. Finally, it resets the language to "en-US" on line 250.

This reset might interfere with verifying the persistence of the language change across sessions. Consider either removing this reset or updating the test to verify that the language change persists even after attempting to reset it to a different value.

To verify the impact of this reset, you can run the following script:

This will help determine if there are any assertions or verifications that might be affected by the language reset.

test/end2end/session/test_session.py (5)

23-23: LGTM: Language code standardization

The update from "en-us" to "en-US" aligns with the PR objective of standardizing language codes. This change ensures consistency with the IETF language tag format (ISO 639-1 for language code, ISO 3166-1 alpha-2 for country code, separated by a hyphen).


76-76: LGTM: Consistent language code assertion

The update of the assertion from "en-us" to "en-US" maintains consistency with the earlier change to the default session language. This ensures that the test correctly verifies the standardized language code format throughout the message pipeline.


108-108: LGTM: Consistent language code update

The change from "en-us" to "en-US" for the explicit default session is consistent with the updates made in the test_no_session method. This ensures that all test cases use the standardized language code format, maintaining consistency across the test suite.


209-209: LGTM: Comprehensive language code standardization

The updates from "en-us" to "en-US" in both the default session setup (line 209) and the speak message assertion (line 298) complete the standardization of language codes across all test cases. This comprehensive approach ensures consistency throughout the test suite and aligns with the PR's objective of standardizing language handling.

Also applies to: 298-298


Line range hint 23-298: Summary: Successful standardization of language codes in test suite

The changes made to this test file consistently update the language code from "en-us" to "en-US" across all test methods. This standardization aligns with the PR objective and ensures that the test suite accurately reflects and verifies the expected behavior of the system with regard to language code handling. The updates contribute to maintaining a robust and consistent testing environment for the OpenVoiceOS system.

test/end2end/session/test_stop.py (3)

23-23: Approved: Language code standardization

The change from "en-us" to "en-US" aligns with the PR objective of standardizing language codes. This modification ensures compliance with ISO 639-1 and ISO 3166-1 standards for language and country codes, respectively. It's a small but important change that contributes to consistency across the codebase.


199-199: Approved: Consistent language code standardization

The change from "en-us" to "en-US" in the test_new_stop method is consistent with the earlier modification in test_old_stop. This ensures uniform language code formatting throughout the test file, contributing to the overall objective of standardizing language handling across the codebase.


Line range hint 1-458: Summary: Changes align with PR objectives

The modifications in this file are minimal and precisely targeted at standardizing language codes. Both instances of "en-us" have been updated to "en-US", which is in line with the PR's objective of ensuring intents match regardless of the case used in language codes. These changes contribute to the overall goal of standardizing language handling within the OpenVoiceOS ecosystem.

The updates are consistent and do not introduce any unintended side effects or issues. They maintain the existing test structure while enhancing the accuracy of language code representation.

test/end2end/session/test_fallback.py (4)

22-22: LGTM: Language code standardization

The change from "en-us" to "en-US" aligns with the PR objective of standardizing language codes. This modification ensures consistency across the codebase and more accurately represents real-world language tags.


145-145: LGTM: Consistent language code standardization

This change from "en-us" to "en-US" is consistent with the previous modification and maintains the standardization of language codes throughout the test cases. It improves the overall consistency of the codebase.


319-319: LGTM: Completed language code standardization

This final change from "en-us" to "en-US" completes the standardization of language codes across all test cases in this file. These consistent modifications improve the overall code quality and align with the PR's objective of ensuring intents match regardless of the case used in language codes.


Line range hint 22-319: Summary: Successful language code standardization

The changes in this file consistently update the language code from "en-us" to "en-US" across all test cases. These modifications:

  1. Align with the PR objective of standardizing language code handling.
  2. Improve code consistency without altering the test logic.
  3. Contribute to ensuring intents match regardless of the case used in language codes.

These changes are part of a larger effort to standardize language handling across the OpenVoiceOS ecosystem, as evidenced by the companion pull requests mentioned in the PR objectives.

test/end2end/session/test_blacklist.py (4)

24-24: Approved: Language code standardization

The change from "en-us" to "en-US" aligns with the PR objective of standardizing language inputs. This modification ensures consistency with the IETF language tag format (BCP 47), which is the standard for language codes.


312-312: Approved: Consistent language code standardization

The change from "en-us" to "en-US" in the TestFallback class is consistent with the previous modification in the TestSessions class. This change further contributes to the standardization of language inputs across the codebase, aligning with the PR objectives.


403-403: Approved: Completed language code standardization

The change from "en-us" to "en-US" in the TestCommonQuery class completes the standardization of language inputs across all test classes in this file. This consistent approach ensures that all tests use the correct IETF language tag format, aligning with the PR objectives and improving the overall quality of the test suite.


Line range hint 24-403: Summary: Successful standardization of language codes

The changes in this file consistently update the language code from "en-us" to "en-US" across all test classes (TestSessions, TestFallback, and TestCommonQuery). These modifications:

  1. Align with the PR objective of standardizing language inputs.
  2. Ensure consistency with the IETF language tag format (BCP 47).
  3. Improve the overall quality and consistency of the test suite.

These changes contribute to more robust and standardized language handling in the OpenVoiceOS ecosystem, which should help prevent potential issues related to inconsistent language code formatting.

ovos_core/intent_services/converse_service.py (4)

12-12: LGTM: Import statement for standardize_lang_tag added correctly.

The import of standardize_lang_tag from ovos_utils.lang is correctly placed and necessary for the subsequent usage in the class methods.


283-283: LGTM: Language tag standardization implemented in converse method.

The use of standardize_lang_tag(lang) ensures consistent formatting of language tags before they are used in the session. This change aligns with the PR objective of standardizing language handling across the system.


388-388: LGTM: Language tag standardization implemented in reset_converse method.

The use of standardize_lang_tag(get_message_lang()) ensures consistent formatting of language tags when resetting the conversation. This change aligns with the PR objective and is consistent with the modification in the converse method.


Line range hint 1-413: Summary: Successful implementation of language tag standardization.

The changes in this file successfully implement the standardization of language tags in the ConverseService class. The modifications are minimal, focused, and consistent, affecting only the necessary parts of the code. These changes align well with the PR objectives and should improve the consistency of language handling across the system.

Key points:

  1. Proper import of standardize_lang_tag function.
  2. Consistent usage in both converse and reset_converse methods.
  3. No unintended alterations to the class structure or functionality.

The implementation looks solid and ready for integration.

test/end2end/session/test_converse.py (5)

24-24: Approved: Standardized language code format

The change from "en-us" to "en-US" standardizes the language code format, aligning it with ISO 639-1 and ISO 3166-1 alpha-2 standards. This modification supports the PR's objective of ensuring consistent language handling across the system.


88-88: Approved: Consistent language code format in assertions

This change maintains consistency with the earlier modification, updating the assertion to check for the standardized "en-US" language code. It ensures that the tests accurately verify the correct language code format throughout the system.


253-253: Approved: Maintaining consistency in language code assertions

This change continues the pattern of updating language code assertions from "en-us" to "en-US". It's crucial for maintaining consistency across all test assertions, ensuring that the entire test suite is aligned with the standardized language code format.


342-342: Approved: Completed standardization of language code assertions

This final change to "en-US" completes the standardization of language code assertions throughout the test file. These consistent updates ensure that all parts of the test suite are aligned with the new standardized language handling, supporting more reliable and accurate testing of language-related functionality.


Line range hint 24-342: Summary: Successful standardization of language codes in test file

The changes made to this test file consistently update the language code format from "en-us" to "en-US" across all assertions. This standardization aligns with the PR's objective of ensuring consistent language handling throughout the system. The modifications improve the accuracy and reliability of the tests without altering the core test logic or structure. These changes contribute to a more robust testing environment for language-related functionality in the OpenVoiceOS system.

ovos_core/intent_services/__init__.py (2)

32-32: LGTM: New import for language standardization.

The addition of standardize_lang_tag from ovos_utils.lang is appropriate for the changes made throughout the file to standardize language handling.


Line range hint 1-758: Overall assessment: Improved language handling and API deprecation.

The changes made to this file significantly improve the consistency of language handling by using standardize_lang_tag throughout the IntentService class. The addition of deprecation warnings for several properties is a good practice for maintaining backwards compatibility while encouraging users to migrate to new APIs.

Some minor suggestions for improvements have been made, including:

  1. Enhancing error handling in some methods.
  2. Improving logging for better debugging.
  3. Using a constant for deprecation versions.

These changes contribute to the overall goal of standardizing language inputs and introducing support for dynamic dialects in pipeline plugins, as mentioned in the PR objectives.

test/end2end/session/test_get_response.py (8)

23-23: LGTM: Language code standardization

The update from "en-us" to "en-US" is correct and aligns with the standardization of language codes.


103-103: LGTM: Consistent language code in assertion

The assertion has been correctly updated to use "en-US", maintaining consistency with the standardized language code format.


128-129: LGTM: Consistent language code in speak message assertion

The assertion for the 'speak' message has been correctly updated to use "en-US", maintaining consistency with the standardized language code format.


145-146: LGTM: Consistent language code in additional assertion

The assertion has been correctly updated to use "en-US", maintaining consistency with the standardized language code format throughout the test method.


166-167: LGTM: Consistent language code for default session

The default session language code has been correctly updated to use "en-US", maintaining consistency with the standardized language code format.


311-312: LGTM: Consistent language code in speak message assertion

The assertion for the speak message in this test method has been correctly updated to use "en-US", maintaining consistency with the standardized language code format across different test methods.


632-633: LGTM: Consistent language code standardization across all test methods

The assertion for the speak message in this test method has been correctly updated to use "en-US", maintaining consistency with the standardized language code format.

Overall, these changes ensure that all language code references in the test file use the standardized "en-US" format. This consistency is crucial for accurate testing and aligns with the PR objective of standardizing language handling across the system.


Line range hint 1-1161: Summary: Successful implementation of language code standardization

This review has confirmed that all instances of the language code have been successfully updated from "en-us" to "en-US" throughout the test file. These changes are consistent and align perfectly with the PR objective of standardizing language handling.

The updates ensure that:

  1. All test methods use the standardized language code format.
  2. All assertions related to language codes have been updated.
  3. The default session language code has been standardized.

These changes contribute to the overall goal of ensuring that intents match regardless of the case used in language codes, as stated in the PR objectives.

ovos_core/intent_services/stop_service.py Show resolved Hide resolved
@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
test/unittests/test_intent_service.py (1)

114-116: LGTM: Comprehensive language code standardization

The changes from 'de-de' to 'de-DE' and 'sv-se' to 'sv-SE' complete the standardization of language codes across the test suite. This ensures consistency and aligns with the PR objective of case-insensitive intent matching for language codes.

Consider adding a comment explaining the standardized format (e.g., "ISO 639-1 language code + ISO 3166-1 alpha-2 country code") to provide context for future developers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ca7b0e0 and 3c89055.

📒 Files selected for processing (1)
  • test/unittests/test_intent_service.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
test/unittests/test_intent_service.py (2)

106-106: LGTM: Standardized language code format

The change from 'it-it' to 'it-IT' standardizes the language code format, using uppercase country codes. This aligns with the ISO 639-1 language code standard and is consistent with the PR objective of standardizing language handling.


108-108: LGTM: Consistent language code standardization

The change from 'en-us' to 'en-US' maintains consistency with the previous change, standardizing the language code format across different languages in the test.

test/unittests/test_intent_service.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
@github-actions github-actions bot added fix and removed fix labels Oct 16, 2024
@JarbasAl JarbasAl merged commit 62623bb into dev Oct 16, 2024
11 checks passed
@JarbasAl JarbasAl deleted the fix/better_dialect_support branch October 16, 2024 03:12
@coderabbitai coderabbitai bot mentioned this pull request Nov 11, 2024
@coderabbitai coderabbitai bot mentioned this pull request Nov 19, 2024
This was referenced Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant