move utterance normalizer to own repo #48
Walkthrough
The recent updates to the codebase involve a significant shift in functionality, primarily through the migration of various classes and methods from
Changes
Sequence Diagram(s)
sequenceDiagram
participant A as Application
participant B as Normalization Module
participant C as Tokenization Module
participant D as Numeric Parsing Module
A->>B: Request text normalization
B->>A: Return normalized text
A->>C: Request tokenization
C->>A: Return tokenized output
A->>D: Request numeric parsing
D->>A: Return parsed numbers
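The flow in the diagram can be sketched with toy stand-ins. The three functions below (`normalize`, `tokenize`, `parse_numbers`) and the `_NUMBER_WORDS` table are hypothetical placeholders written only for illustration; they are not the `ovos_utterance_normalizer` API.

```python
import re
from typing import List

# Hypothetical lookup table; a real parser handles far more than this.
_NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def normalize(text: str) -> str:
    """Lowercase, split hyphenated words, and drop remaining punctuation."""
    text = re.sub(r"\b(\w*)-(\w*)\b", r"\1 \2", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def tokenize(text: str) -> List[str]:
    """Split normalized text on whitespace."""
    return text.split()


def parse_numbers(tokens: List[str]) -> List[int]:
    """Map known number words to integers and ignore everything else."""
    return [_NUMBER_WORDS[t] for t in tokens if t in _NUMBER_WORDS]


# The application calls each stage in turn, as in the diagram.
normalized = normalize("Set Two-three timers!")
tokens = tokenize(normalized)
numbers = parse_numbers(tokens)
```

Each stage only consumes the previous stage's return value, which is what lets the three pieces live in a separate package.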
Actionable comments posted: 4
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (13)
- ovos_classifiers/heuristics/machine_comprehension.py (1 hunks)
- ovos_classifiers/heuristics/normalize.py (1 hunks)
- ovos_classifiers/heuristics/numeric.py (1 hunks)
- ovos_classifiers/heuristics/postag.py (1 hunks)
- ovos_classifiers/heuristics/summarization.py (1 hunks)
- ovos_classifiers/heuristics/time.py (1 hunks)
- ovos_classifiers/heuristics/tokenize.py (1 hunks)
- ovos_classifiers/opm/heuristics.py (1 hunks)
- ovos_classifiers/skovos/features/__init__.py (1 hunks)
- ovos_classifiers/tasks/tagger.py (1 hunks)
- requirements.txt (1 hunks)
- setup.py (1 hunks)
- test/unittests/test_numeric.py (1 hunks)
Files skipped from review due to trivial changes (4)
- ovos_classifiers/heuristics/machine_comprehension.py
- ovos_classifiers/opm/heuristics.py
- ovos_classifiers/tasks/tagger.py
- test/unittests/test_numeric.py
Additional context used
Ruff
ovos_classifiers/heuristics/numeric.py
1-1: `from ovos_utterance_normalizer.numeric import *` used; unable to detect undefined names (F403)
ovos_classifiers/heuristics/normalize.py
1-1: `from ovos_utterance_normalizer.normalizer import *` used; unable to detect undefined names (F403)
ovos_classifiers/heuristics/tokenize.py
1-1: `from ovos_utterance_normalizer.tokenization import *` used; unable to detect undefined names (F403)
ovos_classifiers/heuristics/time.py
6-6: `ovos_utterance_normalizer.tokenization.ReplaceableNumber` imported but unused. Remove unused import (F401)
Additional comments not posted (6)
requirements.txt (1)
11-12: Dependency addition approved.
The addition of `ovos-utterance-normalizer` aligns with the PR's objective of moving the utterance normalizer to its own repository.
ovos_classifiers/heuristics/summarization.py (1)
2-2: Import source change approved.
The change in the import source for `word_tokenize` aligns with the PR's objective of moving the utterance normalizer to its own repository.
setup.py (1)
Line range hint 47-47: Entry point removal approved.
The removal of the entry point for `ovos-utterance-normalizer` aligns with the PR's objective of moving the utterance normalizer to its own repository.
ovos_classifiers/heuristics/postag.py (1)
6-6: Update import statement to reflect new module structure.
The import statement has been updated to import `word_tokenize` from `ovos_utterance_normalizer.tokenization` instead of `ovos_classifiers.heuristics.tokenize`. Ensure that the new module provides the same functionality and that all dependencies are correctly updated.
ovos_classifiers/heuristics/time.py (1)
5-6: Update import statements to reflect new module structure.
The import statements have been updated to import `EnglishNumberParser`, `GermanNumberParser`, `ReplaceableNumber`, `ReplaceableTimedelta`, `ReplaceableTime`, `ReplaceableDate`, `Token`, and `word_tokenize` from `ovos_utterance_normalizer` instead of `ovos_classifiers.heuristics`. Ensure that the new module provides the same functionality and that all dependencies are correctly updated.
Tools
Ruff
6-6: `ovos_utterance_normalizer.tokenization.ReplaceableNumber` imported but unused. Remove unused import (F401)
ovos_classifiers/skovos/features/__init__.py (1)
19-19: Update import statement to reflect new module structure.
The import statement has been updated to import `word_tokenize` from `ovos_utterance_normalizer.tokenization` instead of `ovos_classifiers.heuristics.tokenize`. Ensure that the new module provides the same functionality and that all dependencies are correctly updated.
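One way to act on the "ensure that the new module provides the same functionality" advice is a quick check that the expected names exist in the target module. The helper below, `missing_names`, is a hypothetical name introduced for this sketch; it is demonstrated on the standard-library `json` module so that it runs without any extra package installed.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, expected: Iterable[str]) -> List[str]:
    """Return the expected names that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in expected if not hasattr(module, name)]


# Demonstration on a standard-library module.
result = missing_names("json", ["loads", "dumps", "no_such_name"])
print(result)  # → ['no_such_name']
```

For this PR the same call could be pointed at `ovos_utterance_normalizer.tokenization` with `["word_tokenize"]`, assuming the new package is installed in the environment. The check only confirms that the names exist, not that their behaviour is unchanged, so the existing unit tests remain the real safety net.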
- from ovos_classifiers.heuristics.numeric import EnglishNumberParser, GermanNumberParser
- from ovos_classifiers.heuristics.tokenize import ReplaceableNumber, ReplaceableTimedelta, \
+ from ovos_utterance_normalizer.numeric import EnglishNumberParser, GermanNumberParser
+ from ovos_utterance_normalizer.tokenization import ReplaceableNumber, ReplaceableTimedelta, \
Remove unused import.
The import `ReplaceableNumber` is not used in the file and should be removed to clean up the code.
- from ovos_utterance_normalizer.tokenization import ReplaceableNumber, ReplaceableTimedelta, \
+ from ovos_utterance_normalizer.tokenization import ReplaceableTimedelta, \
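The idea behind an unused-import warning such as F401 can be approximated with the standard `ast` module. This is a deliberately simplified sketch and not Ruff's actual algorithm: it ignores `__all__`, string annotations, and scoping. The function name `unused_imports` and the sample source are made up for illustration.

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """List imported names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # wildcard imports bind unknown names
                # "import a.b" binds "a"; "import x as y" binds "y".
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every bare identifier in the file, including the base of "a.b" accesses.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


# Illustrative source only; it is parsed, never executed.
sample = (
    "from tokenization import ReplaceableNumber, ReplaceableTimedelta\n"
    "delta_cls = ReplaceableTimedelta\n"
)
print(unused_imports(sample))  # → ['ReplaceableNumber']
```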
- string_num_scale_az = {v: k for k, v in string_num_scale_az.items()}
-
- return multiplies, string_num_ordinal_az, string_num_scale_az
+ from ovos_utterance_normalizer.numeric import *
Avoid wildcard imports.
Using `import *` can lead to namespace pollution and makes it harder to track which names are in the namespace. It is better to import specific entities.
- from ovos_utterance_normalizer.numeric import *
+ from ovos_utterance_normalizer.numeric import NumericParser, parse_number, parse_ordinal, parse_fraction
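The namespace-pollution concern can be reproduced in a few lines. The sketch below builds two throwaway in-memory modules (`old_mod` and `new_mod`, both hypothetical) and mimics what `from module import *` does: it copies the names in `__all__` if defined, otherwise every name that does not start with an underscore.

```python
import types

# Two hypothetical modules built in memory; the names are illustrative only.
old_mod = types.ModuleType("old_mod")
exec("def parse(text):\n    return 'old'", old_mod.__dict__)

new_mod = types.ModuleType("new_mod")
exec(
    "def parse(text):\n    return 'new'\n"
    "def helper():\n    return 1",
    new_mod.__dict__,
)


def star_import(module, namespace):
    """Mimic `from module import *` by copying its public names."""
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)


namespace = {}
star_import(old_mod, namespace)
star_import(new_mod, namespace)  # silently rebinds "parse" and adds "helper"

print(sorted(namespace))         # → ['helper', 'parse']
print(namespace["parse"]("x"))   # → new
```

Nothing in the importing file shows that `parse` changed owner or that `helper` arrived, which is why linters cannot resolve undefined names once a wildcard import is present.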
- # return them pretty regularly
- utterance = re.sub(r'\b(\w*)-(\w*)\b', r'\1 \2', utterance)
- return super().remove_symbols(utterance)
+ from ovos_utterance_normalizer.normalizer import *
Avoid wildcard imports.
Using `import *` can lead to namespace pollution and makes it harder to track which names are in the namespace. It is better to import specific entities.
- from ovos_utterance_normalizer.normalizer import *
+ from ovos_utterance_normalizer.normalizer import Normalizer, CatalanNormalizer, CzechNormalizer, PortugueseNormalizer, RussianNormalizer, UkrainianNormalizer, EnglishNormalizer, AzerbaijaniNormalizer, GermanNormalizer
- """
- from nltk.tokenize.sonority_sequencing import SyllableTokenizer
- return SyllableTokenizer().tokenize(utterance)
+ from ovos_utterance_normalizer.tokenization import *
Avoid wildcard imports.
Using `import *` can lead to namespace pollution and makes it harder to track which names are in the namespace. It is better to import specific entities.
- from ovos_utterance_normalizer.tokenization import *
+ from ovos_utterance_normalizer.tokenization import Token, ReplaceableEntity, ReplaceableNumber, ReplaceableDate, ReplaceableTime, ReplaceableTimedelta, partition_list, sentence_tokenize, word_tokenize, word_tokenize_pt, word_tokenize_ca, subword_tokenize, syllable_tokenize
companion to OpenVoiceOS/ovos-utterance-normalizer#1
Summary by CodeRabbit
New Features
- `ovos_utterance_normalizer` package for tokenization and normalization functionalities.
- `ovos-utterance-normalizer`.
Bug Fixes
Refactor
Chores