-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code cdk): add KeyToSnakeCase transformation #178
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces a new transformation definition called Changes
Sequence DiagramsequenceDiagram
participant Record
participant KeysToSnakeCaseTransformation
participant Normalizer
participant Tokenizer
participant SnakeCaseConverter
Record->>KeysToSnakeCaseTransformation: transform()
KeysToSnakeCaseTransformation->>Normalizer: normalize_key()
Normalizer-->>KeysToSnakeCaseTransformation: normalized key
KeysToSnakeCaseTransformation->>Tokenizer: tokenize_key()
Tokenizer-->>KeysToSnakeCaseTransformation: key tokens
KeysToSnakeCaseTransformation->>SnakeCaseConverter: tokens_to_snake_case()
SnakeCaseConverter-->>KeysToSnakeCaseTransformation: snake_case key
KeysToSnakeCaseTransformation->>Record: update with transformed keys
The sequence diagram illustrates the key transformation process, showing how a record's keys are processed through normalization, tokenization, and conversion to snake_case format. Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
unit_tests/sources/declarative/transformations/test_keys_to_snake_transformation.py (1)
14-45
: Great test coverage! Would you consider adding a few more edge cases?The parametrized tests cover a good range of scenarios. What do you think about adding tests for:
- Nested dictionaries with mixed-case keys
- Empty string keys
- None values in dictionaries
Here's a suggestion for additional test cases:
( {"outer": {"InnerKey": _ANY_VALUE}, "NestedObj": {"AnotherLevel": _ANY_VALUE}}, {"outer": {"inner_key": _ANY_VALUE}, "nested_obj": {"another_level": _ANY_VALUE}}, ), ( {"": _ANY_VALUE, "ValidKey": _ANY_VALUE}, {"": _ANY_VALUE, "valid_key": _ANY_VALUE}, ), ( {"Key": None}, {"key": None}, ),airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py (2)
14-14
: Consider making the regex pattern more efficient?The current pattern works but requires multiple alternations. What do you think about simplifying it to:
TOKEN_PATTERN = re.compile(r'[A-Z][a-z]*|[a-z]+|\d+|[^a-zA-Z\d]+')This achieves the same result with less complexity and better readability, wdyt?
52-57
: The filter_tokens logic could use some clarificationThe current implementation is a bit cryptic. Could we add a docstring explaining:
- Why we're limiting to 3 tokens
- Why we're keeping first and last tokens
- Why we're adding an empty token before numbers
Example:
def filter_tokens(self, tokens: list) -> list: """Filter and modify token list based on specific rules: 1. For 3+ tokens, keep first and last tokens but filter empty ones in between 2. Prefix numeric tokens with underscore by adding empty token Args: tokens: List of string tokens Returns: Filtered and modified token list """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py
(1 hunks)pyproject.toml
(1 hunks)unit_tests/sources/declarative/transformations/test_keys_to_lower_transformation.py
(1 hunks)unit_tests/sources/declarative/transformations/test_keys_to_snake_transformation.py
(1 hunks)
🔇 Additional comments (6)
unit_tests/sources/declarative/transformations/test_keys_to_lower_transformation.py (1)
15-15
: LGTM! Nice cleanup of the assertion.
The simplified assertion improves readability while maintaining the same test coverage.
airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py (1)
42-43
: LGTM! Good use of unidecode for normalization.
The normalization approach using unidecode is solid and handles international characters well.
pyproject.toml (1)
83-83
: LGTM!
The addition of the Unidecode
dependency with a caret version constraint is appropriate for the new key transformation feature.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
713-716
: LGTM!
The KeyToSnakeCase
class follows the established pattern for transformation classes with appropriate type literal and optional parameters.
1662-1670
: LGTM!
The addition of KeyToSnakeCase
to the transformations
Union type is well-integrated with the existing transformation options.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1842-1854
: LGTM!
The KeyToSnakeCase
schema definition is well-structured and follows the established pattern for transformation components. The title and description clearly communicate its purpose.
airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Serhii
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1843-1843
: Should we change the title to 'Keys to Snake Case' for consistency?Currently, the title is 'Key to Snake Case'. To be consistent with 'Keys to Lower Case', perhaps we should rename it to 'Keys to Snake Case'. Wdyt?
Apply this diff to update the title:
- title: Key to Snake Case + title: Keys to Snake Case
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py
(1 hunks)unit_tests/sources/declarative/transformations/test_keys_to_snake_transformation.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/declarative/transformations/test_keys_to_snake_transformation.py
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/transformations/keys_to_snake_transformation.py (2)
21-30
: Consider returning a new record instead of modifying in place
Currently, the transform
method modifies the input record
in place, which could lead to unexpected behavior if the same dictionary is used elsewhere. Would you consider returning a new transformed record instead? Wdyt?
32-40
: Consider adding support for lists in nested structures
The current implementation handles nested dictionaries but does not process lists of dictionaries. Would you consider adding recursive support for lists to ensure all nested keys are transformed? Wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
713-716
: Looks good to me!
The KeysToSnakeCase
class is defined consistently, and the implementation aligns with the existing transformation classes.
1662-1670
: KeysToSnakeCase
correctly added to transformations list
The KeysToSnakeCase
transformation is properly included in the transformations
union type, ensuring it can be utilized in declarative streams.
What
Some sources on the road to manifest only require the transformation of their schema keys' values to snake case.
How
Add a
KeyToSnakeCase
transformation to convert all record keys to snake case.Example:
FirstName
->first_name
lastName
->last_name
123Number
->_123_number
456Another123
->_456_another_123
hello@world
->hello_world
test#case
->test_case
MixedUPCase123
->mixed_upcase_123
lowercaseAnd123
->lowercase_and_123
Café
->cafe
Naïve
->naive
This is a full sentence
->this_is_a_full_sentence
Another full sentence with more words
->another_full_sentence_with_more_words
Summary by CodeRabbit
New Features
KeysToSnakeCase
, for converting record keys to snake_case format.Bug Fixes
Tests
KeysToSnakeCaseTransformation
class, covering various key formats and scenarios.Chores
Unidecode
, to the project requirements.