-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Publish the response to process as stream_status.response in the transformation's context #191
base: main
Are you sure you want to change the base?
Conversation
* 'main' of https://github.com/rpopov/airbyte-python-cdk: fix: fix sorting & __init__ imports (airbytehq#189)
…e transformation's context
📝 WalkthroughWalkthroughThe changes introduce a new Changes
Sequence DiagramsequenceDiagram
participant RecordSelector
participant DpathExtractor
participant ResponseData
RecordSelector->>DpathExtractor: Extract response root
DpathExtractor-->>RecordSelector: Return extracted data
RecordSelector->>RecordSelector: Update stream state
RecordSelector->>RecordSelector: Filter and transform records
RecordSelector->>RecordSelector: Clean up stream state
Hey there! The updates to the record selection mechanism sound quite intriguing. The addition of the Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/extractors/record_selector.py (3)
10-10
: Import usage in performance-critical paths?
DpathExtractor can be costly for large responses. Would you consider a lazy approach or a more direct path extraction if performance becomes an issue? wdyt?
25-26
: Clarify the constant's naming.
Would a name likeORIGINAL_RESPONSE_KEY
convey its purpose more explicitly for future readers? wdyt?
57-57
: Question on reusing existing config.
Is there a reasonDpathExtractor
is instantiated with empty configs instead of reusingself.config
orparameters
? Could that unify configuration handling nicely? wdyt?unit_tests/sources/declarative/extractors/test_record_selector.py (1)
77-89
: Edge cases for minimal responses.
This new test indexes the second element ofresponse.data
. Shall we add a test for scenarios where there's only one or zero items to avoid index errors? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/extractors/record_selector.py
(4 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(1 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
93-101
: Check for collisions in stream_state
.
If stream_state
already has "response", is there a risk of overwriting or losing data? Would a cautionary check or a unique key be helpful? wdyt?
But the tests failed due to testing the call stack frames too. As the latter breaks any encapsulation, I removed the call steck check from the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/extractors/record_selector.py (2)
57-57
: Consider making the response_root_extractor configurable?Currently initializing with empty config and parameters. Would it make sense to allow customization through the constructor, similar to other extractors in the class? This could provide more flexibility for different API response structures, wdyt?
@dataclass class RecordSelector(HttpSelector): extractor: RecordExtractor config: Config parameters: InitVar[Mapping[str, Any]] schema_normalization: TypeTransformer name: str + response_root_field_path: List[str] = field(default_factory=list) _name: Union[InterpolatedString, str] = field(init=False, repr=False, default="") record_filter: Optional[RecordFilter] = None transformations: List[RecordTransformation] = field(default_factory=lambda: []) def __post_init__(self, parameters: Mapping[str, Any]) -> None: self._parameters = parameters self._name = ( InterpolatedString(self._name, parameters=parameters) if isinstance(self._name, str) else self._name ) - self.response_root_extractor = DpathExtractor(field_path=[], config={}, parameters={}) + self.response_root_extractor = DpathExtractor( + field_path=self.response_root_field_path, + config=self.config, + parameters=parameters + )
96-99
: Consider memory implications of storing response in state?The implementation stores the entire response in the stream state. For large responses, this could lead to increased memory usage. Should we consider:
- Storing only essential response data?
- Adding size limits or truncation?
- Making it configurable which parts of the response to store?
What are your thoughts on this?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/extractors/record_selector.py
(4 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/declarative/extractors/test_record_selector.py
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/extractors/record_selector.py (3)
10-10
: LGTM! Clean additions for the new feature.
The new import and constant are well-organized and follow the codebase conventions. The constant name is descriptive and its placement makes sense.
Also applies to: 25-26
101-103
: LGTM! Clean integration with existing logic.
The enhanced stream state is properly passed to the filter_and_transform method, maintaining compatibility with existing functionality.
94-95
: Consider adding error handling for response root extraction?
The response root extraction could fail silently if the response structure is unexpected. Would it be helpful to add some error handling here to provide better debugging information, wdyt?
- response_root_iterator = iter(self.response_root_extractor.extract_records(response))
+ try:
+ response_root_iterator = iter(self.response_root_extractor.extract_records(response))
+ except Exception as e:
+ logger.warning(f"Failed to extract response root: {e}")
+ response_root_iterator = iter([None])
Feat: Publish the response to process as stream_status.response in the transformation's context
See:
Summary by CodeRabbit
New Features
Bug Fixes
Tests