-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(HttpComponentsResolver): added stream slices to HttpComponentsResolver #175
base: main
Are you sure you want to change the base?
feat(HttpComponentsResolver): added stream slices to HttpComponentsResolver #175
Conversation
/autofix
|
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces enhancements to the Airbyte CDK's declarative components, focusing on improving schema definitions and component resolution mechanisms. The changes primarily involve updating the Changes
Possibly Related PRs
Suggested Reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)
Line range hint
249-264
: Consider usingmatch
parameter in exception assertionIn
test_duplicated_dynamic_streams_read_with_http_components_resolver
, you're asserting the exact error message string. To make the test more resilient to changes in the error message wording, would it be beneficial to use thematch
parameter ofpytest.raises
to check for specific content? This way, minor wording changes won't cause the test to fail unexpectedly. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
(1 hunks)unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
(1 hunks)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)
91-112
: Stream slicing integration looks great!
The changes introducing stream slicing by iterating over stream_slices
and processing records accordingly enhance the resolver's ability to handle multiple data slices effectively. This is a valuable improvement to the component's functionality. Well done!
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)
224-224
: Including stream_slices
in the mock retriever
Adding mock_retriever.stream_slices.return_value = [{}]
ensures that the tests accurately simulate the presence of stream slices. This change improves the robustness of the test_http_components_resolver
function. Nice work!
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2356-2356
: Clarifying stream_slicer
assignment logic
The updated assignment stream_slicer=stream_slicer if stream_slicer else combined_slicers
in create_http_components_resolver
enhances flexibility by prioritizing the provided stream_slicer
. Just to confirm, are there scenarios where both stream_slicer
and combined_slicers
might be defined, potentially leading to unexpected behavior? Should we add a check or warning to handle such cases explicitly? Wdyt?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
Line range hint 3010-3033
: Enhancing ComponentMappingDefinition
with stream_slice
Expanding the interpolation_context
to include stream_slice
in both the field_path
and value
properties allows for more dynamic component mapping based on stream slices. This improvement increases the flexibility of the schema definitions. Great job on this enhancement!
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
Outdated
Show resolved
Hide resolved
These changes make sense to me, but can we have a test with manifest configuration including the slicer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM.
However, we need to add another test case and resolve some comments.
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
Outdated
Show resolved
Hide resolved
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
Show resolved
Hide resolved
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4)
348-348
: Consider updating the test name to reflect stream slice testing?The test now includes stream slice verification, but the name
test_http_components_resolver
doesn't indicate this. Maybe we could rename it totest_http_components_resolver_with_empty_stream_slice
to be more specific, wdyt?
361-395
: Consider adding more test cases for robustness?The test covers the happy path well, but what about adding some edge cases? For example:
- Empty stream slices
- Invalid component mappings
- Missing or null parent_id
- Multiple field mappings within the same slice
This would help ensure the resolver handles error cases gracefully, wdyt?
469-523
: How about enhancing the test coverage for error scenarios?The test thoroughly covers the happy path, but we might want to consider adding tests for:
- Parent stream returning empty results
- Child stream API failures
- Invalid parent IDs in the response
Also, the mock responses at line 498 are quite simple. Should we make them more realistic by including actual field values that match the schema defined in the manifest?
200-324
: Should we move the manifest to a separate fixture file?The manifest is quite complex and makes the test file longer than necessary. What do you think about:
- Moving it to a separate fixture file (e.g.,
conftest.py
orfixtures/manifests.py
)?- Using a fixture factory to generate manifests with different configurations?
This would make the tests more maintainable and allow reuse across different test files, wdyt?
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (2)
91-95
: LGTM! Though we could improve type safety here 🤔The stream slicing implementation looks solid! However, what do you think about making the types more explicit instead of using type ignores? We could:
- Add type hints for stream_slice and components_values in kwargs
- Use TypedDict for kwargs to make it more type-safe
Something like this maybe?
- kwargs = {"stream_template_config": stream_template_config} + from typing import TypedDict + class ResolverKwargs(TypedDict, total=False): + stream_template_config: Dict[str, Any] + components_values: Mapping[str, Any] + stream_slice: Mapping[str, Any] + + kwargs: ResolverKwargs = {"stream_template_config": stream_template_config}wdyt? 🤔
91-110
: Consider documenting stream slice behavior 📚The stream slicing implementation looks great! Since this is an experimental class (as noted by the decorator), would it be helpful to:
- Add docstring examples showing how stream_slice affects component resolution?
- Document any assumptions about stream_slicer presence in retriever?
- Add a section in the class docstring about stream slicing behavior?
This would help users understand how to effectively use this feature. Happy to help draft the documentation if you'd like! 😊
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
(1 hunks)unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
(4 hunks)
🔇 Additional comments (2)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)
Line range hint 200-523
: Overall implementation looks great! 🎉
The implementation is solid and well-tested. The suggestions above are just for potential improvements, but the current code is ready to go.
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)
97-110
: Looks good! Let's verify the component resolution 🔍
The component resolution logic looks solid! The path evaluation correctly uses the stream_slice context.
Let's verify how this is being used in the codebase:
✅ Verification successful
Looks good! The stream_slice handling is working as expected 🎯
The implementation correctly handles stream slices in the component resolution process:
- The code properly uses stream_slice context in the resolve_components method
- Tests verify both simple and complex scenarios (including parent-child relationships)
- The implementation maintains immutability by using deepcopy
- Stream slice values are correctly interpolated in component mappings
The test cases demonstrate that it works correctly with:
- Basic stream slices (
test_http_components_resolver
) - Parent-child relationships (
test_dynamic_streams_with_http_components_resolver_retriever_with_parent_stream
) - Multiple slices with different values (
test_http_components_resolver_with_stream_slices
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any custom implementations of stream slicing in components
ast-grep --pattern 'class $name implements HttpComponentsResolver {
$$$
stream_slices($$$) {
$$$
}
$$$
}'
# Check for any test cases using stream slices with HttpComponentsResolver
rg -l "HttpComponentsResolver.*stream_slice" --type python
Length of output: 211
Script:
#!/bin/bash
# Let's try a different approach to find usage patterns
# Check for any test files related to HttpComponentsResolver
fd "test.*\.py" | xargs rg "HttpComponentsResolver"
# Look for stream_slice usage in Python files
fd "\.py$" | xargs rg "stream_slice"
# Find implementations or usages of HttpComponentsResolver
fd "\.py$" | xargs rg "class.*HttpComponentsResolver"
# Look for component resolution with stream context
fd "\.py$" | xargs rg "resolved_component.*stream"
Length of output: 89632
Script:
#!/bin/bash
# Let's check the implementation and tests of HttpComponentsResolver
fd "http_components_resolver\.py$" | xargs cat
# Look for tests specifically related to stream_slice usage
fd "test_http_components_resolver\.py$" | xargs cat
# Check for any direct references to stream_slice in the resolver
fd "\.py$" | xargs rg "stream_slice.*HttpComponentsResolver"
Length of output: 46462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4)
200-325
: Consider extracting nested configurations into separate constants for better readability, wdyt?The manifest is quite complex with deeply nested configurations. We could improve readability by extracting some parts into separate constants:
- Parent stream configuration (lines 269-301)
- Stream template configuration (lines 207-241)
- Components resolver configuration (lines 242-322)
Example approach:
_PARENT_STREAM_CONFIG = { "type": "DeclarativeStream", "name": "parent", # ... rest of parent stream config } _STREAM_TEMPLATE_CONFIG = { "type": "DeclarativeStream", "name": "", # ... rest of stream template config } _COMPONENTS_RESOLVER_CONFIG = { "type": "HttpComponentsResolver", # ... rest of resolver config } _MANIFEST_WITH_HTTP_COMPONENT_RESOLVER_WITH_RETRIEVER_WITH_PARENT_STREAM = { "version": "6.7.0", "type": "DeclarativeSource", "check": {"type": "CheckStream", "stream_names": ["Rates"]}, "dynamic_streams": [{ "type": "DynamicDeclarativeStream", "stream_template": _STREAM_TEMPLATE_CONFIG, "components_resolver": _COMPONENTS_RESOLVER_CONFIG }] }
351-363
: How about adding explicit assertions for stream slices behavior?The test verifies the result but doesn't explicitly assert the stream slices interaction. Consider adding assertions to verify:
- That stream_slices was called
- The number of times read_records was called with the slice
def test_http_components_resolver(...): # ... existing setup ... result = list(resolver.resolve_components(stream_template_config=stream_template_config)) # Add these assertions mock_retriever.stream_slices.assert_called_once() assert mock_retriever.read_records.call_count == 1 mock_retriever.read_records.assert_called_with(sync_mode=None, stream_slice={}, stream_state={}) assert result == expected_result
365-382
: Would you consider adding more test cases to cover edge scenarios?The current test case is good but we could make it more robust by adding cases for:
- Empty stream slices
- Missing or null component values
- Invalid path templates
@pytest.mark.parametrize( "components_mapping, retriever_data, stream_template_config, expected_result", [ # Existing case ( [ComponentMappingDefinition(...)], [{"id": "1", "field1": "data1"}, {"id": "2", "field1": "data2"}], {"path": None}, [{"path": "1/1"}, {"path": "1/2"}, {"path": "2/1"}, {"path": "2/2"}], ), # Empty stream slices ( [ComponentMappingDefinition(...)], [{"id": "1"}], {"path": None}, [], ), # Missing component values ( [ComponentMappingDefinition(...)], [{}], {"path": None}, [{"path": "1/"}], # or expect an error ), ], )
475-531
: How about improving test maintainability by extracting test data?The test setup is quite complex with inline response data. Consider extracting these into constants or fixture functions for better maintainability and reusability, wdyt?
# At module level _PARENT_RESPONSE_DATA = [{"id": 1}, {"id": 2}] _ITEMS_RESPONSE_DATA = [ {"id": 1, "name": "item_1"}, {"id": 2, "name": "item_2"}, ] _DYNAMIC_STREAM_RESPONSE_DATA = [{"ABC": 1, "AED": 2}] @pytest.fixture def mock_http_responses(http_mocker): http_mocker.get( HttpRequest(url="https://api.test.com/parents"), HttpResponse(body=json.dumps(_PARENT_RESPONSE_DATA)), ) for parent_id in [1, 2]: http_mocker.get( HttpRequest(url=f"https://api.test.com/parent/{parent_id}/items"), HttpResponse(body=json.dumps(_ITEMS_RESPONSE_DATA)), ) # ... rest of the mocks return http_mocker def test_dynamic_streams_with_http_components_resolver_retriever_with_parent_stream(mock_http_responses): # Test can now focus on assertions rather than setup source = ConcurrentDeclarativeSource(...) # ... rest of the test
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
(1 hunks)unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
Line range hint
3052-3075
: The examples look good, but what do you think about adding more comprehensive ones? wdyt?The new interpolation contexts for stream slices are well documented, but we could make it even clearer with additional examples showing:
- Nested stream slice access
- Combined usage of stream_slice with stream_template_config
- Multiple extra_fields usage
Consider adding these examples to the
examples
section:- ["data", "{{ stream_slice['parent']['id'] }}"] - ["data", "{{ stream_slice['id'] }}_{{ stream_template_config['suffix'] }}"] - ["data", "{{ stream_slice['extra_fields']['field1'] }}_{{ stream_slice['extra_fields']['field2'] }}"]
Line range hint
3076-3094
: The experimental component looks good, but should we add more documentation about its limitations? wdyt?Since this is an experimental component, it would be helpful to:
- Document any known limitations or edge cases
- Provide guidance on when to use HttpComponentsResolver vs ConfigComponentsResolver
- Add examples of common usage patterns
Consider adding a more detailed description like:
description: | (This component is experimental. Use at your own risk.) Component resolve and populates stream templates with components fetched via an HTTP retriever. Known limitations: - <list key limitations> Best used when: - Dynamically fetching component configurations from an API - Need to resolve components based on runtime data Example usage: ```yaml type: HttpComponentsResolver retriever: type: SimpleRetriever # ... retriever config components_mapping: - type: ComponentMappingDefinition # ... mapping config--- Line range hint `3095-3113`: **The ConfigComponentsResolver looks solid, but should we add more validation rules? wdyt?** The component definition could be enhanced with: 1. Validation rules for the stream_config structure 2. Constraints on the components_mapping to ensure compatibility with config-based resolution Consider adding these validations: ```yaml stream_config: "$ref": "#/definitions/StreamConfig" additionalProperties: false # Prevent unknown properties minItems: 1 # Ensure at least one config pointer is provided components_mapping: type: array items: "$ref": "#/definitions/ComponentMappingDefinition" minItems: 1 # Ensure at least one mapping is provided uniqueItems: true # Prevent duplicate mappings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Summary by CodeRabbit
New Features
HttpComponentsResolver
andConfigComponentsResolver
for enhanced component resolution.stream_slice
andstream_template_config
) toComponentMappingDefinition
for improved mapping flexibility.Bug Fixes
resolve_components
method ofHttpComponentsResolver
.Tests
HttpComponentsResolver
to cover scenarios involving stream slices and parent-child relationships.