Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(HttpComponentsResolver): added stream slices to HttpComponentsResolver #175

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

darynaishchenko
Copy link
Contributor

@darynaishchenko darynaishchenko commented Dec 16, 2024

Summary by CodeRabbit

  • New Features

    • Introduced HttpComponentsResolver and ConfigComponentsResolver for enhanced component resolution.
    • Added properties (stream_slice and stream_template_config) to ComponentMappingDefinition for improved mapping flexibility.
  • Bug Fixes

    • Improved handling of stream slices in the resolve_components method of HttpComponentsResolver.
  • Tests

    • Enhanced tests for HttpComponentsResolver to cover scenarios involving stream slices and parent-child relationships.

@darynaishchenko darynaishchenko self-assigned this Dec 16, 2024
@github-actions github-actions bot added the enhancement New feature or request label Dec 16, 2024
@darynaishchenko
Copy link
Contributor Author

darynaishchenko commented Dec 16, 2024

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

coderabbitai bot commented Dec 16, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces enhancements to the Airbyte CDK's declarative components, focusing on improving schema definitions and component resolution mechanisms. The changes primarily involve updating the declarative_component_schema.yaml file to add new components like HttpComponentsResolver and ConfigComponentsResolver, and extending the ComponentMappingDefinition with stream_slice properties. The modifications aim to provide more flexible and dynamic component mapping and resolution capabilities.

Changes

File Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml - Added HttpComponentsResolver component
- Added ConfigComponentsResolver component
- Extended ComponentMappingDefinition with stream_slice properties in field_path and value
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Modified create_http_components_resolver method to handle stream_slicer more flexibly
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py Updated resolve_components method to support stream slice processing
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py Added test for stream slice handling and duplicate dynamic streams

Possibly Related PRs

Suggested Reviewers

  • maxi297
  • aldogonzalez8

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

Line range hint 249-264: Consider using match parameter in exception assertion

In test_duplicated_dynamic_streams_read_with_http_components_resolver, you're asserting the exact error message string. To make the test more resilient to changes in the error message wording, would it be beneficial to use the match parameter of pytest.raises to check for specific content? This way, minor wording changes won't cause the test to fail unexpectedly. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 216cd43 and d9291e0.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)

91-112: Stream slicing integration looks great!

The changes introducing stream slicing by iterating over stream_slices and processing records accordingly enhance the resolver's ability to handle multiple data slices effectively. This is a valuable improvement to the component's functionality. Well done!

unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

224-224: Including stream_slices in the mock retriever

Adding mock_retriever.stream_slices.return_value = [{}] ensures that the tests accurately simulate the presence of stream slices. This change improves the robustness of the test_http_components_resolver function. Nice work!

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2356-2356: Clarifying stream_slicer assignment logic

The updated assignment stream_slicer=stream_slicer if stream_slicer else combined_slicers in create_http_components_resolver enhances flexibility by prioritizing the provided stream_slicer. Just to confirm, are there scenarios where both stream_slicer and combined_slicers might be defined, potentially leading to unexpected behavior? Should we add a check or warning to handle such cases explicitly? Wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

Line range hint 3010-3033: Enhancing ComponentMappingDefinition with stream_slice

Expanding the interpolation_context to include stream_slice in both the field_path and value properties allows for more dynamic component mapping based on stream slices. This improvement increases the flexibility of the schema definitions. Great job on this enhancement!

@aldogonzalez8
Copy link
Contributor

These changes make sense to me, but can we have a test with manifest configuration including the slicer?

Copy link
Contributor

@lazebnyi lazebnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM.

However, we need to add another test case and resolve some comments.

@darynaishchenko
Copy link
Contributor Author

darynaishchenko commented Dec 18, 2024

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4)

348-348: Consider updating the test name to reflect stream slice testing?

The test now includes stream slice verification, but the name test_http_components_resolver doesn't indicate this. Maybe we could rename it to test_http_components_resolver_with_empty_stream_slice to be more specific, wdyt?


361-395: Consider adding more test cases for robustness?

The test covers the happy path well, but what about adding some edge cases? For example:

  • Empty stream slices
  • Invalid component mappings
  • Missing or null parent_id
  • Multiple field mappings within the same slice

This would help ensure the resolver handles error cases gracefully, wdyt?


469-523: How about enhancing the test coverage for error scenarios?

The test thoroughly covers the happy path, but we might want to consider adding tests for:

  1. Parent stream returning empty results
  2. Child stream API failures
  3. Invalid parent IDs in the response

Also, the mock responses at line 498 are quite simple. Should we make them more realistic by including actual field values that match the schema defined in the manifest?


200-324: Should we move the manifest to a separate fixture file?

The manifest is quite complex and makes the test file longer than necessary. What do you think about:

  1. Moving it to a separate fixture file (e.g., conftest.py or fixtures/manifests.py)?
  2. Using a fixture factory to generate manifests with different configurations?

This would make the tests more maintainable and allow reuse across different test files, wdyt?

airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (2)

91-95: LGTM! Though we could improve type safety here 🤔

The stream slicing implementation looks solid! However, what do you think about making the types more explicit instead of using type ignores? We could:

  1. Add type hints for stream_slice and components_values in kwargs
  2. Use TypedDict for kwargs to make it more type-safe

Something like this maybe?

-        kwargs = {"stream_template_config": stream_template_config}
+        from typing import TypedDict
+        class ResolverKwargs(TypedDict, total=False):
+            stream_template_config: Dict[str, Any]
+            components_values: Mapping[str, Any]
+            stream_slice: Mapping[str, Any]
+        
+        kwargs: ResolverKwargs = {"stream_template_config": stream_template_config}

wdyt? 🤔


91-110: Consider documenting stream slice behavior 📚

The stream slicing implementation looks great! Since this is an experimental class (as noted by the decorator), would it be helpful to:

  1. Add docstring examples showing how stream_slice affects component resolution?
  2. Document any assumptions about stream_slicer presence in retriever?
  3. Add a section in the class docstring about stream slicing behavior?

This would help users understand how to effectively use this feature. Happy to help draft the documentation if you'd like! 😊

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9291e0 and fb21cd4.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4 hunks)
🔇 Additional comments (2)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1)

Line range hint 200-523: Overall implementation looks great! 🎉

The implementation is solid and well-tested. The suggestions above are just for potential improvements, but the current code is ready to go.

airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)

97-110: Looks good! Let's verify the component resolution 🔍

The component resolution logic looks solid! The path evaluation correctly uses the stream_slice context.

Let's verify how this is being used in the codebase:

✅ Verification successful

Looks good! The stream_slice handling is working as expected 🎯

The implementation correctly handles stream slices in the component resolution process:

  • The code properly uses stream_slice context in the resolve_components method
  • Tests verify both simple and complex scenarios (including parent-child relationships)
  • The implementation maintains immutability by using deepcopy
  • Stream slice values are correctly interpolated in component mappings

The test cases demonstrate that it works correctly with:

  • Basic stream slices (test_http_components_resolver)
  • Parent-child relationships (test_dynamic_streams_with_http_components_resolver_retriever_with_parent_stream)
  • Multiple slices with different values (test_http_components_resolver_with_stream_slices)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any custom implementations of stream slicing in components
ast-grep --pattern 'class $name implements HttpComponentsResolver {
  $$$
  stream_slices($$$) {
    $$$
  }
  $$$
}'

# Check for any test cases using stream slices with HttpComponentsResolver
rg -l "HttpComponentsResolver.*stream_slice" --type python

Length of output: 211


Script:

#!/bin/bash
# Let's try a different approach to find usage patterns

# Check for any test files related to HttpComponentsResolver
fd "test.*\.py" | xargs rg "HttpComponentsResolver"

# Look for stream_slice usage in Python files
fd "\.py$" | xargs rg "stream_slice"

# Find implementations or usages of HttpComponentsResolver
fd "\.py$" | xargs rg "class.*HttpComponentsResolver"

# Look for component resolution with stream context
fd "\.py$" | xargs rg "resolved_component.*stream"

Length of output: 89632


Script:

#!/bin/bash
# Let's check the implementation and tests of HttpComponentsResolver
fd "http_components_resolver\.py$" | xargs cat

# Look for tests specifically related to stream_slice usage
fd "test_http_components_resolver\.py$" | xargs cat

# Check for any direct references to stream_slice in the resolver
fd "\.py$" | xargs rg "stream_slice.*HttpComponentsResolver"

Length of output: 46462

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (4)

200-325: Consider extracting nested configurations into separate constants for better readability, wdyt?

The manifest is quite complex with deeply nested configurations. We could improve readability by extracting some parts into separate constants:

  • Parent stream configuration (lines 269-301)
  • Stream template configuration (lines 207-241)
  • Components resolver configuration (lines 242-322)

Example approach:

_PARENT_STREAM_CONFIG = {
    "type": "DeclarativeStream",
    "name": "parent",
    # ... rest of parent stream config
}

_STREAM_TEMPLATE_CONFIG = {
    "type": "DeclarativeStream",
    "name": "",
    # ... rest of stream template config
}

_COMPONENTS_RESOLVER_CONFIG = {
    "type": "HttpComponentsResolver",
    # ... rest of resolver config
}

_MANIFEST_WITH_HTTP_COMPONENT_RESOLVER_WITH_RETRIEVER_WITH_PARENT_STREAM = {
    "version": "6.7.0",
    "type": "DeclarativeSource",
    "check": {"type": "CheckStream", "stream_names": ["Rates"]},
    "dynamic_streams": [{
        "type": "DynamicDeclarativeStream",
        "stream_template": _STREAM_TEMPLATE_CONFIG,
        "components_resolver": _COMPONENTS_RESOLVER_CONFIG
    }]
}

351-363: How about adding explicit assertions for stream slices behavior?

The test verifies the result but doesn't explicitly assert the stream slices interaction. Consider adding assertions to verify:

  1. That stream_slices was called
  2. The number of times read_records was called with the slice
def test_http_components_resolver(...):
    # ... existing setup ...
    
    result = list(resolver.resolve_components(stream_template_config=stream_template_config))
    
    # Add these assertions
    mock_retriever.stream_slices.assert_called_once()
    assert mock_retriever.read_records.call_count == 1
    mock_retriever.read_records.assert_called_with(sync_mode=None, stream_slice={}, stream_state={})
    assert result == expected_result

365-382: Would you consider adding more test cases to cover edge scenarios?

The current test case is good but we could make it more robust by adding cases for:

  1. Empty stream slices
  2. Missing or null component values
  3. Invalid path templates
@pytest.mark.parametrize(
    "components_mapping, retriever_data, stream_template_config, expected_result",
    [
        # Existing case
        (
            [ComponentMappingDefinition(...)],
            [{"id": "1", "field1": "data1"}, {"id": "2", "field1": "data2"}],
            {"path": None},
            [{"path": "1/1"}, {"path": "1/2"}, {"path": "2/1"}, {"path": "2/2"}],
        ),
        # Empty stream slices
        (
            [ComponentMappingDefinition(...)],
            [{"id": "1"}],
            {"path": None},
            [],
        ),
        # Missing component values
        (
            [ComponentMappingDefinition(...)],
            [{}],
            {"path": None},
            [{"path": "1/"}],  # or expect an error
        ),
    ],
)

475-531: How about improving test maintainability by extracting test data?

The test setup is quite complex with inline response data. Consider extracting these into constants or fixture functions for better maintainability and reusability, wdyt?

# At module level
_PARENT_RESPONSE_DATA = [{"id": 1}, {"id": 2}]
_ITEMS_RESPONSE_DATA = [
    {"id": 1, "name": "item_1"},
    {"id": 2, "name": "item_2"},
]
_DYNAMIC_STREAM_RESPONSE_DATA = [{"ABC": 1, "AED": 2}]

@pytest.fixture
def mock_http_responses(http_mocker):
    http_mocker.get(
        HttpRequest(url="https://api.test.com/parents"),
        HttpResponse(body=json.dumps(_PARENT_RESPONSE_DATA)),
    )
    for parent_id in [1, 2]:
        http_mocker.get(
            HttpRequest(url=f"https://api.test.com/parent/{parent_id}/items"),
            HttpResponse(body=json.dumps(_ITEMS_RESPONSE_DATA)),
        )
    # ... rest of the mocks
    return http_mocker

def test_dynamic_streams_with_http_components_resolver_retriever_with_parent_stream(mock_http_responses):
    # Test can now focus on assertions rather than setup
    source = ConcurrentDeclarativeSource(...)
    # ... rest of the test
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb21cd4 and ec9e480.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py

Copy link
Contributor

@aldogonzalez8 aldogonzalez8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED

Copy link
Contributor

@lazebnyi lazebnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

Line range hint 3052-3075: The examples look good, but what do you think about adding more comprehensive ones? wdyt?

The new interpolation contexts for stream slices are well documented, but we could make it even clearer with additional examples showing:

  • Nested stream slice access
  • Combined usage of stream_slice with stream_template_config
  • Multiple extra_fields usage

Consider adding these examples to the examples section:

- ["data", "{{ stream_slice['parent']['id'] }}"]
- ["data", "{{ stream_slice['id'] }}_{{ stream_template_config['suffix'] }}"]
- ["data", "{{ stream_slice['extra_fields']['field1'] }}_{{ stream_slice['extra_fields']['field2'] }}"]

Line range hint 3076-3094: The experimental component looks good, but should we add more documentation about its limitations? wdyt?

Since this is an experimental component, it would be helpful to:

  1. Document any known limitations or edge cases
  2. Provide guidance on when to use HttpComponentsResolver vs ConfigComponentsResolver
  3. Add examples of common usage patterns

Consider adding a more detailed description like:

description: |
  (This component is experimental. Use at your own risk.) Component resolve and populates stream templates with components fetched via an HTTP retriever.
  
  Known limitations:
  - <list key limitations>
  
  Best used when:
  - Dynamically fetching component configurations from an API
  - Need to resolve components based on runtime data
  
  Example usage:
  ```yaml
  type: HttpComponentsResolver
  retriever:
    type: SimpleRetriever
    # ... retriever config
  components_mapping:
    - type: ComponentMappingDefinition
      # ... mapping config

---

Line range hint `3095-3113`: **The ConfigComponentsResolver looks solid, but should we add more validation rules? wdyt?**

The component definition could be enhanced with:
1. Validation rules for the stream_config structure
2. Constraints on the components_mapping to ensure compatibility with config-based resolution


Consider adding these validations:
```yaml
stream_config:
  "$ref": "#/definitions/StreamConfig"
  additionalProperties: false  # Prevent unknown properties
  minItems: 1  # Ensure at least one config pointer is provided
components_mapping:
  type: array
  items:
    "$ref": "#/definitions/ComponentMappingDefinition"
  minItems: 1  # Ensure at least one mapping is provided
  uniqueItems: true  # Prevent duplicate mappings
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec9e480 and 8fb4d88.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants