test(low code cdk): dynamic streams changes #132

Closed
wants to merge 32 commits into from

Conversation

darynaishchenko

@darynaishchenko darynaishchenko commented Dec 5, 2024

branch for testing changes from #104 and #88

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced handling for both static and dynamic streams, allowing for more flexible data extraction configurations.
    • Introduced new components for dynamic schema loading and component mapping, improving extensibility.
    • Added support for concurrent processing of streams, optimizing data retrieval performance.
  • Bug Fixes

    • Improved error handling for stream configurations to prevent validation errors.
  • Tests

    • Expanded test coverage for dynamic schema loading and HTTP components resolution, ensuring robustness of new features.

lazebnyi and others added 30 commits November 26, 2024 03:55
…hq/airbyte-python-cdk into lazebnyi/add-components-resolver
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
@darynaishchenko darynaishchenko self-assigned this Dec 5, 2024

coderabbitai bot commented Dec 5, 2024

📝 Walkthrough

The pull request introduces significant changes across multiple files related to the ConcurrentDeclarativeSource and other components in the Airbyte CDK framework. Key modifications include enhanced stream handling logic, the introduction of new classes and methods for dynamic schema loading and component resolution, and updates to the schema definitions to support flexible stream configurations. Error handling improvements and new unit tests are also included to ensure robust functionality and validation of the new features.

Changes

| File Path | Change Summary |
|---|---|
| airbyte_cdk/sources/declarative/concurrent_declarative_source.py | Modified _group_streams method for improved stream handling and error checks. |
| airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Updated schema to include dynamic_streams, modified streams, and added new experimental components. |
| airbyte_cdk/sources/declarative/manifest_declarative_source.py | Added _dynamic_stream_configs method for dynamic stream handling; modified streams method. |
| airbyte_cdk/sources/declarative/models/declarative_component_schema.py | Introduced new classes (TypesMap, SchemaTypeIdentifier, etc.) and restructured DeclarativeSource. |
| airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py | Expanded type mappings for new components in DEFAULT_MODEL_TYPES and CUSTOM_COMPONENTS_MAPPING. |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Added methods for creating new component definitions and refactored existing methods. |
| airbyte_cdk/sources/declarative/partition_routers/__init__.py | Added PartitionRouter to module exports. |
| airbyte_cdk/sources/declarative/resolvers/__init__.py | Introduced new module for component resolution with mappings and class imports. |
| airbyte_cdk/sources/declarative/resolvers/components_resolver.py | Added classes for component mapping and resolution. |
| airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py | Implemented HttpComponentsResolver for HTTP-based component resolution. |
| airbyte_cdk/sources/declarative/schema/__init__.py | Updated module exports to include new schema-related classes. |
| airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py | Introduced classes for dynamic schema loading and type mapping. |
| unit_tests/sources/declarative/resolvers/__init__.py | Added copyright notice without functional changes. |
| unit_tests/sources/declarative/resolvers/test_http_components_resolver.py | Created tests for HttpComponentsResolver functionality. |
| unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py | Added tests for DynamicSchemaLoader to validate schema generation and error handling. |
| unit_tests/sources/declarative/test_manifest_declarative_source.py | Enhanced tests for ManifestDeclarativeSource with new fixtures and validation checks. |

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • brianjlai
  • maxi297

Hey there! Do you think these changes will cover all the necessary aspects for the new features? Wdyt?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (11)
airbyte_cdk/sources/declarative/resolvers/components_resolver.py (3)

27-37: Consider combining similar classes to reduce duplication

Noticing that both ComponentMappingDefinition and ResolvedComponentMappingDefinition have very similar structures, perhaps we could refactor the code to reduce duplication, maybe by having one inherit from the other or by creating a shared base class. Wdyt?
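
A minimal sketch of the shared-base-class option, not the PR's actual code. The field names `field_path` and `value_type` are assumptions for illustration; only the difference in the `value` annotation is mentioned in this review. `InterpolatedString` below is a simplified stand-in for the CDK class, and `BaseComponentMapping` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Type, Union


@dataclass
class InterpolatedString:
    """Simplified stand-in for the CDK's InterpolatedString (assumption)."""

    string: str


@dataclass
class BaseComponentMapping:
    """Hypothetical base class holding the fields both definitions share."""

    field_path: List[Any]
    value_type: Optional[Type[Any]]


@dataclass
class ComponentMappingDefinition(BaseComponentMapping):
    # Before resolution the value may still be a raw string.
    value: Union[InterpolatedString, str]


@dataclass
class ResolvedComponentMappingDefinition(BaseComponentMapping):
    # After resolution the value is always an InterpolatedString.
    value: InterpolatedString
```

With this layout only the `value` annotation differs between the two subclasses, so any field added to the base is picked up by both.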


39-41: Consider using an experimental warning instead of deprecation

Using the @deprecated decorator to mark ComponentsResolver as experimental might be a bit misleading since deprecation typically implies that the feature is outdated or will be removed. Would it make sense to use a custom experimental decorator or warning to indicate that this class is experimental? Wdyt?
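
One way a custom experimental marker could look, sketched with the standard `warnings` module. The names `experimental` and `ExperimentalWarning` are hypothetical and are not part of the CDK; the `ComponentsResolver` class here is an empty stand-in.

```python
import functools
import warnings
from typing import Any, Callable, Type, TypeVar

T = TypeVar("T")


class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for features that are not yet stable."""


def experimental(reason: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that warns on instantiation without implying removal."""

    def decorator(cls: Type[T]) -> Type[T]:
        original_init = cls.__init__

        @functools.wraps(original_init)
        def wrapped_init(self: Any, *args: Any, **kwargs: Any) -> None:
            warnings.warn(
                f"{cls.__name__} is experimental: {reason}",
                ExperimentalWarning,
                stacklevel=2,
            )
            original_init(self, *args, **kwargs)

        cls.__init__ = wrapped_init
        return cls

    return decorator


@experimental("the interface may change without notice")
class ComponentsResolver:
    """Empty stand-in for the class under review."""
```

A dedicated warning category lets users filter experimental notices separately from `DeprecationWarning`.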


46-55: Enhance the docstring for resolve_components method

The docstring for the resolve_components method is currently quite brief. Could we provide more detailed information about how subclasses should implement this method and what the expected behavior is? Maybe including an example would be helpful. Wdyt?
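
A sketch of what a fuller docstring with an example might look like. The method signature shown is an assumption made for illustration, and `StaticNamesResolver` is a hypothetical subclass that exists only to make the example concrete.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class ComponentsResolver(ABC):
    """Stand-in for the abstract resolver; the signature below is assumed."""

    @abstractmethod
    def resolve_components(
        self, stream_template_config: Dict[str, Any]
    ) -> Iterable[Dict[str, Any]]:
        """Yield one stream configuration per resolved component.

        Implementations should leave ``stream_template_config`` unmodified,
        build a copy for every component they discover, apply their
        mappings to that copy, and yield it.

        Example (hypothetical): with the template ``{"name": ""}`` and the
        discovered names ``["a", "b"]``, the method yields ``{"name": "a"}``
        and then ``{"name": "b"}``.
        """


class StaticNamesResolver(ComponentsResolver):
    """Hypothetical subclass that resolves a fixed list of stream names."""

    def __init__(self, names: List[str]) -> None:
        self._names = names

    def resolve_components(
        self, stream_template_config: Dict[str, Any]
    ) -> Iterable[Dict[str, Any]]:
        for name in self._names:
            # Shallow copy of the template with the name overridden.
            yield {**stream_template_config, "name": name}
```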

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

1624-1649: Consider refactoring duplicate logic in create_dynamic_schema_loader and create_http_components_resolver.

Both methods contain similar code for building stream slicers and retrievers. Extracting this common logic into a helper function could improve maintainability and reduce duplication. Wdyt?


2317-2332: Address the type annotation for field_path in create_components_mapping_definition.

There's a # type: ignore[arg-type] comment due to field_path potentially being str or InterpolatedString. Would it be possible to adjust the type annotations to avoid using type: ignore, perhaps by specifying Union[str, InterpolatedString] in the ComponentMappingDefinition? Wdyt?


2335-2336: Specify the return type for create_http_components_resolver.

Currently, the method's return type is annotated as Any. Would it be better to specify the exact return type HttpComponentsResolver to enhance type checking and readability? Wdyt?

unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (2)

114-148: Consider adding more test cases to test_http_components_resolver.

Currently, the parameterized test includes a single test case. Would it be helpful to add additional cases to cover more scenarios, such as different component mappings, varied retriever data, and edge cases? Wdyt?


150-189: Consider enhancing assertions in test_dynamic_streams_read.

The test verifies the number of streams and records but doesn't assert the content of the records. Including assertions that check the actual data in the records could ensure they match expected values. Wdyt?

airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (2)

34-39: Consider grouping related mappings together?

The new mappings for DynamicDeclarativeStream and HttpComponentsResolver are logically related. What do you think about grouping them together under a common section with a comment to improve readability?

    # DeclarativeStream
    "DeclarativeStream.retriever": "SimpleRetriever",
    "DeclarativeStream.schema_loader": "JsonFileSchemaLoader",
+   # Dynamic Stream Components
    # DynamicDeclarativeStream
    "DynamicDeclarativeStream.stream_template": "DeclarativeStream",
    "DynamicDeclarativeStream.components_resolver": "HttpComponentsResolver",
    # HttpComponentsResolver
    "HttpComponentsResolver.retriever": "SimpleRetriever",
    "HttpComponentsResolver.components_mapping": "ComponentMappingDefinition",

67-70: Consider grouping schema-related mappings together?

These new mappings are related to schema handling. Would it make sense to group them with other schema-related mappings if any exist? wdyt?

+   # Schema Components
    # DynamicSchemaLoader
    "DynamicSchemaLoader.retriever": "SimpleRetriever",
    # SchemaTypeIdentifier
    "SchemaTypeIdentifier.types_map": "TypesMap",
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1687-1707: Consider adding validation examples for TypesMap?

The TypesMap component is well-defined but might benefit from examples showing valid type mappings. Would you like to add some examples to help users understand the expected format? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3e671b8 and 5a42564.

📒 Files selected for processing (16)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (9 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (8 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (12 hunks)
  • airbyte_cdk/sources/declarative/partition_routers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • airbyte_cdk/sources/declarative/schema/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/__init__.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1 hunks)
  • unit_tests/sources/declarative/test_manifest_declarative_source.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • unit_tests/sources/declarative/resolvers/__init__.py
  • airbyte_cdk/sources/declarative/resolvers/__init__.py
🔇 Additional comments (19)
airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1)

21-23: Consistency in value type annotations

In ComponentMappingDefinition, the value field is typed as Union[InterpolatedString, str], whereas in ResolvedComponentMappingDefinition, it's just InterpolatedString. Is this difference intentional? Ensuring consistent type annotations might improve clarity. What do you think?

Also applies to: 33-35

airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)

1-106: LGTM!

The HttpComponentsResolver class is well-implemented and follows best practices. The methods are clearly defined, and the logic appears sound.

airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1)

1-219: LGTM!

The DynamicSchemaLoader class is thoughtfully designed, and the methods for schema construction are comprehensive. The handling of data extraction and type mapping is robust.

airbyte_cdk/sources/declarative/manifest_declarative_source.py (3)

337-339: Should we verify the existence of 'requester' before setting 'use_cache'?

Currently, we set use_cache on components_resolver_config["retriever"]["requester"] without checking if 'requester' exists. Would it be safer to check if 'requester' is present within 'retriever' before setting use_cache? WDYT?
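
A sketch of the guarded variant, assuming the config is a plain nested dict. The helper name `enable_requester_cache` is hypothetical, and setting the flag to `True` is an assumption; the review only says that `use_cache` is set.

```python
from typing import Any, Dict


def enable_requester_cache(components_resolver_config: Dict[str, Any]) -> bool:
    """Set use_cache only if retriever.requester exists; return whether it was set."""
    retriever = components_resolver_config.get("retriever")
    if not isinstance(retriever, dict):
        return False
    requester = retriever.get("requester")
    if not isinstance(requester, dict):
        return False
    requester["use_cache"] = True  # assumed value, see the note above
    return True
```

Returning a boolean lets the caller decide whether a missing `requester` should be ignored or reported as a configuration error.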


345-347: Consider adding a check for 'stream_template' in 'dynamic_definition'.

We access dynamic_definition["stream_template"] directly without verifying its existence. Should we add a check to ensure 'stream_template' is present in dynamic_definition to prevent potential KeyError exceptions? WDYT?
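
A sketch of an explicit presence check that replaces a bare `KeyError` with a descriptive error. The helper name and the error message wording are hypothetical.

```python
from typing import Any, Dict


def get_stream_template(dynamic_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Return the stream template or raise a descriptive error if it is missing."""
    if "stream_template" not in dynamic_definition:
        # Hypothetical message; the PR may word this differently.
        raise ValueError(
            "Dynamic stream definition is missing the required 'stream_template' key."
        )
    return dynamic_definition["stream_template"]
```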


Line range hint 1-356: LGTM!

The integration of dynamic stream configurations enhances the source's flexibility. The additional validations and error handling improve robustness.

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

185-190: LGTM!

Combining static and dynamic stream configurations in _group_streams is a logical enhancement and should support concurrent processing effectively.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

653-677: LGTM!

The introduction of TypesMap and SchemaTypeIdentifier classes adds valuable functionality for dynamic schema loading.


1187-1216: LGTM!

The ComponentMappingDefinition class is a useful addition for defining component mappings, and the fields are well-documented.


Line range hint 1515-1576: LGTM!

The restructuring of DeclarativeSource into multiple classes to handle dynamic streams is well-executed. This change should provide greater flexibility in source configurations.

airbyte_cdk/sources/declarative/schema/__init__.py (1)

9-11: Imports and update to __all__ look appropriate.

The added imports and changes to __all__ correctly expose the new classes.

airbyte_cdk/sources/declarative/partition_routers/__init__.py (1)

9-11: Adding PartitionRouter to imports and __all__ looks good.

The module now correctly includes PartitionRouter in its public interface.

unit_tests/sources/declarative/test_manifest_declarative_source.py (4)

74-82: LGTM! Well-structured base fixture.

The _base_manifest fixture provides a clean base configuration for tests. Good practice using descriptive docstring and minimal configuration.


83-134: LGTM! Comprehensive stream configuration fixture.

The _declarative_stream fixture is well-designed with:

  • Flexible configuration through parameters
  • Clear docstring
  • Realistic test data

135-182: LGTM! Well-structured dynamic stream fixture.

The _dynamic_declarative_stream fixture effectively reuses the _declarative_stream fixture and provides a complete dynamic stream configuration.


629-659: LGTM! Thorough test coverage.

The test effectively verifies:

  • Failure case with missing streams and dynamic streams
  • Success case with regular streams
  • Success case with dynamic streams

Good job on the clear test organization and comprehensive coverage!

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

11-15: LGTM! Clear stream configuration requirement.

The anyOf condition clearly specifies that either streams or dynamic_streams must be present. Good job on making this requirement explicit in the schema.
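
The rule can be restated in plain Python as a presence check on the two keys. This is an illustration of the requirement as described here, not the schema validator the CDK uses, and `declares_streams` is a hypothetical helper.

```python
from typing import Any, Mapping

STREAM_KEYS = ("streams", "dynamic_streams")


def declares_streams(manifest: Mapping[str, Any]) -> bool:
    """Return True if the manifest has at least one of the two stream keys."""
    return any(key in manifest for key in STREAM_KEYS)
```

A manifest with both keys also passes, which matches `anyOf` semantics (at least one branch must hold, not exactly one).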


26-29: LGTM! Well-structured dynamic_streams definition.

The dynamic_streams array property is well-defined with a clear reference to DynamicDeclarativeStream.


2992-3081: LGTM! Well-documented experimental components.

The new experimental components are thoroughly documented with:

  • Clear descriptions and experimental status warnings
  • Required properties
  • Comprehensive property descriptions
  • Proper schema references

Comment on lines +152 to +153
with pytest.raises(ValueError, match="Expected key to be a string. Got None"):
    dynamic_schema_loader.get_json_schema()

⚠️ Potential issue

Check the error message in test_dynamic_schema_loader_invalid_type.

The ValueError is expected when an invalid type is provided, but the match string is "Expected key to be a string. Got None", which appears to relate to the key instead of the type. Should this be updated to match the error message for invalid types, such as "Invalid type specified"? Wdyt?
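
For context, `pytest.raises(..., match=...)` checks the pattern against the string form of the exception with `re.search`, so the test passes only if the raised message contains the key-related text. The sketch below reproduces that check with the standard library; `raises_with_match` is a hypothetical helper, and "Invalid type specified" is the example wording from this comment, not a confirmed message from the loader.

```python
import re
from typing import Callable


def raises_with_match(func: Callable[[], object], pattern: str) -> bool:
    """Return True if func raises ValueError whose message matches pattern."""
    try:
        func()
    except ValueError as exc:
        return re.search(pattern, str(exc)) is not None
    return False


def fail_on_key() -> None:
    raise ValueError("Expected key to be a string. Got None")


def fail_on_type() -> None:
    # Hypothetical message taken from the suggestion in this comment.
    raise ValueError("Invalid type specified")
```

If the loader raised the type-related message, the current `match` string would not be found and the test would fail, so a passing test indicates the key check fires first.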

2 participants