-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code cdk): add dynamic schema loader #104
Conversation
/autofix
|
Warning Rate limit exceeded@lazebnyi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 59 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 Walkthrough📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces three new components— Changes
Possibly related PRs
Suggested reviewers
Would you like to consider any additional reviewers or related PRs? Wdyt? Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
1575-1581
: Adjust default values forname
andprimary_key
fieldsWould you consider setting the default values of
name
andprimary_key
toNone
instead of an empty string? This can help avoid confusion with optional fields and ensure consistency. Wdyt?Proposed change:
name: Optional[str] = Field( - "", description="The stream name.", example=["Users"], title="Name" + None, description="The stream name.", example="Users", title="Name" ) primary_key: Optional[PrimaryKey] = Field( - "", description="The primary key of the stream.", title="Primary Key" + None, description="The primary key of the stream.", title="Primary Key" )
654-679
: Suggestion onis_nullable
field inSchemaTypeIdentifier
Would it be clearer to use
is_nullable: bool = Field(True, ...)
instead ofOptional[bool]
since it has a default value? This can improve type clarity. Wdyt?Proposed change:
- is_nullable: Optional[bool] = Field( + is_nullable: bool = Field( True, description="Add null to defined field type. This field is automatically set by the CDK.", title="Is Nullable", )unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (2)
12-48
: Consider adding docstrings to fixtures?The fixtures are well-structured, but adding docstrings would make their purpose and setup more clear to other developers. Here's a suggested format, wdyt?
@pytest.fixture def mock_retriever(): + """ + Creates a mock retriever that returns predefined schema records. + Returns: + MagicMock: A mock retriever with read_records method returning sample schema data. + """ retriever = MagicMock()
51-128
: The test cases look great! Consider adding a docstring?The parameterized test cases provide excellent coverage. Adding a docstring to explain the test's purpose and parameters would enhance documentation, wdyt?
def test_dynamic_schema_loader(dynamic_schema_loader, retriever_data, expected_schema): + """ + Tests DynamicSchemaLoader's schema generation for various input scenarios. + + Args: + dynamic_schema_loader: Fixture providing the loader instance + retriever_data: Input data simulating API responses + expected_schema: Expected JSON schema output + """ dynamic_schema_loader.retriever.read_records.return_value = retriever_dataairbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1666-1686
: Consider adding examples for TypesPair?The schema is well-defined, but adding examples would help users understand the expected format, wdyt?
current_type: anyOf: - type: string - type: array items: type: string + examples: + - "string" + - ["string", "integer"]
1687-1732
: Consider adding pattern validation for pointers?The schema pointers could benefit from regex patterns to ensure valid paths, wdyt?
schema_pointer: title: Schema Path description: List of potentially nested fields describing the full path of the field to extract. type: array items: - type: string + pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$"
1733-1756
: Consider enhancing experimental status documentation?The experimental status warning could be more detailed about potential risks and limitations, wdyt?
- description: (This component is experimental. Use at your own risk.) Loads a schema by extracting data from retrieved records. + description: | + (This component is experimental. Use at your own risk.) + Loads a schema by extracting data from retrieved records. + + Note: As an experimental feature: + - The interface may change in non-backward compatible ways + - Support is limited + - Not recommended for production use without thorough testing
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(5 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(11 hunks)airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/partition_routers/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/schema/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(1 hunks)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1)
106-128
: Overall implementation looks solid
The get_json_schema
method effectively constructs the JSON Schema dynamically. Good job on handling the iterations and data extraction properly.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
650-653
: Definition of TypesPair
looks good
The TypesPair
class correctly models the mapping between current and target types. Nice addition!
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1570-1629
: Implementation of new factory methods
The methods create_types_pair
, create_schema_type_identifier
, and create_dynamic_schema_loader
are well-implemented and integrate seamlessly with the factory pattern. Great work!
airbyte_cdk/sources/declarative/schema/__init__.py (1)
9-11
: Exporting new schema components
Including DynamicSchemaLoader
, TypesPair
, and SchemaTypeIdentifier
in __all__
makes sense. This ensures they are available for import elsewhere.
airbyte_cdk/sources/declarative/partition_routers/__init__.py (1)
9-11
: Adding PartitionRouter
to exports
Adding PartitionRouter
to the __all__
list appropriately exposes it as a public interface. Good job!
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)
61-64
: LGTM! The new mappings look well-structured. 🎯
The additions follow the existing pattern nicely and align with the dynamic schema loader feature. Just to double-check our integration - would it make sense to verify that these components are properly registered in their respective definition files? wdyt?
✅ Verification successful
Looking good! The component mappings are correctly aligned with their implementations. 👍
The verification shows that both mappings in manifest_component_transformer.py
are valid:
DynamicSchemaLoader.retriever
maps toSimpleRetriever
, which is a valid option in the component schemaSchemaTypeIdentifier.types_map
maps toTypesPair
, which is correctly defined and used in the schema type identifier class
Both components are properly defined in the codebase with their respective dependencies, and the mappings follow the established patterns. The experimental nature of these components is also consistently documented.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the component definitions exist and match the mapped types
# Check for DynamicSchemaLoader definition and SimpleRetriever type
echo "Checking DynamicSchemaLoader definition..."
rg -l "class DynamicSchemaLoader.*SimpleRetriever"
# Check for SchemaTypeIdentifier definition and TypesPair type
echo "Checking SchemaTypeIdentifier definition..."
rg -l "class SchemaTypeIdentifier.*TypesPair"
Length of output: 292
Script:
#!/bin/bash
# Let's try a different approach with ast-grep to find these class definitions
# and their relationships with the mapped types
echo "Checking DynamicSchemaLoader definition..."
ast-grep --pattern 'class DynamicSchemaLoader {
$$$
}'
echo -e "\nChecking SchemaTypeIdentifier definition..."
ast-grep --pattern 'class SchemaTypeIdentifier {
$$$
}'
echo -e "\nChecking for TypesPair usage..."
rg "TypesPair" -A 2 -B 2
echo -e "\nChecking for SimpleRetriever usage..."
rg "SimpleRetriever" -A 2 -B 2
Length of output: 66058
Script:
#!/bin/bash
# Let's check for the DynamicSchemaLoader and SchemaTypeIdentifier class definitions
# and their relationship with the mapped types in the schema files
echo "Checking DynamicSchemaLoader in schema files..."
rg "DynamicSchemaLoader" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5
echo -e "\nChecking SchemaTypeIdentifier in schema files..."
rg "SchemaTypeIdentifier" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5
echo -e "\nChecking component schema definitions..."
rg "DynamicSchemaLoader|SchemaTypeIdentifier" airbyte_cdk/sources/declarative/declarative_component_schema.yaml -A 5 -B 5
Length of output: 7591
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
851-851
: Minor grammatical corrections in descriptions.Consider decapitalizing "Specific" in the descriptions and refining wording for clarity. For example, changing "The DeclarativeOAuth Specific list..." to "The DeclarativeOAuth specific list...". WDYT?
You could apply these diffs:
For line 851:
- description="The DeclarativeOAuth Specific list of strings to indicate which keys should be extracted and returned back to the input config.", + description="The DeclarativeOAuth specific list of strings indicating which keys should be extracted and returned to the input config.",For line 857:
- description="The DeclarativeOAuth Specific object to provide the criteria of how the `state` query param should be constructed,\nincluding length and complexity.", + description="The DeclarativeOAuth specific object providing the criteria for how the `state` query parameter should be constructed, including length and complexity.",For line 881:
- description="The DeclarativeOAuth Specific optional override to provide the custom `state` key name, if required by data-provider.", + description="The DeclarativeOAuth specific optional override to provide the custom `state` key name, if required by the data provider.",For line 887:
- description="The DeclarativeOAuth Specific optional override to provide the custom `code` key name to something like `auth_code` or `custom_auth_code`, if required by data-provider.", + description="The DeclarativeOAuth specific optional override to provide the custom `code` key name, such as `auth_code` or `custom_auth_code`, if required by the data provider.",Also applies to: 857-857, 881-881, 887-887
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
648-650
: LGTM!
The TypesPair
class definition looks correct and follows code conventions.
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from sync discussion
@aldogonzalez8 can you have a look a this? You'll have to use this for google sheets
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1585-1610
: Consider simplifying the pointer list comprehensions.The list comprehensions for schema_pointer, key_pointer, and type_pointer could be made more readable. WDYT about:
- model_schema_pointer: List[Union[InterpolatedString, str]] = ( - [x for x in model.schema_pointer] if model.schema_pointer else [] - ) - model_key_pointer: List[Union[InterpolatedString, str]] = [x for x in model.key_pointer] - model_type_pointer: Optional[List[Union[InterpolatedString, str]]] = ( - [x for x in model.type_pointer] if model.type_pointer else None - ) + model_schema_pointer = list(model.schema_pointer) if model.schema_pointer else [] + model_key_pointer = list(model.key_pointer) + model_type_pointer = list(model.type_pointer) if model.type_pointer else Noneairbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
653-656
: Consider adding docstrings for better documentation?The
TypesMap
class looks good, but would it be helpful to add docstrings explaining the purpose oftarget_type
andcurrent_type
fields and their relationship? This could help other developers understand how to use this mapping. WDYT?airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (3)
112-121
: Consider handling the case when no records are retrieved.Currently, if
retrieved_record
isNone
,raw_schema
is set to an empty list, resulting in an emptyproperties
in the generated JSON Schema. Would you consider adding error handling or logging to inform users when no records are retrieved? Wdyt?Possible code change:
def get_json_schema(self) -> Mapping[str, Any]: """ Constructs a JSON Schema based on retrieved data. """ properties = {} retrieved_record = next(self.retriever.read_records({}), None) + if retrieved_record is None: + # Log a warning or raise an exception + raise ValueError("No records retrieved. Cannot generate schema.") raw_schema = ( self._extract_data( retrieved_record, self.schema_type_identifier.schema_pointer, ) if retrieved_record else [] )
175-177
: Consider improving the error message for invalid data types.The error message might be unclear to users. Would you consider rephrasing it to be more descriptive? For example:
raise ValueError( - f"Invalid data type. Available string or two items list of string. Got {mapped_field_type}." + f"Invalid data type '{mapped_field_type}'. Expected a string or a list of two strings." )Wdyt?
199-199
: Question about the necessity ofdeepcopy
.In
_get_airbyte_type
, we are performing adeepcopy
on theAIRBYTE_DATA_TYPES
entry. If the returned mapping is not modified elsewhere, would it be acceptable to return the original mapping to improve performance? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(5 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
Line range hint 1303-1382
: LGTM! Nice refactoring of the stream slicer logic.
The decomposition into smaller, focused methods improves readability and maintainability. The separation of concerns between partition router and paginator logic is particularly clean.
1581-1583
: LGTM! Simple and straightforward implementation.
The create_types_map method is clean and follows the factory pattern well.
1612-1636
: LGTM! Well-structured dynamic schema loader creation.
The method follows the established pattern for retriever-based components and properly handles the stream slicer creation.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
658-677
: LGTM! Well-structured class with clear field definitions.
The SchemaTypeIdentifier
class has appropriate field types and good defaults. The optional fields provide flexibility while maintaining required functionality through the mandatory key_pointer
field.
1734-1743
: LGTM! Clean implementation of the dynamic schema loader.
The DynamicSchemaLoader
follows the established pattern of other schema loaders while introducing new dynamic schema loading capabilities. The required fields ensure all necessary components are provided.
1928-1928
: LGTM! Proper handling of forward references.
The addition of DynamicSchemaLoader
to the forward references ensures proper model resolution.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1679-1699
: LGTM! Clear schema definition with appropriate experimental warning.
The schema for TypesMap
is well-defined and matches the Python implementation. The experimental warning is a good practice for managing user expectations.
1700-1741
: LGTM! Comprehensive schema definition with clear field descriptions.
The schema for SchemaTypeIdentifier
is well-documented with clear descriptions for each field. The experimental warning and field requirements match the Python implementation.
1742-1764
: LGTM! Well-defined schema with appropriate requirements.
The schema for DynamicSchemaLoader
properly defines the component's structure with clear requirements and descriptions. The experimental warning is consistently applied.
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1613-1637
: Should we provide a more descriptive name for the retriever?In the
create_dynamic_schema_loader
method, thename
parameter for the retriever is set to an empty string. Providing a descriptive name might help with debugging and logging. Perhaps we could use something like"dynamic_schema_retriever"
? WDYT?
1303-1337
: Consider refactoring duplicate logic in slicer methods?I noticed that
_build_stream_slicer_from_partition_router
and_build_resumable_cursor_from_paginator
have similar patterns and are used together when building stream slicers. Do you think it might be beneficial to refactor this logic into a shared helper function to reduce duplication and improve maintainability? WDYT?Also applies to: 1382-1382
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
- airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1581-1637
: Great addition of the dynamic schema components!
The implementation of create_types_map
, create_schema_type_identifier
, and create_dynamic_schema_loader
looks solid and aligns well with the existing factory patterns. This should enhance the schema management capabilities effectively.
1615-1618
: Is combined_slicers
correctly assigned?
In create_dynamic_schema_loader
, after calling _build_resumable_cursor_from_paginator
, combined_slicers
might end up being None
if there's no paginator. Should we handle this case to ensure that the stream_slicer
passed to the retriever is valid? Perhaps adding a check or default value could prevent potential issues. WDYT?
The changes make sense to me. Just one thing, can we include a test with a manifest and the new dynamic schema loader setup? It would be nice to have like the ones in this other PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only thing for me that would be missing is a very basic test that shows how it works in a manifest. Something similar to the test_concurrent_declarative_source
. It doesn't need to be exhaustive but mostly to test that the discover method will return the dynamic stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
905-919
: Consider improving the description formatting for better readability.The description text in
oauth_user_input_from_connector_config_specification
could be more readable with proper line breaks and indentation. WDYT about this format?- description="OAuth specific blob. This is a Json Schema used to validate Json configurations used as input to OAuth.\nMust be a valid non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification\nusing special annotation 'path_in_connector_config'.\nThese are input values the user is entering through the UI to authenticate to the connector, that might also shared\nas inputs for syncing data via the connector.\nExamples:\nif no connector values is shared during oauth flow, oauth_user_input_from_connector_config_specification=[]\nif connector values such as 'app_id' inside the top level are used to generate the API url for the oauth flow,\n oauth_user_input_from_connector_config_specification={\n app_id: {\n type: string\n path_in_connector_config: ['app_id']\n }\n }\nif connector values such as 'info.app_id' nested inside another object are used to generate the API url for the oauth flow,\n oauth_user_input_from_connector_config_specification={\n app_id: {\n type: string\n path_in_connector_config: ['info', 'app_id']\n }\n }", + description=""" + OAuth specific blob. This is a Json Schema used to validate Json configurations used as input to OAuth. + Must be a valid non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification + using special annotation 'path_in_connector_config'. + + These are input values the user is entering through the UI to authenticate to the connector, that might also shared + as inputs for syncing data via the connector. + + Examples: + if no connector values is shared during oauth flow: + oauth_user_input_from_connector_config_specification=[] + + if connector values such as 'app_id' inside the top level are used to generate the API url for the oauth flow: + oauth_user_input_from_connector_config_specification={ + app_id: { + type: string + path_in_connector_config: ['app_id'] + } + } + + if connector values such as 'info.app_id' nested inside another object are used to generate the API url for the oauth flow: + oauth_user_input_from_connector_config_specification={ + app_id: { + type: string + path_in_connector_config: ['info', 'app_id'] + } + } + """,
660-678
: Consider adding more descriptive field documentation.The SchemaTypeIdentifier fields could benefit from more detailed descriptions to help users understand their purpose. For example:
- schema_pointer: What kind of schema fields are expected?
- key_pointer: What kind of keys should be extracted?
- type_pointer: How is this different from schema_pointer?
WDYT about adding more context to help users understand these fields better?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1688-1708
: Consider adding examples to TypesMap properties.The TypesMap definition could benefit from examples showing how to map different types. WDYT about adding examples like:
examples: target_type: ["string", "integer"] # or "string" current_type: "text" # or ["varchar", "text"]
1709-1749
: Consider improving the field descriptions in SchemaTypeIdentifier.The descriptions could be more specific about the relationship between these fields. For example:
- How do schema_pointer and type_pointer interact?
- What's the expected format of the types_mapping array?
WDYT about adding this context to help users understand the component better?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(12 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
655-658
: LGTM! The TypesMap model looks good.
The model is well-defined with required fields and appropriate type annotations.
1820-1828
: Add Field(...) to schema_type_identifier to specify it as a required field.
Currently, schema_type_identifier
is missing the Field(...)
wrapper indicating it's a required field. Adding Field(...)
will make it consistent with other required fields and ensure proper Pydantic validation. WDYT?
- schema_type_identifier: SchemaTypeIdentifier
+ schema_type_identifier: SchemaTypeIdentifier = Field(
+ ...,
+ description="Specifies how to identify types within the schema.",
+ title="Schema Type Identifier",
+ )
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1750-1773
: LGTM! The DynamicSchemaLoader definition looks good.
The component is well-defined with clear descriptions and required fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (3)
12-48
: Consider expanding test fixtures with more edge cases?The fixtures look good but could we enhance them with additional test cases? For example:
- Nested objects
- Arrays
- Null values
- Custom types that need mapping
This would help ensure the schema loader handles all possible scenarios. WDYT?
50-126
: How about adding more test scenarios?The parameterized tests look solid! Would you consider adding these scenarios:
- Multiple records with conflicting types for the same field
- Very large numbers that might need special handling
- Date/time fields with various formats
- Fields with special characters in names
This would help catch edge cases early. WDYT?
318-363
: Consider enhancing manifest integration test?The manifest integration test looks good! Would you consider adding:
- Verification of the types_mapping conversion (e.g., singleLineText → string)
- Assertions for schema metadata (e.g., $schema version)
- Validation that the schema matches the actual records
This would provide more confidence in the integration. WDYT?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
653-656
: Would you like to add more documentation for TypesMap?The class structure looks good! Consider adding:
- Field descriptions explaining the purpose of target_type and current_type
- Examples showing both single type and list type mappings
- Notes about type conversion behavior
This would make it easier for other developers to use the class correctly. WDYT?
658-677
: Consider adding validation for pointer fields?The SchemaTypeIdentifier looks well-designed! Would you consider:
- Adding min_length validation for key_pointer to ensure it's not empty
- Adding examples for each pointer field to show valid paths
- Adding a validator to ensure type_pointer is provided if types_mapping is used
This would help prevent configuration errors. WDYT?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(1 hunks)
🔇 Additional comments (2)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
146-153
: Error message doesn't match the invalid type scenario
The error message "Expected key to be a string. Got None" doesn't align with testing an invalid type. Should it be more specific about the type validation failure?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1808-1817
: Add Field(...) to schema_type_identifier?
The DynamicSchemaLoader looks good! Adding Field(...) would make it consistent with other required fields and ensure proper validation. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1688-1708
: The TypesMap component could use more documentation, wdyt?While the component structure is solid, consider enhancing the documentation with:
- Examples of type mappings
- Use cases for when you'd use string vs array of strings
- Common scenarios where this component would be useful
1709-1749
: The field descriptions look good, but could we enhance them further?The component is well-structured, but consider:
- Adding examples for each pointer type to show their differences in practice
- Documenting the relationship between schema_pointer, key_pointer, and type_pointer
- Adding examples of types_mapping usage with TypesMap
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1221-1221
: LGTM! The DynamicSchemaLoader is properly integrated
The addition of DynamicSchemaLoader as a schema_loader option in DeclarativeStream is well-placed and aligns with the PR's objective to support dynamic schema loading.
1750-1773
: The DynamicSchemaLoader looks good, but let's verify its integration
The component structure is solid. However, we should verify:
- That all supported retriever types work correctly with dynamic schema loading
- The interaction between the retriever and schema_type_identifier
✅ Verification successful
The DynamicSchemaLoader implementation looks solid and well-tested! 🎯
The verification shows:
- Comprehensive test coverage including various scenarios (valid types, missing types, nested types, empty records)
- Integration with all retriever types is properly defined in the schema
- Real-world usage example in the manifest flow test
- Proper error handling for invalid keys and types
The component works as expected with the SimpleRetriever and has appropriate test coverage. The experimental status is properly marked in both the implementation and schema definition.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for existing implementations and test coverage
echo "Checking for implementations and tests..."
rg -g "test_*.py" "DynamicSchemaLoader" -A 5
Length of output: 1983
Script:
#!/bin/bash
# Let's check the implementation details and test coverage more thoroughly
echo "Checking test coverage for DynamicSchemaLoader..."
rg -g "test_*.py" -B 5 -A 10 "def test.*DynamicSchemaLoader"
echo -e "\nChecking implementation details..."
ast-grep --pattern 'class DynamicSchemaLoader {
$$$
}'
echo -e "\nChecking for integration with different retriever types..."
rg "DynamicSchemaLoader.*retriever.*SimpleRetriever|DynamicSchemaLoader.*retriever.*AsyncRetriever|DynamicSchemaLoader.*retriever.*CustomRetriever" -A 3
Length of output: 1034
Script:
#!/bin/bash
# Let's look at the actual test file content to understand test coverage
echo "Checking test file content..."
cat unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
echo -e "\nChecking for actual implementation class..."
rg -g "!test_*" "class DynamicSchemaLoader" -B 2 -A 10
Length of output: 14970
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm wondering if @aldogonzalez8 has used it in source-google-sheets and can approve this as well
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
I have some advance and I confirm I've been able to use the DynamicStreams and DynamicSchema for google-sheets with a pre-dev cdk with minor changes that I think @lazebnyi has already added in last commits. |
Actualy is weird that the compare doesn't show me the latest changes in this PR but I assume is some bad cache or I did something wrong in the compare as I can see these options have already been added.
@lazebnyi, what is the command/action that re-generated the file? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED
@aldogonzalez8 |
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
What
This PR add dynamic schema loader that can dynamically generate schema based on response.|
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10782
How
Described schema, models and data classes for
DynamicSchemaLoader
,SchemaTypeIdentifier
andTypesPair
. Defined behavior for schema resolving. Integrated the dynamic schema handling into the existing framework.Summary by CodeRabbit
Release Notes
New Features
TypesMap
,SchemaTypeIdentifier
, andDynamicSchemaLoader
.Bug Fixes
DynamicSchemaLoader
for invalid keys and types.Tests
DynamicSchemaLoader
to validate functionality and error handling.