I’d like to share an issue/observation regarding the CDM JAR: the current implementations of the following two features conflict with each other:

* `spark.cdm.transform.custom.writetime=1726331200000000`: sets a custom writetime on records written to the target table.
* `spark.cdm.filter.java.writetime.min=1` & `spark.cdm.filter.java.writetime.max=1726331200000000`: filter the records to be migrated to a specific writetime range.
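For reference, this is the conflicting combination as it would appear in a `cdm.properties` file (property names and values taken from above; all other settings omitted):

```properties
# Custom writetime (epoch microseconds) stamped on every record written to the target
spark.cdm.transform.custom.writetime=1726331200000000

# Migration window: only records whose writetime falls within [min, max] should be copied
spark.cdm.filter.java.writetime.min=1
spark.cdm.filter.java.writetime.max=1726331200000000
```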
With these two properties in place, the CDM code first assigns the custom writetime to each record and then attempts to filter the records with the Java filter properties, so records are compared against the custom writetime rather than their real one; in this configuration, that leads to all records being filtered out. Instead, the filter should be applied to each record's original source writetime, not the custom writetime.
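To illustrate, here is a minimal sketch of the order of operations I would expect; the class, method, and constant names are hypothetical, not the actual CDM internals:

```java
// Hypothetical sketch of the expected filter-then-transform order;
// this is NOT the actual CDM code.
public class WritetimeFilterSketch {

    // spark.cdm.filter.java.writetime.min / .max (epoch microseconds)
    static final long FILTER_MIN = 1L;
    static final long FILTER_MAX = 1726331200000000L;

    // spark.cdm.transform.custom.writetime (0 = not configured)
    static final long CUSTOM_WRITETIME = 1726331200000000L;

    /** Returns the writetime to stamp on the target write, or -1 to skip the record. */
    static long resolveWritetime(long sourceWritetime) {
        // Step 1: filter on the ORIGINAL source writetime, before any transform.
        if (sourceWritetime < FILTER_MIN || sourceWritetime > FILTER_MAX) {
            return -1L; // outside the migration window: skip this record
        }
        // Step 2: only then apply the custom writetime override for the target write.
        return CUSTOM_WRITETIME > 0 ? CUSTOM_WRITETIME : sourceWritetime;
    }

    public static void main(String[] args) {
        System.out.println(resolveWritetime(1700000000000000L)); // in window  -> custom writetime
        System.out.println(resolveWritetime(1800000000000000L)); // past max   -> -1 (skipped)
    }
}
```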
Note: the Java filter properties work correctly when no custom writetime is provided.
For example, suppose I have three records with writetimes of December 30, 2024; January 1, 2025; and January 10, 2025. If I set the custom writetime to January 15, 2025 and want to transfer only data written up to January 1, 2025, CDM transfers all three records. Ideally, it should transfer only the December 30 and January 1 records.
If I do not provide the custom writetime, the filter works correctly and transfers only those two records (December 30 and January 1).
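To make the example concrete, these would be the corresponding writetime values in epoch microseconds, assuming midnight UTC for each date (the values are illustrative, not from the original report):

```properties
# Example record writetimes (epoch microseconds, midnight UTC assumed):
#   Dec 30, 2024 -> 1735516800000000   (inside the window: should migrate)
#   Jan  1, 2025 -> 1735689600000000   (inside the window: should migrate)
#   Jan 10, 2025 -> 1736467200000000   (outside the window: should be skipped)

# Custom writetime = Jan 15, 2025
spark.cdm.transform.custom.writetime=1736899200000000
# Migrate everything written up to Jan 1, 2025
spark.cdm.filter.java.writetime.min=1
spark.cdm.filter.java.writetime.max=1735689600000000
```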
…d together (#331)
* Fixed issue #327, i.e. the writetime filter does not work as expected when a custom writetimestamp is also used.
* Removed deprecated properties `printStatsAfter` and `printStatsPerPart`. Run metrics should now be tracked using the `trackRun` feature instead.
* Apply suggestions from code review
---------
Co-authored-by: Madhavan <[email protected]>
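For anyone who was relying on the removed stats properties, run metrics would instead come from the `trackRun` feature mentioned in the commit; a hedged sketch of the replacement configuration (the property name is assumed, so verify it against your CDM version's documentation):

```properties
# Removed in this change:
#   printStatsAfter=...
#   printStatsPerPart=...
# Assumed replacement (check your CDM version's docs for the exact name):
spark.cdm.trackRun=true
```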