Issue in CDM while filtering the records on the basis of write time #327

Closed
ashu9041 opened this issue Nov 4, 2024 · 1 comment · Fixed by #331
Labels
bug Something isn't working

Comments

ashu9041 commented Nov 4, 2024

I'd like to report an issue with the CDM jar: the current implementations of the following two features conflict with each other.

* `spark.cdm.transform.custom.writetime=1726331200000000`: sets a custom writetime for records in the target table.
* `spark.cdm.filter.java.writetime.min=1` and `spark.cdm.filter.java.writetime.max=1726331200000000`: restrict the migration to records whose writetime falls within the given range.

With both properties set, the CDM code first assigns the custom writetime to each record and then filters the records using the Java filter properties. This leads to all records being filtered out. The filter should instead be applied to the original source writetime of each record, not to the custom writetime.

Note: the Java filter properties work correctly when no custom writetime is provided.

For example, I have three records with writetimes of December 30, 2024; January 1, 2025; and January 10, 2025. If I set the custom writetime to January 15 and want to transfer data up to January 1, CDM transfers all three records. Ideally, it should transfer only the December 30 and January 1 records.
If I do not provide the custom writetime, it works correctly and transfers only those two records.
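
To make the expected behavior concrete, here is a minimal Java sketch of the ordering I am asking for: filter on the source writetime first, and only then stamp the surviving records with the custom writetime. This is not CDM's actual code. The `Row` class, the `migrate` and `micros` helpers, and the year 2025 for the January 15 custom writetime are all assumptions made for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class WritetimeFilterSketch {

    // Hypothetical stand-in for a migrated record: a key plus a writetime in microseconds.
    static final class Row {
        final String key;
        final long writetimeMicros;

        Row(String key, long writetimeMicros) {
            this.key = key;
            this.writetimeMicros = writetimeMicros;
        }
    }

    // Converts an ISO-8601 instant to microseconds since the epoch.
    static long micros(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli() * 1000L;
    }

    // Filters on the SOURCE writetime, then applies the custom writetime to the survivors.
    static List<Row> migrate(List<Row> source, long minMicros, long maxMicros, long customMicros) {
        List<Row> target = new ArrayList<>();
        for (Row row : source) {
            long sourceWritetime = row.writetimeMicros;
            if (sourceWritetime < minMicros || sourceWritetime > maxMicros) {
                continue; // outside the requested range, so not migrated
            }
            // A custom writetime of 0 or less is treated here as "not set" (an assumption).
            long targetWritetime = customMicros > 0 ? customMicros : sourceWritetime;
            target.add(new Row(row.key, targetWritetime));
        }
        return target;
    }

    public static void main(String[] args) {
        List<Row> source = List.of(
                new Row("dec30", micros("2024-12-30T00:00:00Z")),
                new Row("jan01", micros("2025-01-01T00:00:00Z")),
                new Row("jan10", micros("2025-01-10T00:00:00Z")));

        long min = 1L;                               // like writetime.min=1
        long max = micros("2025-01-01T00:00:00Z");   // "up to January 1", inclusive
        long custom = micros("2025-01-15T00:00:00Z"); // custom writetime for the target

        for (Row row : migrate(source, min, max, custom)) {
            System.out.println(row.key + " -> " + row.writetimeMicros);
        }
    }
}
```

With the three example records, this sketch keeps only the December 30 and January 1 rows, and both carry the January 15 writetime in the target.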

ashu9041 commented Nov 4, 2024

@pravinbhat - Could you please take a look at this issue and let me know if you need any additional information? Thank you!

pravinbhat self-assigned this Nov 18, 2024
pravinbhat added a commit that referenced this issue Nov 21, 2024
pravinbhat added the bug label Nov 21, 2024
msmygit added a commit that referenced this issue Nov 22, 2024
…d together (#331)

* Fixed issue #327 i.e. writetime filter does not work as expected when custom writetimestamp is also used.
* Removed deprecated properties `printStatsAfter` and `printStatsPerPart`. Run metrics should now be tracked using the `trackRun` feature instead.
* Apply suggestions from code review

---------

Co-authored-by: Madhavan <[email protected]>