Make --event-time-start and --event-time-end mutually required #10878

Merged


QMalcolm
Contributor

Resolves #10874

Problem

`--event-time-start` and `--event-time-end` were not mutually required, which led to unexpected behavior when only one of them was specified.

Solution

Make it so that one being specified necessitates the other.
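
As a rough illustration of the rule described above, here is a minimal sketch of a "both or neither" check. It is not the code from this PR: the function name `validate_event_time_pair` and the `EventTimeConfigError` class are hypothetical names made up for the example, and dbt-core's actual validation and error type may differ.

```python
from datetime import datetime
from typing import Optional


class EventTimeConfigError(ValueError):
    """Hypothetical error type used only for this sketch."""


def validate_event_time_pair(
    event_time_start: Optional[datetime],
    event_time_end: Optional[datetime],
) -> None:
    """Raise if exactly one of the two values is provided."""
    start_given = event_time_start is not None
    end_given = event_time_end is not None

    # The flags are valid together or absent together, never alone.
    if start_given != end_given:
        provided = "--event-time-start" if start_given else "--event-time-end"
        missing = "--event-time-end" if start_given else "--event-time-start"
        raise EventTimeConfigError(
            f"{provided} was specified without {missing}; "
            "the two flags must be used together."
        )


# Both omitted and both provided are accepted.
validate_event_time_pair(None, None)
validate_event_time_pair(datetime(2024, 10, 1), datetime(2024, 10, 2))
```

Passing only one of the two values raises `EventTimeConfigError` and names the flag that is missing.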

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

In the next commit we'll add a validation that makes `--event-time-start`
and `--event-time-end` mutually required. That is, whenever one is specified,
the other is required. In that world, `--event-time-start` will never need to be compared
against the "current" time, because it will never be used in conjunction with the "current"
time.
@QMalcolm QMalcolm requested a review from a team as a code owner October 17, 2024 21:57
@cla-bot cla-bot bot added the cla:yes label Oct 17, 2024
@@ -362,17 +361,26 @@ def _validate_event_time_configs(self) -> None:
getattr(self, "EVENT_TIME_END") if hasattr(self, "EVENT_TIME_END") else None
)

if event_time_start is not None and event_time_end is not None:
Contributor Author

It's probably easier to see what changed if viewing commit by commit. Git didn't handle displaying the changes made across the 2 commits very gracefully.

codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.19%. Comparing base (6b5db17) to head (daeabd9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10878      +/-   ##
==========================================
+ Coverage   89.17%   89.19%   +0.02%     
==========================================
  Files         183      183              
  Lines       23491    23489       -2     
==========================================
+ Hits        20947    20951       +4     
+ Misses       2544     2538       -6     
Flag Coverage Δ
integration 86.57% <100.00%> (+0.11%) ⬆️
unit 62.07% <20.00%> (-0.01%) ⬇️

Components Coverage Δ
Unit Tests 62.07% <20.00%> (-0.01%) ⬇️
Integration Tests 86.57% <100.00%> (+0.11%) ⬆️

…art/end` reqs

We made it such that when `event_time_start` is specified, `event_time_end` must also
be specified (and vice versa). This broke numerous tests, in a few different ways:

1. There were tests that used `--event-time-start` without `--event-time-end` but
were using `event_time_start` essentially as the `begin` time for models being initially
built or full refreshed. These tests could simply drop the `--event-time-start` and
instead rely on the `begin` value.

2. There was a test that was trying to load a subset of the data _excluding_ some
data which would be captured by using `begin`. In this test we added an appropriate
`--event-time-end`, as the `--event-time-start` was necessary to satisfy what the
test was testing.

3. There was a test which was trying to ensure that two microbatch models would be
given the same "current" time. Because we wanted to ensure the "current" time code
path was used, we couldn't add `--event-time-end` to resolve the problem, thus we
needed to remove the `--event-time-start` that was being used. However, this led to
the test being incredibly slow. This was resolved by switching the relevant microbatch
models from having `batch_size`s of `day` to instead have `year`. This solution should
be good enough for roughly 40 years? We'll figure out a better solution then, so see ya
in 2064. Assuming I haven't died before my 70th birthday, feel free to ping me to get
this taken care of.
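
To make the slowdown in point 3 concrete, here is a back-of-the-envelope sketch of how the number of batches grows when a model runs from its `begin` up to the "current" time. The helper `count_batches`, the inclusive counting, and the `begin` date are all assumptions for illustration; this is not how dbt-core computes batches internally.

```python
from datetime import datetime


def count_batches(begin: datetime, now: datetime, batch_size: str) -> int:
    """Count batches from `begin` through `now`, inclusive (assumed semantics)."""
    if batch_size == "day":
        # One batch per calendar day touched by the range.
        return (now.date() - begin.date()).days + 1
    if batch_size == "year":
        # One batch per calendar year touched by the range.
        return now.year - begin.year + 1
    raise ValueError(f"unsupported batch_size in this sketch: {batch_size}")


# Hypothetical `begin`; the tests in this PR may use a different value.
begin = datetime(2020, 1, 1)
now = datetime(2024, 10, 17)

print("day batches:", count_batches(begin, now, "day"))
print("year batches:", count_batches(begin, now, "year"))
```

Under these assumptions a `day` batch size yields well over a thousand batches for a few years of history, while `year` yields a handful and only gains one batch per calendar year, which is why the switch keeps the test fast for decades.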
@mirnawong1
Contributor

hey @QMalcolm , sorry for the add'l question but do we need to update the docs here? https://docs.getdbt.com/docs/build/incremental-microbatch#timezones

@QMalcolm
Contributor Author

> hey @QMalcolm , sorry for the add'l question but do we need to update the docs here? https://docs.getdbt.com/docs/build/incremental-microbatch#timezones

@mirnawong1 Questions are great! No need to apologize 🙂 If the question is in reference to timezones (the section linked) then no update is needed as UTC will still be assumed. If the question is in reference to the page overall, yes we should probably update the document somewhere to note that --event-time-start and --event-time-end are mutually necessary (i.e. the specification of one requires the specification of the other).

@mirnawong1
Contributor

docs pr here to address this: https://github.com/dbt-labs/docs.getdbt.com/pull/6351/files

@QMalcolm QMalcolm merged commit 8df5c96 into main Oct 29, 2024
59 of 60 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--10874-make-event-time-start-end-mutually-required branch October 29, 2024 20:31
runleonarun pushed a commit to dbt-labs/docs.getdbt.com that referenced this pull request Oct 29, 2024
this pr adds updates to incremental microbatch per core prs:

- [#10878](dbt-labs/dbt-core#10878) - makes it
so --event-time-start and --event-time-end are mutually required.
- [#10876](dbt-labs/dbt-core#10876) - changes
lookback default window to 1 (from 0)

[ X ] dbt Core PRs must get merged first before docs pr is merged

Development

Successfully merging this pull request may close these issues.

[Feature] Make --event-end-time require --event-start-time and vice versa
4 participants