
Fix erroneous additional batch execution #11113

Merged

Conversation

Contributor

@QMalcolm QMalcolm commented Dec 9, 2024

Resolves #11112

Problem

Microbatch models that only ran one batch (either due to lookback=0 or to setting --event-time-start + --event-time-end) would raise an unhandled exception like the following:
[Screenshot: unhandled exception output, 2024-12-09 12:43]

Solution

Ensure we skip the last batch execution path when only one batch needs to be run in total for a microbatch model
[Screenshot: run output after the fix, 2024-12-09 13:50]
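
As a rough, self-contained illustration of the off-by-one (simplified and hypothetical; run_batches and its string results are stand-ins, not dbt-core's actual code): the run loop submits the first batch, then the middle batches, then a dedicated last batch. Without the new guard, a single-batch model would still reach that final submission and index past the end of the batch list.

# Hypothetical, simplified sketch of the batch-execution flow -- not dbt-core's
# actual implementation. It shows why a single-batch run needs the added guard.
from typing import List


def run_batches(batches: List[str]) -> List[str]:
    results: List[str] = []
    batch_idx = 0

    # First batch always runs on its own.
    results.append(f"ran {batches[batch_idx]}")
    batch_idx += 1

    # Middle batches: everything except the first and last.
    while batch_idx < len(batches) - 1:
        results.append(f"ran {batches[batch_idx]}")
        batch_idx += 1

    # The fix: only submit a dedicated "last" batch when more than one batch exists.
    # Without this guard, a single-batch run would evaluate batches[1]; in this
    # simplified sketch that surfaces as an IndexError (the real failure mode in
    # dbt-core may differ in detail).
    if len(batches) != 1:
        results.append(f"ran {batches[batch_idx]}")

    return results


print(run_batches(["batch-0"]))             # ['ran batch-0']
print(run_batches(["batch-0", "batch-1"]))  # ['ran batch-0', 'ran batch-1']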

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

Previously, if there was only one batch, we would try to execute _two_
batches: the first batch, and a nonexistent "last" batch. This would
result in an unhandled exception.
@QMalcolm QMalcolm requested a review from a team as a code owner December 9, 2024 19:51
@cla-bot cla-bot bot added the cla:yes label Dec 9, 2024

codecov bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.94%. Comparing base (03fdb4c) to head (5942aa5).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11113      +/-   ##
==========================================
- Coverage   88.96%   88.94%   -0.03%     
==========================================
  Files         183      183              
  Lines       23933    23934       +1     
==========================================
- Hits        21291    21287       -4     
- Misses       2642     2647       +5     
Flag                Coverage Δ
integration         86.31% <100.00%> (-0.03%) ⬇️
unit                61.96% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Components          Coverage Δ
Unit Tests          61.96% <0.00%> (-0.01%) ⬇️
Integration Tests   86.31% <100.00%> (-0.03%) ⬇️

@QMalcolm QMalcolm changed the title from "Qmalcolm 11112 fix erroneous additional batch execution" to "Fix erroneous additional batch execution" on Dec 9, 2024
Contributor

@ChenyuLInx ChenyuLInx left a comment


The change looks mostly good!

From a code-organization perspective, what are your thoughts on hiding all of this logic in the _submit_batch function? It seems like we also have some special logic to handle the last batch in that function. I think it's good to have all of this in a single logic layer rather than multiple. WDYT?

relation_exists = self._submit_batch(
    node=node,
    adapter=runner.adapter,
    relation_exists=relation_exists,
    batches=batches,
    batch_idx=batch_idx,
    batch_results=batch_results,
    pool=pool,
    force_sequential_run=True,
)
batch_idx += 1
skip_batches = batch_results[0].status != RunStatus.Success
# Run all batches except first and last batch, in parallel if possible
while batch_idx < len(runner.batches) - 1:
    relation_exists = self._submit_batch(
        node=node,
        adapter=runner.adapter,
        relation_exists=relation_exists,
        batches=batches,
        batch_idx=batch_idx,
        batch_results=batch_results,
        pool=pool,
        skip=skip_batches,
    )
    batch_idx += 1
# Wait until all submitted batches have completed
while len(batch_results) != batch_idx:
    pass
# Only run "last" batch if there is more than one batch
if len(batches) != 1:
    # Final batch runs once all others complete to ensure post_hook runs at the end
    self._submit_batch(
        node=node,
        adapter=runner.adapter,
        relation_exists=relation_exists,
        batches=batches,
        batch_idx=batch_idx,
        batch_results=batch_results,
        pool=pool,
        force_sequential_run=True,
        skip=skip_batches,
    )
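
For illustration only, one possible shape of this suggestion: a simplified, hypothetical stand-in helper (submit_batch below is not dbt-core's actual _submit_batch) that treats a "last" batch index past the end of the batch list as a no-op, so the single-batch guard no longer needs to live at the call site.

# Hypothetical sketch -- not dbt-core's actual implementation. It illustrates the
# idea of letting the submission helper handle the single-batch case itself, so
# the caller no longer needs the `if len(batches) != 1` guard.
from typing import Any, List


def submit_batch(batches: List[Any], batch_idx: int, batch_results: List[str]) -> None:
    if batch_idx >= len(batches):
        # Single-batch run: the only batch has already executed, so the trailing
        # "last batch" submission simply does nothing instead of indexing past
        # the end of `batches`.
        return
    batch_results.append(f"submitted {batches[batch_idx]}")


results: List[str] = []
submit_batch(["batch-0"], batch_idx=1, batch_results=results)  # no-op, no exception
print(results)  # []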

@QMalcolm
Contributor Author

QMalcolm commented Dec 9, 2024

> The change looks mostly good!
>
> From a code-organization perspective, what are your thoughts on hiding all of this logic in the _submit_batch function? It seems like we also have some special logic to handle the last batch in that function. I think it's good to have all of this in a single logic layer rather than multiple. WDYT?


@ChenyuLInx I'm down to refactor this code 🙂 I'd definitely like to simplify things here a fair bit. However, this change is going to need to be backported to 1.9.latest for 1.9.1, so I think a refactor here is out of scope; that kind of work should probably be forward-facing.

@ChenyuLInx
Contributor

My preference is to do the refactor as you change the code, if it is not too complex.

> going to need to be backported to 1.9.latest

Why does this change whether we should do the refactor?

@QMalcolm
Contributor Author

> My preference is to do the refactor as you change the code, if it is not too complex.
>
> > going to need to be backported to 1.9.latest
>
> Why does this change whether we should do the refactor?

My preference is to keep refactors as separate as possible from changes in logic. I've found it's desirable to not do a refactor as part of work which is being backported. This is because:

  1. The refactor is irrelevant to the fix, and thus doesn't need to be backported
  2. The more changes we introduce, the greater the likelihood of introducing a new edge case or bug

Now, I'm generally of the opinion that refactors should be separated from logical changes or bug fixes unless it is impossible to separate the two. I'm a little looser on that when a change is forward-looking only (i.e., solely going into an alpha/beta). With backports, though, I take it a bit more seriously.

If the worry is that the refactor won't get done unless it comes alongside the bug fix / logic change, then I'll promise to do the refactor as the next PR I open 🙂

@ChenyuLInx
Contributor

I am happy for this to go in as is, with a fast follow-up PR for the refactor.

One thing to note: if you don't do the refactor as part of this change and it gets backported, then when other fixes in this area need to go in, you will have to backport the refactor and then the fix (or write a custom fix).

@QMalcolm QMalcolm merged commit c9582c2 into main Dec 10, 2024
62 of 63 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--11112-fix-erroneous-additional-batch-execution branch December 10, 2024 15:28
github-actions bot pushed a commit that referenced this pull request Dec 10, 2024
* Update single batch test case to check for generic exceptions

* Explicitly skip last final batch execution when there is only one batch

Previously, if there was only one batch, we would try to execute _two_
batches: the first batch, and a nonexistent "last" batch. This would
result in an unhandled exception.

* Changie doc

(cherry picked from commit c9582c2)
QMalcolm added a commit that referenced this pull request Dec 10, 2024
* Update single batch test case to check for generic exceptions

* Explicitly skip last final batch execution when there is only one batch

Previously, if there was only one batch, we would try to execute _two_
batches: the first batch, and a nonexistent "last" batch. This would
result in an unhandled exception.

* Changie doc

(cherry picked from commit c9582c2)

Co-authored-by: Quigley Malcolm <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Bug] Microbatch models with only one batch raise unhandled error