Fix/investigate inconsistent partition size in fetching larger datasets #41
Hrm, maybe the failure you were seeing only happens sporadically? You might try bumping CI a few times to see if things consistently pass
@jrbourbeau I left the xfail mark on the test parameter, so it "successfully failed" here. Sorry for the confusion. I'm trying to (roughly) bisect where things fall apart. I'll push that as a parameter to the test and remove the xfail (for now).
@jrbourbeau (cc @hayesgb) I figured out what's going on here. Since we have so little control over the sizes of the batches that Snowflake returns, some of the individual batches are larger than the requested partition size. So the answer to that would be to split up the batch. Inside PDB during a failing test:
So that means that I see some possible options here:
Potential third option from me: relax the test a bit. Currently the check is: assert (partition_sizes < 2 * parse_bytes("2 MiB")).all() So we're already using a fudge factor of 2. Based on what I've noticed digging into this, we could get away with something between 2.2 and 2.5.
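To make the effect of the fudge factor concrete, here is a minimal sketch of the same kind of check in plain Python. The helper name `within_tolerance` and the partition sizes are made up for illustration; they are not taken from the test suite or from a real run.

```python
MIB = 2**20
target = 2 * MIB  # same value that parse_bytes("2 MiB") returns


def within_tolerance(partition_sizes, target, fudge_factor):
    """Return True if every partition is smaller than fudge_factor * target."""
    return all(size < fudge_factor * target for size in partition_sizes)


# Hypothetical partition sizes in bytes (illustrative only).
partition_sizes = [int(1.6 * MIB), int(4.3 * MIB), int(1.9 * MIB)]

strict = within_tolerance(partition_sizes, target, 2)     # limit is 4 MiB
relaxed = within_tolerance(partition_sizes, target, 2.5)  # limit is 5 MiB
print(strict, relaxed)
```

With these invented numbers, the 4.3 MiB partition fails the factor-of-2 check but passes at 2.5, which is the kind of gap a relaxed factor would absorb.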
@phobson and I chatted a bit offline about this one and decided to go with adjusting the fudge factor in the test and adding a small note to the read_snowflake docstring explaining that we only attempt to merge Snowflake result batches, not split them.
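As a rough illustration of why merge-only behaviour can overshoot the requested size, here is a small sketch of greedy grouping that never splits a batch. The function `merge_batches` is hypothetical and is not the actual read_snowflake logic; it only assumes that batches are combined in order until the next one would push a partition past the target.

```python
def merge_batches(batch_sizes, target):
    """Greedily group consecutive batch sizes without ever splitting a batch."""
    partitions = []
    current, current_size = [], 0
    for size in batch_sizes:
        # Start a new partition if adding this batch would exceed the target.
        if current and current_size + size > target:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        partitions.append(current)
    return partitions


# Hypothetical batch sizes in MiB; the 5 MiB batch alone exceeds the 2 MiB target.
groups = merge_batches([1, 1, 5, 1], target=2)
print(groups)
```

Under this assumption, a single oversized batch always ends up as its own oversized partition, so no amount of merging can bring it under the target.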
Looks into #40