Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize bins filtering by merging bins #11248

Open
wants to merge 6 commits into
base: demo-rfc80-poc
Choose a base branch
from

Conversation

fuzhaoyuan
Copy link
Contributor

Describe changes proposed in this pull request:

  • Merge numerical bins to reduce the number of bins passed to the database filtering

@alisman alisman requested a review from haynescd December 3, 2024 15:16
@alisman alisman added the RFC80 label Dec 3, 2024
@fuzhaoyuan fuzhaoyuan force-pushed the demo-rfc80-poc-bin-range-optimization branch 2 times, most recently from 1f0e84c to 8f87492 Compare December 3, 2024 21:37
@fuzhaoyuan fuzhaoyuan marked this pull request as ready for review December 4, 2024 16:10
@fuzhaoyuan fuzhaoyuan force-pushed the demo-rfc80-poc-bin-range-optimization branch 3 times, most recently from ad021d6 to abf0718 Compare December 4, 2024 16:35
@haynescd
Copy link
Collaborator

haynescd commented Dec 4, 2024

@fuzhaoyuan Looks good. Just fix the one sonar issue and we are good

haynescd
haynescd previously approved these changes Dec 4, 2024
@fuzhaoyuan
Copy link
Contributor Author

@fuzhaoyuan Looks good. Just fix the one sonar issue and we are good

Fixed. Thank you!

/**
* Merge the range of numerical bins in DataFilters to reduce the number of scans that runs on the database when filtering.
*/
private static <T extends DataFilter> List<T> mergeDataFilters(List<T> filters) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this public and add a test for this one as well. I noticed that we only have negative values in the tests, may be good to have positive values as well. A mix of positive and negative values would be even better I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By public you mean we could probably use it somewhere else right? That makes sense. I'm still trying to decide where it's the best place to put it. The test part is easy done

List<DataFilterValue> mergedValues = new ArrayList<>();

BigDecimal mergedStart = null, mergedEnd = null;
for (DataFilterValue dataFilterValue : filter.getValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic only works if the bins are sorted properly, right? are we sure that bins are always sorted the way we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had 2 assumptions: 1) they come in sorted and 2) they're continuous. I handled the discontinuous case by adjacent merging but if sorting is needed we'll need an extra part and way more combinations of possible cases. What would you suggest?

Copy link

sonarcloud bot commented Dec 4, 2024

@alisman alisman force-pushed the demo-rfc80-poc-bin-range-optimization branch from 736cdd0 to 455aed9 Compare December 4, 2024 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants