Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more than 1024 IP/masks with indexed field #16391

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mkhludnev
Copy link
Contributor

@mkhludnev mkhludnev commented Oct 19, 2024

Now

Proposal

  • for indexed field combine masks with MultiRangePointQuery
  • for doc_values only field still fall back to Boolean Query

Related Issues

Resolves #16200

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search:Query Capabilities labels Oct 19, 2024
Copy link
Contributor

❌ Gradle check result for c9e2bd1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for c9e2bd1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for b6c3410: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev
Copy link
Contributor Author

it's an alt of #16202

Copy link
Contributor

github-actions bot commented Nov 8, 2024

❌ Gradle check result for 6a11b54: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 26ff736: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

mkhludnev referenced this pull request Nov 12, 2024
* Updating Ip fields to use doc_values to search

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Fix IP tests

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Fix skip to allow yaml test to pass on main

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Update tests to use existing test file

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Changing skip version to match bwc

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Using exact match instead of range

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Spotless

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Fix IP field tests

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Fix spotless + precommit failure

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Get point out of query and into value

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Fix term tests

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

* Add skip test logic to only doc_values test

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>

---------

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
Copy link
Contributor

❌ Gradle check result for 01db875: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 5849b96: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for d03b618: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 7e1f8b4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for c429cf0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev
Copy link
Contributor Author

mkhludnev commented Nov 20, 2024

Let's discuss edge cases, which are not obvious:

doc_values only field with mask/ values

In this case we can just create BooleanQuery { ssDvRangeQuery, ...}. It may lately fail with too many clauses, and it's reasonable. Note: it will be a way better with ssDvMultiRange

index & doc_values field with mask/ values

  • a strawman approach: forget about dv and IndexOrDvQuery, and just create MultiRangePointQuery

  • attempt to create IndexOrDvQuery(MultiRangePointQuery, BooleanQuery { ssDvRangeQuery, ...}) when number of BQ is less than MaxClauses limit.

The problem is that this boundary limit can't be decided on query parsing because sibling filter clauses may exceed MaxClauses limit.

Copy link
Contributor

❌ Gradle check result for cfa3904: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev mkhludnev changed the title IP field via MultrangeQuery fix #16200 Support more than 1024 IP/masks with indexed field Nov 20, 2024
@mkhludnev mkhludnev marked this pull request as ready for review November 20, 2024 20:53
Copy link
Contributor

✅ Gradle check result for a4d65db: SUCCESS

Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 86.15385% with 9 lines in your changes missing coverage. Please review.

Project coverage is 72.14%. Comparing base (3b4fa0e) to head (341e256).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
...ava/org/opensearch/index/mapper/IpFieldMapper.java 86.15% 4 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16391      +/-   ##
============================================
+ Coverage     72.11%   72.14%   +0.02%     
- Complexity    65192    65251      +59     
============================================
  Files          5318     5318              
  Lines        303903   303949      +46     
  Branches      43970    43985      +15     
============================================
+ Hits         219166   219288     +122     
+ Misses        66786    66719      -67     
+ Partials      17951    17942       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

❌ Gradle check result for 75b2719: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev
Copy link
Contributor Author

Hi there!
Kindly asking for review. And tagging as backport.
Thanks!

@reta reta added backport 2.x Backport to 2.x branch v3.0.0 Issues and PRs related to version 3.0.0 v2.19.0 Issues and PRs related to version 2.19.0 labels Nov 22, 2024
@reta
Copy link
Collaborator

reta commented Nov 22, 2024

In this case we can just create BooleanQuery { ssDvRangeQuery, ...}. It may lately fail with too many clauses, and it's reasonable. Note: it will be a way better with ssDvMultiRange

Just a question, since with this change we still don't solve the problem completely (by and large, the limit might be hit anyway), is it worth the effort?

@mkhludnev
Copy link
Contributor Author

mkhludnev commented Nov 22, 2024

is it worth the effort?

I think it is: it unblocks large terms_queries for indexed fields right now. Regarding doc_values solution: Lucene#13974, I'm not sure if I'm able to complete it and how long it take.

PS. Now, if terms_query has 2K IPs it works fine even for doc_values-only, but adding single mask flip it to too many clauses. This fix make it (2K IPs + less than 1K masks) work (even for doc_values-only).

Signed-off-by: Mikhail Khludnev <[email protected]>
Copy link
Contributor

✅ Gradle check result for e8e769f: SUCCESS

@reta
Copy link
Collaborator

reta commented Nov 22, 2024

LGTM, thanks @mkhludnev ! @msfroh anything from your side?

Copy link
Collaborator

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @mkhludnev! Looks good.

I agree that your new test should probably be marked as an IT, but otherwise I'm happy to merge.

Signed-off-by: Mikhail Khludnev <[email protected]>
Signed-off-by: Mikhail Khludnev <[email protected]>
Copy link
Contributor

✅ Gradle check result for 341e256: SUCCESS

mkhludnev and others added 2 commits November 26, 2024 22:47
Co-authored-by: Andriy Redko <[email protected]>
Signed-off-by: Mikhail Khludnev <[email protected]>
Signed-off-by: Mikhail Khludnev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch enhancement Enhancement or improvement to existing feature or request Search:Query Capabilities v2.19.0 Issues and PRs related to version 2.19.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] relax max Clauses Count limitation of termS query over IP field
3 participants