[Feature]: Support Range Search Pagination Retain Order so No Duplication when using Different Offset #35464

Open
1 task done
PwzXxm opened this issue Aug 14, 2024 · 6 comments
Assignees
Labels
kind/feature Issues related to feature request from users

Comments

@PwzXxm
Contributor

PwzXxm commented Aug 14, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Support eliminating duplicate IDs when running a Range Search through the search API with an offset. Consider two search requests that differ only in offset:

  • Limit 10, Offset 0
  • Limit 10, Offset 10

The search results are illustrated below. Some IDs (shaded red) that appeared in the first search results (first page) may appear again in the second search results (second page), because of how it is implemented in Milvus and the higher accuracy of the second search request.

image
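
One way to see how the duplication can arise is a toy model in plain Python (no Milvus calls). It assumes, purely for illustration, that each request searches approximately for `offset + limit` candidates and then drops the first `offset`, and that a deeper search can surface a close neighbor the shallower one missed. The IDs and the `approx_top_k` and `page` helpers are hypothetical.

```python
# Toy model: exact ranking of six IDs by distance (closest first).
EXACT_ORDER = ["a", "b", "c", "d", "e", "f"]


def approx_top_k(k, missed):
    """Return the top-k IDs, skipping IDs this (hypothetical) approximate search failed to find."""
    found = [i for i in EXACT_ORDER if i not in missed]
    return found[:k]


def page(limit, offset, missed):
    # Assumed paging model: fetch offset + limit candidates, then drop the first `offset`.
    return approx_top_k(offset + limit, missed)[offset:]


# Page 1: the shallow search misses "b".
page1 = page(limit=2, offset=0, missed={"b"})
# Page 2: the deeper, more accurate search finds every ID.
page2 = page(limit=2, offset=2, missed=set())

# IDs that show up on both pages.
duplicates = set(page1) & set(page2)
print(page1, page2, duplicates)
```

In this model "c" lands on both pages, while "b" is never returned on either page.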

We want to be able to

  1. Reduce the possibility of this happening (G1)
  2. Eliminate this from happening and guarantee there is no repetition (G2)

Describe the solution you'd like.

For G1, the user should be able to adjust params so that the accuracy is higher and fewer IDs from previous pages are repeated.

For G2, introduce a new parameter page_retain_order in the search API of the SDKs. Add it at the same level as offset. The downside is that we may lose some results.

Add it at the same level as radius, since it only applies to range search.
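
A minimal sketch of what the proposed request parameters could look like, using only the parameter names from this proposal. `build_range_search_param` is a hypothetical helper, not part of any SDK; it only assembles a dictionary with `page_retain_order` next to `radius`, and the metric and numeric values are placeholders.

```python
def build_range_search_param(metric_type, radius, range_filter, offset,
                             page_retain_order=True):
    """Assemble a search-param dict for a range search with an offset (hypothetical helper)."""
    return {
        "metric_type": metric_type,
        "params": {
            "radius": radius,
            "range_filter": range_filter,
            "offset": offset,
            # Proposed flag, placed at the same level as radius.
            "page_retain_order": page_retain_order,
        },
    }


# Two requests that differ only in offset (placeholder values).
first_page = build_range_search_param("L2", radius=1.0, range_filter=0.0, offset=0)
second_page = build_range_search_param("L2", radius=1.0, range_filter=0.0, offset=10)
```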

Describe an alternate solution.

No response

Anything else? (Additional Context)

This is a ZillizCloud-only feature.

@PwzXxm PwzXxm added the kind/feature Issues related to feature request from users label Aug 14, 2024
@PwzXxm
Contributor Author

PwzXxm commented Aug 14, 2024

/assign

@xiaofan-luan
Collaborator

For Goal 1, can we increase ef to a large enough value to avoid this happening on open source?

@PwzXxm
Contributor Author

PwzXxm commented Aug 16, 2024

For Goal 1, can we increase ef to a large enough value to avoid this happening on open source?

Sure, will do.
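
For Goal 1, a rough sketch of raising `ef` could look like the following. It assumes an HNSW index, where `ef` is the search-time candidate-list size. The `pick_ef` helper, the 4x multiplier, and the cap of 32768 are illustrative assumptions, not values recommended in this thread.

```python
def pick_ef(limit, deepest_offset, multiplier=4, cap=32768):
    """Pick a single ef, sized for the deepest page, to reuse on every page (hypothetical helper)."""
    deepest = limit + deepest_offset
    # Scale by an assumed safety multiplier, but never exceed the assumed cap.
    return min(cap, deepest * multiplier)


ef = pick_ef(limit=10, deepest_offset=10)

# Placeholder metric and bounds; only "ef" is the point of this sketch.
search_param = {
    "metric_type": "L2",
    "params": {"ef": ef, "radius": 1.0, "range_filter": 0.0},
}
```

Using one `ef` for all pages is a design choice in this sketch: the intent is that every page is searched at the same depth, so a later page is less likely to find neighbors an earlier page missed.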

sre-ci-robot pushed a commit that referenced this issue Aug 29, 2024
sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue Sep 3, 2024
sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue Sep 3, 2024
sre-ci-robot pushed a commit to milvus-io/milvus-sdk-go that referenced this issue Sep 14, 2024
@zhuwenxing
Contributor

Test PR already merged: https://github.com/zilliztech/milvus-cloud-test/pull/108

Search page by page:

        for offset in range(0,total_limit,total_limit//pages):
            param = {
                "metric_type": metric_type,
                "params": {
                    "level": 1,
                    "offset": offset,
                    "radius": r1,
                    "range_filter": r2,
                    "page_retain_order": True
                }
            }
            res = collection_w.search(
                data=[search_data],
                anns_field=field_name,
                param=param,
                offset=offset,
                page_retain_order=True,
                limit=total_limit//pages,
            )

Search in a single request:

        res = collection_w.search(
            data=[search_data],
            anns_field=field_name,
            param={
                "params": {
                    "level": 1,
                    "offset": 0,
                    "radius": r1,
                    "range_filter": r2,
                    "page_retain_order": True
                }
            },
            limit=total_limit,
        )

The results of the two searches are consistent.
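
A small sketch of how such a consistency check could be expressed, independent of Milvus. It assumes the primary keys have already been extracted from each result set into plain lists; `pages_match_single_search` is a hypothetical helper, not code from the test PR.

```python
def pages_match_single_search(page_ids, single_ids):
    """Return True if the pages, concatenated in order, equal the one-shot result with no repeated ID."""
    combined = [pk for page in page_ids for pk in page]
    no_duplicates = len(combined) == len(set(combined))
    return no_duplicates and combined == list(single_ids)


# Consistent: two pages that line up with the single search.
ok = pages_match_single_search([[1, 2, 3], [4, 5, 6]], [1, 2, 3, 4, 5, 6])

# Inconsistent: ID 3 is repeated on the second page.
bad = pages_match_single_search([[1, 2, 3], [3, 4, 5]], [1, 2, 3, 4, 5, 6])
```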

@Yuhanlah

Yuhanlah commented Nov 26, 2024

image
image
Hi,
I found that the same query does not return the same results in NodeJS. It also produces duplicate results across two pages.

    const pageOffset = (page - 1) * pageSize

    let searchVectorQuery: SearchSimpleReq = {
      collection_name: `${COLLECTION_PREFIX}${name}`,
      vector: singleVector[0],
      filter: convertedFilter !== '' ? convertedFilter : undefined,
      limit: pageSize,
      offset: pageOffset,
      output_fields: ['*'],
    }
    const vectorResult = await this.getClient().search(searchVectorQuery)


@PwzXxm
Contributor Author

PwzXxm commented Nov 27, 2024

@Yuhanlah Hi there, page_retain_order only applies when you also set radius and offset, i.e., when doing a range search with offset. May I ask what your similarity metric is? As a workaround, if you do not know the upper bound, you could set the radius to the furthest similarity.
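
A hedged sketch of that workaround. It assumes that for the COSINE metric `radius` acts as the lower similarity bound, so the minimum cosine similarity of -1.0 is the "furthest" value; `fallback_radius` is a hypothetical helper, and metrics without a natural bound are rejected rather than guessed.

```python
def fallback_radius(metric_type):
    """Return a 'furthest similarity' radius for metrics that have a natural bound (hypothetical helper)."""
    metric = metric_type.upper()
    if metric == "COSINE":
        # Cosine similarity lies in [-1, 1]; -1.0 is the furthest possible value.
        return -1.0
    # Assumption: other metrics have no fixed bound, so the caller must choose one.
    raise ValueError(f"no natural radius bound for metric {metric_type!r}; supply one explicitly")


# Range search with offset, using the parameter names from this thread.
param = {
    "metric_type": "COSINE",
    "params": {
        "radius": fallback_radius("COSINE"),
        "offset": 10,
        "page_retain_order": True,
    },
}
```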

sre-ci-robot pushed a commit to milvus-io/milvus-sdk-go that referenced this issue Nov 27, 2024
issue: milvus-io/milvus#35464

Signed-off-by: Patrick Weizhi Xu <[email protected]>
(cherry picked from commit a1f6fff)

Signed-off-by: Patrick Weizhi Xu <[email protected]>