
feat: Geospatial Data Type and GIS Function Support for milvus #37417

Open: wants to merge 4 commits into base: master


tasty-gumi

issue:#27576
pr:#35990

Main Goals

  1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields.
  2. Insert geospatial data as payload values in the insert binlog, and print the values for verification.
  3. Load segments containing geospatial data into memory.
  4. Ensure query outputs can display geospatial data.
  5. Support filtering on GIS functions for geospatial columns.
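Goal 5 is filtering a geospatial column by a spatial relationship. The PR implements these operators in the C++ core, which links GDAL; the Go sketch below is not that code. It is a rough, hypothetical illustration of what a "within"-style filter computes: each point in a column is tested against a polygon literal with the even-odd ray-casting rule, and the matching row offsets are returned. `Point`, `pointInPolygon`, and `filterWithin` are made-up names, not Milvus APIs, and points lying exactly on an edge are not handled consistently.

```go
package main

import "fmt"

// Point is a 2-D coordinate (hypothetical type for this sketch).
type Point struct{ X, Y float64 }

// pointInPolygon reports whether p lies inside the simple polygon described by
// ring (vertices in order, implicitly closed), using the even-odd ray-casting rule.
// Points exactly on an edge may be classified either way.
func pointInPolygon(p Point, ring []Point) bool {
	inside := false
	n := len(ring)
	for i, j := 0, n-1; i < n; j, i = i, i+1 {
		a, b := ring[i], ring[j]
		// Does edge (a, b) straddle the horizontal line y = p.Y?
		if (a.Y > p.Y) != (b.Y > p.Y) {
			// X coordinate where the edge crosses that line (b.Y != a.Y here).
			xCross := a.X + (p.Y-a.Y)*(b.X-a.X)/(b.Y-a.Y)
			// Count only crossings to the right of p.
			if p.X < xCross {
				inside = !inside
			}
		}
	}
	return inside
}

// filterWithin returns the row offsets whose point lies inside the polygon,
// loosely mimicking how a filter expression yields matching rows over a column.
func filterWithin(column []Point, ring []Point) []int {
	var hits []int
	for off, p := range column {
		if pointInPolygon(p, ring) {
			hits = append(hits, off)
		}
	}
	return hits
}

func main() {
	square := []Point{{0, 0}, {10, 0}, {10, 10}, {0, 10}}
	column := []Point{{5, 5}, {15, 5}, {1, 9}, {-1, -1}}
	fmt.Println(filterWithin(column, square)) // [0 2]
}
```

A real implementation would work on WKB-encoded geometries of any type and follow the exact OGC predicate semantics; the sketch only shows the per-row evaluation shape.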

Solution

  1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces.
  2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file.
  3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization.
  4. Data Pipeline: The client and proxy exchange geospatial data in WKT format. The proxy converts all data to WKB for downstream processing, which covers column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management.
  5. Query Operators: Implement basic display of geospatial values and support for filter queries. Initially, focus on filtering a single geospatial column by its spatial relationship to a geospatial literal, including parsing and execution of the query expressions.
  6. Index Construction: Consider building an H3 index using the C interface provided by the H3 library.
  7. Client Modification: Enable the client to handle user input for geospatial data and support end-to-end testing. See the corresponding changes in pymilvus.
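Item 4 has the client and proxy exchange WKT while the proxy converts everything to WKB for downstream processing. The PR does this with go-geom on the Go side; the stdlib-only Go sketch below is not that code. It handles only a 2-D POINT, purely to show the WKB byte layout (a byte-order flag, a uint32 geometry type code, then float64 coordinates) and the WKB-to-WKT round trip used when printing values for verification. `pointWKTToWKB` and `pointWKBToWKT` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// wkbPoint is the OGC WKB geometry type code for a 2-D Point.
const wkbPoint uint32 = 1

// pointWKTToWKB parses a 2-D "POINT (x y)" string and encodes it as
// little-endian WKB: 1 byte-order byte, a uint32 type code, then x and y as float64.
func pointWKTToWKB(wkt string) ([]byte, error) {
	s := strings.TrimSpace(wkt)
	if len(s) < 5 || !strings.EqualFold(s[:5], "POINT") {
		return nil, errors.New("only POINT is supported in this sketch")
	}
	s = strings.TrimSpace(s[5:])
	if !strings.HasPrefix(s, "(") || !strings.HasSuffix(s, ")") {
		return nil, fmt.Errorf("malformed POINT: %q", wkt)
	}
	coords := strings.Fields(s[1 : len(s)-1])
	if len(coords) != 2 {
		return nil, fmt.Errorf("expected 2 coordinates, got %d", len(coords))
	}
	var buf bytes.Buffer
	buf.WriteByte(1) // 1 = little-endian (NDR)
	// Writes into a bytes.Buffer cannot fail, so the errors are ignored.
	binary.Write(&buf, binary.LittleEndian, wkbPoint)
	for _, c := range coords {
		v, err := strconv.ParseFloat(c, 64)
		if err != nil {
			return nil, err
		}
		binary.Write(&buf, binary.LittleEndian, v)
	}
	return buf.Bytes(), nil
}

// pointWKBToWKT decodes a 21-byte 2-D WKB Point (either byte order) back to WKT.
func pointWKBToWKT(b []byte) (string, error) {
	if len(b) != 21 {
		return "", fmt.Errorf("expected 21 bytes for a 2-D point, got %d", len(b))
	}
	var order binary.ByteOrder
	switch b[0] {
	case 0:
		order = binary.BigEndian // XDR
	case 1:
		order = binary.LittleEndian // NDR
	default:
		return "", fmt.Errorf("invalid byte-order flag %d", b[0])
	}
	if order.Uint32(b[1:5]) != wkbPoint {
		return "", errors.New("not a WKB Point")
	}
	x := math.Float64frombits(order.Uint64(b[5:13]))
	y := math.Float64frombits(order.Uint64(b[13:21]))
	f := func(v float64) string { return strconv.FormatFloat(v, 'f', -1, 64) }
	return fmt.Sprintf("POINT (%s %s)", f(x), f(y)), nil
}

func main() {
	wkb, err := pointWKTToWKB("POINT (30.5 -10)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, header: % x\n", len(wkb), wkb[:5]) // 21 bytes, header: 01 01 00 00 00
	wkt, err := pointWKBToWKT(wkb)
	if err != nil {
		panic(err)
	}
	fmt.Println(wkt) // POINT (30.5 -10)
}
```

Keeping WKB as the single internal representation means storage, segcore, and query operators only ever parse one compact binary format, while WKT stays a client-facing text format.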

Deleted the incomplete H3 index development and unused generated files.
Fixed the conanfiles in the Milvus Conan repo so that local builds can fetch the packages needed to build the libraries.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tasty-gumi
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. area/dependency Pull requests that update a dependency file area/test sig/testing test/integration integration test labels Nov 4, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Nov 4, 2024

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


czs007 commented Nov 4, 2024

rerun go-sdk


czs007 commented Nov 4, 2024

rerun cpp-unit-test


mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


czs007 commented Nov 4, 2024

/run-cpu-e2e


mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 5, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 5, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 5, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 5, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 8, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 8, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 8, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


czs007 commented Nov 9, 2024

@tasty-gumi
[pytest : test] pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=create auto index on type:JSON is not supported)>
[pytest : test] (api_request.py:45)
[pytest : test] [2024-11-08 14:24:30 - ERROR - ci_test]: (api_response) : <MilvusException: (code=65535, message=create auto index on type:JSON is not supported)> (api_request.py:46)
[pytest : test] ---------- generated html file: file:///tmp/ci_logs/test/report.html -----------
[pytest : test] =========================== short test summary info ============================
[pytest : test] FAILED testcases/test_index.py::TestIndexInvalid::test_create_index_json
[pytest : test] !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!


mergify bot commented Nov 11, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 12, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 12, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


czs007 commented Nov 13, 2024

@tasty-gumi Integration test failed.

2024-11-12T13:06:50.0930073Z [2024/11/12 13:02:43.107 +00:00] [INFO] [querynodev2/services.go:483] ["start to load segments in parallel"] [collectionID=453880325253365774] [segmentType=Sealed] [requestSegments="[453880325253771188]"] [preparedSegments="[453880325253771188]"] [segmentNum=1] [concurrencyLevel=1]
2024-11-12T13:06:50.0931913Z [2024/11/12 13:02:43.107 +00:00] [WARN] [querynodev2/services.go:461] ["worker failed to load segments"] [collectionID=453880325253365774] [channel=by-dev-rootcoord-dml_0_453880325253365774v0] [replicaID=453880325350359041] [workID=1] [segments="[453880325253769583]"] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0933010Z [2024/11/12 13:02:43.108 +00:00] [INFO] [funcutil/parallel.go:86] ["load segment..."] [collectionID=453880325253365774] [segmentType=Sealed] [requestSegments="[453880325253771188]"] [preparedSegments="[453880325253771188]"] [partitionID=453880325253365775] [segmentID=453880325253771188] [segmentType=L1]
2024-11-12T13:06:50.0935035Z [2024/11/12 13:02:43.108 +00:00] [WARN] [querynode/service.go:301] ["delegator failed to load segments"] [collectionID=453880325253365774] [partitionID=453880325253365775] [shard=by-dev-rootcoord-dml_0_453880325253365774v0] [segmentID=453880325253769583] [level=L1] [currentNodeID=1] [dstNodeID=1] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0936237Z [2024/11/12 13:02:43.108 +00:00] [INFO] [segments/segment_loader.go:334] ["start loading segment files"] [collectionID=453880325253365774] [partitionID=453880325253365775] [shard=by-dev-rootcoord-dml_0_453880325253365774v0] [segmentID=453880325253771188] [rowNum=3000] [segmentType=Sealed]
2024-11-12T13:06:50.0938012Z [2024/11/12 13:02:43.108 +00:00] [WARN] [task/executor.go:160] ["failed to load segment"] [taskID=1731416037011] [collectionID=453880325253365774] [replicaID=453880325350359041] [segmentID=453880325253769583] [node=1] [source=segment_checker] [shardLeader=1] [error="At LoadSegment: => unsupported data type at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:442\n"]
2024-11-12T13:06:50.0939242Z [2024/11/12 13:02:43.109 +00:00] [INFO] [task/executor.go:142] ["execute ac


mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@tasty-gumi

/run-cpu-e2e


mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


mergify bot commented Nov 13, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 13, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 13, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 13, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

add geospatial interface in src common

change type define and add segcore support

add storage & chunkdata support

feature: go package storage & proxy & typeutil support geospatial type in internal and typeutil in pkg

Signed-off-by: tasty-gumi <[email protected]>

add geospatial interface in src common

change type define and add segcore support

change: use wkb only in core

Signed-off-by: tasty-gumi <[email protected]>

fix: geospatial uses only std::string as the FieldDataImpl template parameter && add geospatial data generation && pass chunk, growing, sealed tests

fix: merge conflicts after rebase; nullable test does not pass due to upstream

feat:basic GIS Function expr and visitor impl and GIS proto support && add:storage test of geo data

Signed-off-by: tasty-gumi <[email protected]>

feat:add proxy validate (pass httpserver test) && plan parser of geospatialfunction

fix:sealedseg && go tidy

fix:go mod

feat:can produce wkt result for pymilvus client

feat: add parser and query operator for geos field && print geos binlog as wkt

fix:fielddataimpl interface
Signed-off-by: tasty-gumi <[email protected]>

fix: some code formatting && segfault debugging after rebase

Signed-off-by: tasty-gumi <[email protected]>

add: import util test for parquet and mix compaction test

Signed-off-by: tasty-gumi <[email protected]>

fix: delete useless file and fix error for rebase

Signed-off-by: tasty-gumi <[email protected]>

fix: git rebase for custom function feat

Signed-off-by: tasty-gumi <[email protected]>

fix:rename geospatial field && update proto && rewrite Geometry class with smart pointer

Signed-off-by: tasty-gumi <[email protected]>

add:last commit miss add files

Signed-off-by: tasty-gumi <[email protected]>

fix: geospatial name replace in test files && fix geometry and parser

fix:remove some file change for dev

Signed-off-by: tasty-gumi <[email protected]>

fix:remove size in if && add destroy in ~Geometry()

Signed-off-by: tasty-gumi <[email protected]>

add:conan file gdal rep

Signed-off-by: tasty-gumi <[email protected]>

remove:gdal fPIC

Signed-off-by: tasty-gumi <[email protected]>

fix: for rebase

Signed-off-by: tasty-gumi <[email protected]>

remove:log_warn

Signed-off-by: tasty-gumi <[email protected]>

remove:gdal shared

Signed-off-by: tasty-gumi <[email protected]>

remove:tbbproxy

Signed-off-by: tasty-gumi <[email protected]>

fix:add gdal option && update go mod

Signed-off-by: tasty-gumi <[email protected]>

dev:change some scripts

Signed-off-by: tasty-gumi <[email protected]>

remove: dev scripts

Signed-off-by: tasty-gumi <[email protected]>

add:conan files dependency of gdal

Signed-off-by: tasty-gumi <[email protected]>

fix:fmt cpp code

Signed-off-by: tasty-gumi <[email protected]>

add:delete geos-config in cmake_build/bin, which may cause a permission-denied error

Signed-off-by: tasty-gumi <[email protected]>

fix: add go client geometry interface && fix group by test

Signed-off-by: tasty-gumi <[email protected]>

fix: mod tidy for tests go client

Signed-off-by: tasty-gumi <[email protected]>

fix:memory leak in test and go fmt

Signed-off-by: tasty-gumi <[email protected]>

fix: datagen function remove pkoffset

Signed-off-by: tasty-gumi <[email protected]>

fix: go-client test add entity.geometry

Signed-off-by: tasty-gumi <[email protected]>

fix: fix test args and add some annotations

Signed-off-by: tasty-gumi <[email protected]>

fix:name and remove wkt marshal MaxDecimalDigits limit

Signed-off-by: tasty-gumi <[email protected]>

fix:misspell

Signed-off-by: tasty-gumi <[email protected]>

fix:go client test

Signed-off-by: tasty-gumi <[email protected]>

fix:listA size

Signed-off-by: tasty-gumi <[email protected]>

add:field data in schema_test

Signed-off-by: tasty-gumi <[email protected]>

test:add mergefield data

Signed-off-by: tasty-gumi <[email protected]>

fix:test err code modify

Signed-off-by: tasty-gumi <[email protected]>

fmt code

Signed-off-by: tasty-gumi <[email protected]>

fix:add geo  type in client

Signed-off-by: tasty-gumi <[email protected]>

fix:add type in ChunkedSegmentSealedImpl

Signed-off-by: tasty-gumi <[email protected]>

fix:add chunk writer for geometry

Signed-off-by: tasty-gumi <[email protected]>

mergify bot commented Dec 7, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Dec 7, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

Labels
area/dependency Pull requests that update a dependency file area/test dco-passed DCO check passed. kind/feature Issues related to feature request from users sig/testing size/XXL Denotes a PR that changes 1000+ lines. test/integration integration test

4 participants