[Bug]: GPU_CAGRA OOM and slower than GPU_IVF_PQ #38101

yongxin3344520 · 2024-11-29T07:28:10Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: 2.5.0-beta
- Deployment mode: standalone
- SDK version: pymilvus 2.5.0
- OS: Ubuntu 20.04

CPU: Intel(R) Xeon(R) Platinu [email protected] cores
RAM: 48 GB
GPU: 4090, 24 GB
Disk: HDD, 120 GB

Current Behavior

I searched 6000000 vectors of dimension 256 with 5000 query vectors and topk=100.

With the GPU_CAGRA index, the search runs out of memory (OOM) and is slower than with GPU_IVF_PQ.

The settings in the configuration file milvus.yaml do not seem to take effect.

The gpu maxMemSize is set to 20480, but at least 22144 MiB of GPU memory is used.

Expected Behavior

Traceback (most recent call last):
  File "/soft/test_gpu2.py", line 115, in <module>
    results = collection.search(data=[
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/orm/collection.py", line 801, in search
    resp = conn.search(
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py", line 805, in search
    return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py", line 746, in _execute_search
    raise e from e
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search
    check_status(response.status)
  File "/root/miniconda3/lib/python3.9/site-packages/pymilvus/client/utils.py", line 63, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=fail to search on QueryNode 3: worker(3) query failed: Operator::GetOutput faide id: 197] : => failed to search: config={{"itopk_size":128,"k":100,"metric_type":"IP","span_id":"8b80dac94bbe6d43","trace_flags":0,"trace_id":"7edcf859 raft inner error: std::bad_alloc: out_of_memory: RMM failure at:/workspace/source/cmake_build/3rdparty_download/rmm-src/include/rmm/mr/device/pool_memorypool size exceeded at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:411

Steps To Reproduce

1. Insert 6000000 randomly generated 256-dimensional vectors;
2. Randomly select 10000 of the 6000000 vectors;
3. Use the 10000 selected 256-dimensional vectors to search the 6000000 vectors;
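One way to check whether the size of a single request is what exhausts the pool would be to send the query vectors in smaller batches instead of one `collection.search` call. The sketch below is only an illustration: `search_in_batches`, `search_fn`, and `batch_size` are hypothetical names, and the stand-in search function replaces the real Milvus call so the snippet runs without a server.

```python
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def search_in_batches(
    search_fn: Callable[[List[Vector]], Iterable],
    queries: Sequence[Vector],
    batch_size: int,
) -> list:
    """Call search_fn on consecutive slices of queries and concatenate the results.

    search_fn is a hypothetical callable that takes a list of query vectors
    and returns one result per query vector, in order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    results: list = []
    for start in range(0, len(queries), batch_size):
        batch = list(queries[start:start + batch_size])
        results.extend(search_fn(batch))
    return results


if __name__ == "__main__":
    # Stand-in for a real search call: returns the vector length per query.
    def fake_search(batch: List[Vector]) -> List[int]:
        return [len(v) for v in batch]

    queries = [[0.0] * 256 for _ in range(10)]
    print(len(search_in_batches(fake_search, queries, batch_size=4)))
```

With pymilvus, `search_fn` could wrap the same `collection.search` call that appears in the traceback, passing each batch as `data`. Whether smaller batches avoid the out_of_memory error here is not confirmed in this thread.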

Milvus Log

..............

Anything else?

......

yongxin3344520 · 2024-11-29T07:29:36Z

This is my Python code:

```python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import random, time
import uuid

if __name__ == '__main__':
    dim = 256
    collection_name = "example_collection00000"
    # Connect to the Milvus server
    connections.connect(host='127.0.0.1', port='19530')
    conn = connections._fetch_handler()
    res = conn.list_collections()
    # Release and drop every collection except the test collection
    for r in res:
        print(r)
        if r == collection_name:
            continue
        conn.release_collection(r)
        conn.drop_collection(r)

    # exit(0)

    if not conn.has_collection(collection_name):

        # Define the fields
        field1 = FieldSchema(name="embedding",
                             dtype=DataType.FLOAT_VECTOR,
                             description="float vector field",
                             is_primary=False,
                             dim=dim
                             )
        field2 = FieldSchema(name="id",
                             dtype=DataType.INT64,
                             description="id field",
                             is_primary=True)

        # Define the collection schema
        schema = CollectionSchema(fields=[field1, field2], description="collection description")

        # Create the collection
        collection = Collection(name=collection_name, schema=schema)

        # Configure the index parameters
        index_params = {
            "metric_type": "IP",
            "index_type": "GPU_CAGRA",
            "params": {
                "intermediate_graph_degree": 64,
                "graph_degree": 32,
                "build_algo": "IVF_PQ",
                "cache_dataset_on_device": "false"
            }
        }
        # Create the index
        collection.create_index(field_name="embedding", index_params=index_params)
    else:
        collection = Collection(name=collection_name)

    max_id = 0

    # 3. Insert randomly generated vectors (6000 batches of 1000)
    ts = time.time()
    eps = 6000
    for j in range(eps):
        t0 = time.time()
        data = []
        for i in range(1000):
            data.append({
                "id": max_id + 1 + i + j * 1000,
                # "project_id": str(random.randint(0, 1000)),
                # "sponsorProjectId": str(random.randint(0, 1000)),
                # "totalPackageProjectId": str(random.randint(0, 1000)),
                # "projectType": random.choice(["a", "b", "c", "d", "e"]),
                # "url": str(uuid.uuid4()),
                # "sponsorId": str(random.randint(0, 1000)),
                # "produceName": str(uuid.uuid4()),
                # "imageId": str(random.randint(0, 1000)),
                # "submitTime": str(random.randint(0, 10000)),
                "embedding": [random.uniform(-1, 1) for _ in range(dim)]
            })
        t1 = time.time()
        res = collection.insert(
            # partition_name=random.choice(partition_names),
            data=data
        )
        t2 = time.time()
        speed = (j + 1) * len(data) / (t2 - ts)
        eta = (eps - 1 - j) * len(data) / speed
        print(
            f"\rSize:{len(data)}, Generate Cost: {'%.3f' % (t1 - t0)}, Insert Cost: {'%.3f' % (t2 - t1)},  Speed:{'%.3f' % speed}it/s, Eta:{int(eta)}s             ",
            end="")

    collection.load()

    # Run a query to get the total number of entities
    res = collection.query(expr=" ", output_fields=["count(*)"])
    total = list(res[0].values())[0]
    print(f"The total number of entities in the collection '{collection_name}' is: {total}")

    search_params = {
        "metric_type": "IP",
        "params": {
            "itopk_size": 128
        }
    }

    ids = [i for i in range(total)]
    select_ids = random.sample(ids, 10000)

    t2 = time.time()
    query_vector = collection.query(expr=f"id in [{','.join([str(i) for i in select_ids])}] ", output_fields=["id", "embedding"])

    t3 = time.time()

    # Search with 10000 freshly generated random vectors, topk=100
    results = collection.search(data=[
        [random.uniform(-1, 1) for _ in range(dim)] for i in range(10000)
    ], anns_field="embedding", param=search_params, limit=100)
    t4 = time.time()
    print(f"Success: total:{total} query cost: {t3-t2}, search cost: {t4 - t3}")
    # print(res)
    print(len(res), len(res[0]),
          # res ,
          "\n",
          # json.dumps(res, indent=4)
          )
```
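Note that the script fetches `query_vector` with `collection.query` but then searches with freshly generated random vectors, while the reproduction steps describe searching with vectors selected from the collection. A minimal sketch of turning the queried rows into search input follows. It assumes the rows are dicts keyed by the requested output fields (consistent with how the script reads `res[0].values()`); `rows_to_query_vectors` is a hypothetical helper, not part of pymilvus.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_query_vectors(
    rows: Iterable[Dict[str, Any]],
    ids: Sequence[int],
    id_field: str = "id",
    vector_field: str = "embedding",
) -> List[List[float]]:
    """Return the embeddings of the requested ids, in the order of ids.

    rows is assumed to be a list of dicts such as
    {"id": 7, "embedding": [...]}; ids missing from rows are skipped.
    """
    by_id = {row[id_field]: row[vector_field] for row in rows}
    return [list(by_id[i]) for i in ids if i in by_id]


if __name__ == "__main__":
    # Hand-written rows standing in for a real query result.
    rows = [
        {"id": 2, "embedding": [0.2, 0.2]},
        {"id": 1, "embedding": [0.1, 0.1]},
    ]
    print(rows_to_query_vectors(rows, ids=[1, 2, 3]))
```

The returned list could then be passed as `data` to `collection.search` in place of the random vectors, so that the search matches step 3 of the reproduction steps.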

yanliang567 · 2024-11-29T08:59:23Z

/assign @Presburger
/unassign

Presburger · 2024-12-03T11:57:16Z

@yongxin3344520 Hello, you don't need to modify the default initsize and maxsize parameters in milvus.yaml, because the memory pool is only used for query vectors. Setting them too large may prevent the vectors and indexes in the database from loading properly.
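For a rough sense of why a large pool can crowd out the data, here is a back-of-envelope calculation using the numbers reported in this issue. It is only a sketch under stated assumptions: vectors are 32-bit floats, each graph neighbor id takes 4 bytes, the card has a nominal 24 GiB, and the pool limit of 20480 is in MiB. What Milvus actually keeps on the device (for example with `cache_dataset_on_device` set to "false") is not modeled, and `estimate_mib` is a hypothetical helper.

```python
MIB = 1024 ** 2


def estimate_mib(num_vectors: int, dim: int, graph_degree: int,
                 bytes_per_float: int = 4, bytes_per_neighbor: int = 4):
    """Return (raw_vectors_mib, graph_mib) under the assumptions above."""
    raw_bytes = num_vectors * dim * bytes_per_float
    graph_bytes = num_vectors * graph_degree * bytes_per_neighbor
    return raw_bytes / MIB, graph_bytes / MIB


def left_outside_pool_mib(gpu_total_mib: int, pool_max_mib: int) -> int:
    """GPU memory not reserved by the pool, if the pool grows to its maximum."""
    return gpu_total_mib - pool_max_mib


if __name__ == "__main__":
    raw_mib, graph_mib = estimate_mib(6_000_000, 256, graph_degree=32)
    left_mib = left_outside_pool_mib(24 * 1024, 20480)
    print(f"raw vectors: {raw_mib:.1f} MiB")
    print(f"graph:       {graph_mib:.1f} MiB")
    print(f"left outside a full pool: {left_mib} MiB")
```

Under these assumptions the raw vectors alone come to about 5859 MiB and the graph to about 732 MiB, while a pool that grows to 20480 MiB leaves only 4096 MiB of a nominal 24 GiB card for everything else.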

yongxin3344520 added the kind/bug and needs-triage labels Nov 29, 2024

yongxin3344520 assigned yanliang567 Nov 29, 2024

sre-ci-robot assigned Presburger and unassigned yanliang567 Nov 29, 2024

yanliang567 added the triage/accepted label and removed the needs-triage label Nov 29, 2024

yanliang567 added this to the 2.5.0 milestone Nov 29, 2024

yanliang567 modified the milestones: 2.5.0, 2.5.1, 2.5.2 Dec 24, 2024
