[Bug]: metric type not match: invalid [expected=][actual=IP]: invalid parameter #34422

cuonglp1713 · 2024-07-04T10:14:09Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: 2.4.4
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.4
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I cannot use the search function of milvus_client because of metric_type. The problem is not really a metric type mismatch: the error does not report the expected metric type at all. As a result, search always raises the error metric type not match: invalid [expected=][actual=IP]: invalid parameter, even though I specified metric_type in milvus_client.create_collection.

Expected Behavior

The error message should show the expected metric_type.

Steps To Reproduce

No response

Milvus Log

RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=IP]: invalid parameter)>, <Time:{'RPC start': '2024-07-04 11:17:30.777485', 'RPC error': '2024-07-04 11:17:30.778573'}>
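
For reading the log line above, it can help to pull the two slots out of the message text. The sketch below assumes the message keeps the `[expected=...][actual=...]` shape shown in this log; `parse_metric_mismatch` is a hypothetical helper written for illustration and is not part of pymilvus.

```python
import re

# Matches the "[expected=...][actual=...]" fragment of the error text.
_MISMATCH_RE = re.compile(
    r"\[expected=(?P<expected>[^\]]*)\]\[actual=(?P<actual>[^\]]*)\]"
)


def parse_metric_mismatch(message):
    """Return (expected, actual) from the error text, or None if absent."""
    match = _MISMATCH_RE.search(message)
    if match is None:
        return None
    return match.group("expected"), match.group("actual")


message = (
    "fail to search: metric type not match: "
    "invalid [expected=][actual=IP]: invalid parameter"
)
expected, actual = parse_metric_mismatch(message)
if not expected:
    # An empty "expected" slot is the case reported in this issue.
    print(f"search sent metric_type={actual!r}, but no expected metric type was reported")
```

An empty `expected` value, as in this report, is a hint to check whether the collection has a vector index at all rather than to change the metric type passed to search.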

Anything else?

No response

yanliang567 · 2024-07-04T10:59:30Z

@cuonglp1713 could you please share the code snippet to reproduce the issue? My guess is that you need to specify the same metric type when building the index and when searching.

/assign @cuonglp1713

cuonglp1713 · 2024-07-04T11:05:09Z

@yanliang567 I am sure I set the same metric_type for both. Here is my code:

if milvus_client.has_collection(MILVUS_COLLECTION):
    milvus_client.drop_collection(MILVUS_COLLECTION)

milvus_client.create_collection(
    collection_name=MILVUS_COLLECTION,
    schema=schema, 
    metric_type='IP',
    consistency_level='Strong'
)

milvus_client.search(
    collection_name=MILVUS_COLLECTION,
    data=[transformer.encode(question)],
    limit=5,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"]
)

Btw, even if I had forgotten to set metric_type when creating the collection, I would expect to see the expected metric type in the error log.

yanliang567 · 2024-07-04T11:30:43Z

How did you define your collection schema?
If you don't specify a metric type, Milvus uses COSINE as the default metric type.

cuonglp1713 · 2024-07-04T13:22:43Z

I defined it as below:

schema.add_field(
    schema=schema,
    field_name='id',
    datatype=DataType.VARCHAR,
    is_primary=True,
    max_length=60000
)

schema.add_field(
    schema=schema,
    field_name='vector',
    datatype=DataType.FLOAT_VECTOR,
    dim=VECTOR_DIM
)

schema.add_field(
    schema=schema,
    field_name='text',
    datatype=DataType.VARCHAR,
    max_length=60000
)

Like I said before, I set my collection's metric type to IP. I also tried switching to COSINE in search, but it raised the same problem.

yanliang567 · 2024-07-05T01:23:28Z

When I ran the code snippet you shared above, I got an error message about "collection not loaded" or "index not found", which I think is expected. So did you manually build the index and load the collection? Could you share a complete reproduction snippet? I am asking because there are 2 modes for creating a collection in the Milvus client SDK. Please refer to https://milvus.io/docs/manage-collections.md for details. @cuonglp1713

cuonglp1713 · 2024-07-05T01:44:33Z

@yanliang567 Sorry, I only shared the idea to show that I had set the metric type both when creating the collection and when searching. If you would like to run the code snippet, here is my code:

MILVUS_URL = "http://localhost:19530"
MILVUS_HOST = 'localhost'
MILVUS_PORT = '19530'
MILVUS_COLLECTION = 'tvpl_collection'
EMBEDDING_MODEL_NAME = 'BAAI/bge-m3'
VECTOR_DIM = 1024

import pandas as pd
from pymilvus import MilvusClient, DataType

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Document

from sentence_transformers import SentenceTransformer


milvus_client = MilvusClient(uri=MILVUS_HOST)
schema = milvus_client.create_schema(auto_id=False, enable_dynamic_field=False)

# process sample data
# create sample data to insert
df = pd.DataFrame({
    '_id': [1, 2, 3, 4, 5],
    'len_text': [1, 1, 1, 1, 1],
    'text': ['a', 'b', 'c', 'd', 'e'],
})

contents = []
for i, row in df.iterrows():
    processed_doc = Document(
        text=row['text'],
        metadata={
            'id': row['_id'],
            'len_text': row['len_text']
        }
    )
    contents.append(processed_doc)

# split to chunk
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
processed_documents = splitter(contents)

# define schema field
schema.add_field(
    schema=schema,
    field_name='id',
    datatype=DataType.VARCHAR,
    is_primary=True,
    max_length=60000
)
schema.add_field(
    schema=schema,
    field_name='vector',
    datatype=DataType.FLOAT_VECTOR,
    dim=VECTOR_DIM
)
schema.add_field(
    schema=schema,
    field_name='text',
    datatype=DataType.VARCHAR,
    max_length=60000
)

# create collection
if milvus_client.has_collection(MILVUS_COLLECTION):
    milvus_client.drop_collection(MILVUS_COLLECTION)

milvus_client.create_collection(
    collection_name=MILVUS_COLLECTION,
    schema=schema, 
    metric_type='IP',
    consistency_level='Strong'
)

transformer = SentenceTransformer(EMBEDDING_MODEL_NAME)
data = []
for node in processed_documents:#processed_semantic_documents:
    data.append({
        'id': node.id_,
        'text': node.get_content(),
        'vector': transformer.encode(node.get_content())
    })

# insert to Milvus
res = milvus_client.insert(collection_name=MILVUS_COLLECTION, data=data)

# get top k similar
question = 'a'
milvus_client.search(
    collection_name=MILVUS_COLLECTION,
    data=[transformer.encode(question)],
    limit=5,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"]
)

yanliang567 · 2024-07-05T02:20:52Z

So you did not manually create an index and load the collection before searching? Could you please try to describe the index?
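
A minimal sketch of describing the index, assuming pymilvus 2.4's `MilvusClient.list_indexes` and `MilvusClient.describe_index`, and assuming the returned description is a dict that carries `index_type` and `metric_type` keys. The helper names `describe_all_indexes` and `report_indexes` are hypothetical; the client is passed in so the functions can be tried without a server.

```python
def describe_all_indexes(client, collection_name):
    """Map each index name on the collection to its description dict."""
    index_names = client.list_indexes(collection_name=collection_name)
    return {
        name: client.describe_index(
            collection_name=collection_name, index_name=name
        )
        for name in index_names
    }


def report_indexes(client, collection_name):
    """Return a short, human-readable summary of the collection's indexes."""
    indexes = describe_all_indexes(client, collection_name)
    if not indexes:
        return (
            f"{collection_name}: no index found; "
            "create one and load the collection before searching"
        )
    return "\n".join(
        f"{collection_name}.{name}: "
        f"index_type={info.get('index_type')}, "
        f"metric_type={info.get('metric_type')}"
        for name, info in indexes.items()
    )


# Usage against a running server (URI and collection name taken from this thread):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://localhost:19530")
# print(report_indexes(client, "tvpl_collection"))
```

If the summary reports no index, that would be consistent with the empty `expected=` slot in the error message.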

cuonglp1713 · 2024-07-05T02:25:58Z

I followed this instruction: https://milvus.io/docs/single-vector-search.md. I searched right after inserting into Milvus.

Tegala · 2024-07-05T02:34:02Z

@yanliang567
Maybe you can try this code; it gives the same error.
Version: milvus-gpu-2.4.5

import random
import pymilvus
import numpy as np
from tqdm import tqdm
from pymilvus import utility
from pymilvus import Collection
from pymilvus import connections, db
from pymilvus import CollectionSchema, FieldSchema, DataType


def build_schema(dim=768):
    uid = FieldSchema(name='id', dtype=DataType.VARCHAR, max_length=128, is_primary=True)
    label = FieldSchema(name='label', dtype=DataType.VARCHAR, max_length=64)
    vector = FieldSchema(name='vector', dtype=DataType.FLOAT_VECTOR, dim=dim)
    schema = CollectionSchema(
        fields=[uid, label, vector],
        enable_dynamic_field=False
    )
    return schema

def build_collection(collection_name, dim=768, **kwargs):
    if utility.has_collection(collection_name):
        return Collection(collection_name)
    
    schema = build_schema(dim)
    collection = Collection(
        name=collection_name,
        schema=schema,
        **kwargs
        )    
    return collection

def load_collection(collection, partition_names=None, index_kwargs=None):
    # Avoid a mutable default argument; fall back to an empty dict.
    index_kwargs = index_kwargs or {}
    index_params = {
        "metric_type": index_kwargs.get('metric_type', "COSINE"),
        "index_type": index_kwargs.get('index_type', "GPU_BRUTE_FORCE"),
        "params": index_kwargs.get('params', {})
    }
    collection.create_index(
        field_name="vector", 
        index_params=index_params
    )

    collection.load(partition_names=partition_names)
    #utility.index_building_progress(collection.name)    


def upsert_collection(dataloader, collection_name='super_model', partition_name='default'):
    collection = build_collection(collection_name)
    if not collection.has_partition(partition_name):
        collection.create_partition(partition_name)
        
    desc = 'Upserting Collection=>%s, Partition=>%s' % (collection_name, partition_name)
    for batch in tqdm(dataloader, desc=desc):
        collection.upsert(batch, partition_name=partition_name)
    collection.flush()
    
def dataloader(num, batch_size=1024, dim=768):
    for i in range(0, num, batch_size):
        j = min(i + batch_size, num)
        ids, labels, vectors = [], [], []
        for k in range(i, j):
            ids.append('1%06d' % k)
            labels.append('yl_sku_label_%03d' % random.choice(range(1000)))
            vectors.append(list(np.random.randn(dim).astype(np.float32)))
        yield [ids, labels, vectors]
            
    
uri="http://172.16.74.199:19530"    
connections.connect(uri=uri)
collection = build_collection('demo', dim=768)
collection.create_partition('part1')
upsert_collection(dataloader(10000, batch_size=1000), 'demo', 'part1')

load_collection(collection, partition_names=None)#['trained'])

yanliang567 · 2024-07-05T02:53:43Z

cuonglp1713 wrote: "I followed this instruction: https://milvus.io/docs/single-vector-search.md. I searched right after inserting into Milvus."

I don't think you are doing the same thing as single-vector-search.md: you customized a schema when creating the collection, which means you have to create the index and load the collection manually. Please try milvus_client.describe_index(collection_name, vector_field_name) to check the index params.
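
A minimal sketch of the customized-schema path described above, assuming pymilvus 2.4's `MilvusClient.prepare_index_params`, `create_collection(..., index_params=...)`, and `describe_index`. The helper name `create_collection_with_index`, the index name `vector_idx`, and the choice of `AUTOINDEX` are assumptions made for illustration, not something stated in this thread. The client is passed in so the function can be tried without a server.

```python
def create_collection_with_index(client, collection_name, schema,
                                 vector_field="vector", metric_type="IP",
                                 index_name="vector_idx"):
    """Create a custom-schema collection together with a vector index."""
    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name=vector_field,
        index_name=index_name,    # hypothetical name chosen for this sketch
        index_type="AUTOINDEX",   # assumption; any supported index type works
        metric_type=metric_type,  # must match the metric used at search time
    )
    # Supplying index_params alongside the schema asks Milvus to build the
    # index when the collection is created, instead of leaving it unindexed.
    client.create_collection(
        collection_name=collection_name,
        schema=schema,
        index_params=index_params,
    )
    # Read back what the server recorded, including the metric type.
    return client.describe_index(
        collection_name=collection_name, index_name=index_name
    )
```

With the variables from this thread, the call would look like `create_collection_with_index(milvus_client, MILVUS_COLLECTION, schema, metric_type='IP')`. A later search with `metric_type` set to `IP` should then have an index metric to be compared against.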

yanliang567 · 2024-07-05T02:54:21Z

https://milvus.io/docs/manage-collections.md Please read this doc to understand the difference between the 2 creation modes.
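
For a collection that was already created from a custom schema without index parameters, the manual steps mentioned in this thread are to build an index and then load the collection. A sketch under the assumption that pymilvus 2.4's `MilvusClient` exposes `list_indexes`, `prepare_index_params`, `create_index`, and `load_collection` with the keyword arguments used here; `ensure_index_and_load` is a hypothetical helper name and `AUTOINDEX` is an assumed index type.

```python
def ensure_index_and_load(client, collection_name,
                          vector_field="vector", metric_type="IP"):
    """Create a vector index if the collection has none, then load it.

    Returns True if an index had to be created, False if one already existed.
    """
    created = False
    if not client.list_indexes(collection_name=collection_name):
        index_params = client.prepare_index_params()
        index_params.add_index(
            field_name=vector_field,
            index_type="AUTOINDEX",   # assumption made for this sketch
            metric_type=metric_type,  # use the same value in search_params
        )
        client.create_index(
            collection_name=collection_name, index_params=index_params
        )
        created = True
    # A collection has to be loaded before it can be searched.
    client.load_collection(collection_name=collection_name)
    return created
```

In the reporter's script this would be called once after `create_collection` and before `search`.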

cuonglp1713 · 2024-07-05T07:31:40Z

I will. Thank you for your support. I'm closing this issue now.

cuonglp1713 · 2024-07-05T07:39:36Z

Helpful !!

Mint-hfut · 2024-11-28T10:24:17Z

How did you solve it?

albertollamaso · 2024-12-25T13:57:20Z

I was having a similar issue where I was looking to filter by date (epoch). This is how I ended up doing it:

field1 = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
field2 = FieldSchema(name="vector", dtype=DataType.SPARSE_FLOAT_VECTOR)
field3 = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535, enable_analyzer=True)
field4 = FieldSchema(name="date", dtype=DataType.INT64)
field5 = FieldSchema(name="host", dtype=DataType.VARCHAR, max_length=65535)

schema = CollectionSchema(fields=[field1, field2, field3, field4, field5])

# Get Datadog alerts from the last 12 hours
current_time = datetime.utcnow()
time_12_hours_ago = current_time - timedelta(hours=12)
epoch_12_hours_ago = int(time_12_hours_ago.timestamp())
expression = f"alert_triggered_on >= {epoch_12_hours_ago}"

milvus_client.query(
    collection_name=collection_name,
    filter=expression,
    limit=10,
    output_fields=["text"]
)
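
One detail worth noting about the epoch calculation: `datetime.utcnow()` returns a naive datetime, and `.timestamp()` interprets naive values as local time, so the cutoff is shifted by the machine's UTC offset unless the machine runs in UTC. A small sketch of the same filter built from a timezone-aware value; `build_since_filter` is a hypothetical helper, and the field name is a parameter because the snippet above uses both `date` and `alert_triggered_on`.

```python
from datetime import datetime, timedelta, timezone


def build_since_filter(field_name, hours, now=None):
    """Build a filter such as 'date >= 1735084800' for the last `hours` hours."""
    # Use an aware UTC datetime so .timestamp() does not depend on local time.
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - timedelta(hours=hours)).timestamp())
    return f"{field_name} >= {cutoff}"


# Example: alerts from the last 12 hours, filtered on an INT64 epoch field.
expression = build_since_filter("date", hours=12)
```

The resulting string can be passed as the `filter` argument of `milvus_client.query`, as in the snippet above.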

cuonglp1713 added the kind/bug and needs-triage labels Jul 4, 2024

cuonglp1713 assigned yanliang567 Jul 4, 2024

sre-ci-robot assigned cuonglp1713 Jul 4, 2024

yanliang567 added the triage/needs-information label and removed the needs-triage label Jul 4, 2024

cuonglp1713 closed this as completed Jul 5, 2024
