
[Bug]: metric type not match: invalid [expected=][actual=IP]: invalid parameter #34422

Closed
1 task done
cuonglp1713 opened this issue Jul 4, 2024 · 15 comments
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@cuonglp1713

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.4
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.4
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I cannot use the search function of milvus_client because of metric_type. The problem is not a mismatch between two metric types: the expected metric type is empty. Therefore, it always raises the error metric type not match: invalid [expected=][actual=IP]: invalid parameter. I have also specified metric_type in milvus_client.create_collection.

Expected Behavior

The error message should show the expected metric_type.

Steps To Reproduce

No response

Milvus Log

RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=IP]: invalid parameter)>, <Time:{'RPC start': '2024-07-04 11:17:30.777485', 'RPC error': '2024-07-04 11:17:30.778573'}>

Anything else?

No response

@cuonglp1713 cuonglp1713 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 4, 2024
@yanliang567
Contributor

@cuonglp1713 could you please share a code snippet that reproduces the issue? My guess is that you need to specify the same metric type when building the index and when searching.

/assign @cuonglp1713

@cuonglp1713
Author

@yanliang567 I am sure I set the same metric_type for both. Here is my code:

if milvus_client.has_collection(MILVUS_COLLECTION):
    milvus_client.drop_collection(MILVUS_COLLECTION)

milvus_client.create_collection(
    collection_name=MILVUS_COLLECTION,
    schema=schema, 
    metric_type='IP',
    consistency_level='Strong'  # keyword is "consistency_level"; values are capitalized ("Strong", "Bounded", ...)
)

milvus_client.search(
    collection_name=MILVUS_COLLECTION,
    data=[transformer.encode(question)],
    limit=5,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"]
)

Btw, even if I had forgotten to set metric_type when creating the collection, I would expect the error log to show which metric type was expected.

@yanliang567
Contributor

  1. How did you define your collection schema?
  2. If you don't specify a metric type, Milvus uses COSINE as the default metric type.

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 4, 2024
@cuonglp1713
Author

cuonglp1713 commented Jul 4, 2024

  1. I define it as below:
# add_field() is a method of the schema itself, so no schema= argument is passed
schema.add_field(
    field_name='id',
    datatype=DataType.VARCHAR,
    is_primary=True,
    max_length=60000
)

schema.add_field(
    field_name='vector',
    datatype=DataType.FLOAT_VECTOR,
    dim=VECTOR_DIM
)

schema.add_field(
    field_name='text',
    datatype=DataType.VARCHAR,
    max_length=60000
)
  2. As I said before, I set my collection's metric type to IP. I also switched to COSINE in search, but it raises the same error.

@yanliang567
Contributor

When I ran the code snippet you shared above, I got an error message about "collection not loaded" or "index not found", which I think is expected. So did you manually build the index and load the collection? Could you share a complete code snippet that reproduces the issue? I am asking because there are two modes for creating a collection in the MilvusClient SDK; please refer to https://milvus.io/docs/manage-collections.md for details. @cuonglp1713
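A minimal sketch of the first of those two modes (quick setup), assuming pymilvus 2.4's MilvusClient: when create_collection receives a dimension and no schema, the client builds a default schema (primary key "id", vector field "vector"), creates an index that uses the given metric_type, and loads the collection in the same call. The helper names create_quick_collection and matching_search_params are hypothetical.

```python
def create_quick_collection(client, collection_name, dim, metric_type="IP"):
    """Quick-setup mode: no schema is passed, so the client creates an index
    that uses `metric_type` and loads the collection in the same call."""
    if client.has_collection(collection_name):
        client.drop_collection(collection_name)
    client.create_collection(
        collection_name=collection_name,
        dimension=dim,            # default fields: primary key "id", vector "vector"
        metric_type=metric_type,  # applied to the automatically created index
        consistency_level="Strong",
    )


def matching_search_params(metric_type="IP"):
    """Search parameters whose metric agrees with the index metric."""
    return {"metric_type": metric_type, "params": {}}


# Usage (needs a running Milvus server, so it is not executed here):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://localhost:19530")
# create_quick_collection(client, "tvpl_collection", dim=1024, metric_type="IP")
# client.search(collection_name="tvpl_collection", data=[query_vector], limit=5,
#               search_params=matching_search_params("IP"))
```

Keeping the metric in one argument that feeds both the index and the search parameters is the point here: the two values cannot drift apart.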

@cuonglp1713
Author

@yanliang567 Sorry, I only shared fragments to show that I had set the metric type both when creating and when searching the collection. If you would like to run the code, here is the full snippet:

MILVUS_URL = "http://localhost:19530"
MILVUS_HOST = 'localhost'
MILVUS_PORT = '19530'
MILVUS_COLLECTION = 'tvpl_collection'
EMBEDDING_MODEL_NAME = 'BAAI/bge-m3'
VECTOR_DIM = 1024

import pandas as pd
from pymilvus import MilvusClient, DataType

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Document

from sentence_transformers import SentenceTransformer


milvus_client = MilvusClient(uri=MILVUS_URL)  # a full URI with scheme, e.g. http://localhost:19530
schema = milvus_client.create_schema(auto_id=False, enable_dynamic_field=False)

# process sample data
# create sample data to insert
df = pd.DataFrame({
    '_id': [1, 2, 3, 4, 5],
    'len_text': [1, 1, 1, 1, 1],
    'text': ['a', 'b', 'c', 'd', 'e'],
})

contents = []
for i, row in df.iterrows():
    processed_doc = Document(
        text=row['text'],
        metadata={
            'id': row['_id'],
            'len_text': row['len_text']
        }
    )
    contents.append(processed_doc)

# split to chunk
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
processed_documents = splitter(contents)

# define schema field
# add_field() is a method of the schema itself, so no schema= argument is passed
schema.add_field(
    field_name='id',
    datatype=DataType.VARCHAR,
    is_primary=True,
    max_length=60000
)
schema.add_field(
    field_name='vector',
    datatype=DataType.FLOAT_VECTOR,
    dim=VECTOR_DIM
)
schema.add_field(
    field_name='text',
    datatype=DataType.VARCHAR,
    max_length=60000
)

# create collection
if milvus_client.has_collection(MILVUS_COLLECTION):
    milvus_client.drop_collection(MILVUS_COLLECTION)

milvus_client.create_collection(
    collection_name=MILVUS_COLLECTION,
    schema=schema, 
    metric_type='IP',
    consistency_level='Strong'  # keyword is "consistency_level"; values are capitalized ("Strong", "Bounded", ...)
)

transformer = SentenceTransformer(EMBEDDING_MODEL_NAME)
data = []
for node in processed_documents:#processed_semantic_documents:
    data.append({
        'id': node.id_,
        'text': node.get_content(),
        'vector': transformer.encode(node.get_content())
    })

# insert to Milvus
res = milvus_client.insert(collection_name=MILVUS_COLLECTION, data=data)

# get top k similar
question = 'a'
milvus_client.search(
    collection_name=MILVUS_COLLECTION,
    data=[transformer.encode(question)],
    limit=5,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"]
)

@yanliang567
Contributor

so you did not manually create index and load the collection before searching? could you please try to describe the index?

@cuonglp1713
Author

I followed this guide: https://milvus.io/docs/single-vector-search.md. I search right after inserting into Milvus.

@Tegala

Tegala commented Jul 5, 2024

@yanliang567
Maybe you can try this code; I get the same error.
Version: milvus-gpu-2.4.5

import random
import pymilvus
import numpy as np
from tqdm import tqdm
from pymilvus import utility
from pymilvus import Collection
from pymilvus import connections, db
from pymilvus import CollectionSchema, FieldSchema, DataType


def build_schema(dim=768):
    uid = FieldSchema(name='id', dtype=DataType.VARCHAR, max_length=128, is_primary=True)
    label = FieldSchema(name='label', dtype=DataType.VARCHAR, max_length=64)
    vector = FieldSchema(name='vector', dtype=DataType.FLOAT_VECTOR, dim=dim)
    schema = CollectionSchema(
        fields=[uid, label, vector],
        enable_dynamic_field=False
    )
    return schema

def build_collection(collection_name, dim=768, **kwargs):
    if utility.has_collection(collection_name):
        return Collection(collection_name)
    
    schema = build_schema(dim)
    collection = Collection(
        name=collection_name,
        schema=schema,
        **kwargs
    )
    return collection

def load_collection(collection, partition_names=None, index_kwargs={}):
    index_params = {
      "metric_type": index_kwargs.get('metric_type', "COSINE"),
      "index_type": index_kwargs.get('index_type', "GPU_BRUTE_FORCE"), # GPU_BRUTE_FORCE
      "params": index_kwargs.get('params', {})
    }
    collection.create_index(
        field_name="vector", 
        index_params=index_params
    )

    collection.load(partition_names=partition_names)
    #utility.index_building_progress(collection.name)    


def upsert_collection(dataloader, collection_name='super_model', partition_name='default'):
    collection = build_collection(collection_name)
    if not collection.has_partition(partition_name):
        collection.create_partition(partition_name)
        
    desc = 'Upserting Collection=>%s, Partition=>%s' % (collection_name, partition_name)
    for batch in tqdm(dataloader, desc=desc):
        collection.upsert(batch, partition_name=partition_name)
    collection.flush()
    
def dataloader(num, batch_size=1024, dim=768):
    for i in range(0, num, batch_size):
        j = min(i + batch_size, num)
        ids, labels, vectors = [], [], []
        for k in range(i, j):
            ids.append('1%06d' % k)
            labels.append('yl_sku_label_%03d' % random.choice(range(1000)))
            vectors.append(list(np.random.randn(dim).astype(np.float32)))
        yield [ids, labels, vectors]
            
    
uri="http://172.16.74.199:19530"    
connections.connect(uri=uri)
collection = build_collection('demo', dim=768)
collection.create_partition('part1')
upsert_collection(dataloader(10000, batch_size=1000), 'demo', 'part1')

load_collection(collection, partition_names=None)#['trained'])
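The script above stops after load_collection(), so it never issues a search. A hedged sketch of a search whose metric matches the index, using the ORM Collection.search API: the metric is taken from the same index_kwargs that load_collection() uses, falling back to COSINE as it does. search_params_for and search_demo are hypothetical names; the "vector" field and "label" output field come from the script.

```python
def search_params_for(index_kwargs=None):
    """Search params whose metric matches load_collection() above, which
    falls back to COSINE when index_kwargs has no metric_type."""
    index_kwargs = index_kwargs or {}
    return {"metric_type": index_kwargs.get("metric_type", "COSINE"), "params": {}}


def search_demo(collection, query_vectors, index_kwargs=None, limit=5,
                partition_names=None):
    """Search a loaded pymilvus Collection with the same metric as its index."""
    return collection.search(
        data=query_vectors,
        anns_field="vector",
        param=search_params_for(index_kwargs),
        limit=limit,
        partition_names=partition_names,  # e.g. ["part1"]; None searches all partitions
        output_fields=["label"],
    )


# Usage against the script above (needs the running server):
# hits = search_demo(collection, [list(np.random.randn(768).astype(np.float32))],
#                    partition_names=["part1"])
```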

@yanliang567
Contributor

> I follow to this instruction: https://milvus.io/docs/single-vector-search.md. I search right after insert to Milvus

I don't think you are doing the same thing as single-vector-search.md: you customized a schema when creating the collection, which means you have to create the index and load the collection manually. Please try milvus_client.describe_index(collection_name, vector_field_name) to check the index params.
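One way to run that check, sketched under assumptions: client is a pymilvus 2.4 MilvusClient, and the index was created without an explicit name, so it is assumed to be named after the vector field ("vector"), as in the Milvus docs examples. check_vector_index and diagnose are hypothetical helpers; list_indexes and describe_index are MilvusClient methods.

```python
def diagnose(index_info, search_metric):
    """Compare the metric recorded on an index with the metric used for search."""
    index_metric = (index_info or {}).get("metric_type")
    if not index_metric:
        return "index has no metric_type recorded"
    if index_metric != search_metric:
        return f"mismatch: index uses {index_metric}, search uses {search_metric}"
    return "ok"


def check_vector_index(client, collection_name, search_metric, index_name="vector"):
    """Diagnose whether `search_metric` fits the collection's vector index.

    Assumes the index name equals the vector field name, which appears to be
    the default when no index_name is given.
    """
    if not client.list_indexes(collection_name=collection_name):
        # In this thread, a collection created from a custom schema without
        # index_params ended up in this state.
        return "no index: create one with a metric_type, then load the collection"
    info = client.describe_index(collection_name=collection_name, index_name=index_name)
    return diagnose(info, search_metric)


# Usage (needs a running Milvus server):
# print(check_vector_index(milvus_client, MILVUS_COLLECTION, "IP"))
```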

@yanliang567
Contributor

https://milvus.io/docs/manage-collections.md Please read this doc to understand the difference between the two creation modes.
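For the customized-setup mode, a hedged sketch assuming pymilvus 2.4: when schema is passed, a metric_type given to create_collection does not appear to reach any index. The metric goes into index_params instead, and passing index_params to create_collection also builds the index and loads the collection, so search can run right after insert. create_custom_collection is a hypothetical helper, "vector" is the field name from the schema in this thread, and AUTOINDEX is an assumed index type choice.

```python
def create_custom_collection(client, collection_name, schema,
                             metric_type="IP", index_type="AUTOINDEX"):
    """Customized-setup mode: the metric lives in index_params, not in metric_type."""
    if client.has_collection(collection_name):
        client.drop_collection(collection_name)

    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="vector",      # the FLOAT_VECTOR field from the schema above
        index_type=index_type,
        metric_type=metric_type,  # the value a later search() is checked against
    )

    # Passing index_params makes create_collection build the index and load
    # the collection in the same call.
    client.create_collection(
        collection_name=collection_name,
        schema=schema,
        index_params=index_params,
        consistency_level="Strong",
    )


# Usage with the reproduction script (needs a running Milvus server):
# create_custom_collection(milvus_client, MILVUS_COLLECTION, schema, metric_type="IP")
```

For a collection that already exists without an index, the same IndexParams object can be passed to client.create_index(collection_name, index_params), followed by client.load_collection(collection_name).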

@cuonglp1713
Author

I will. Thank you for your support. I'm closing this issue now.

@cuonglp1713
Author

Helpful !!

@Mint-hfut

How did you solve it?

@albertollamaso

I was having a similar issue where I wanted to filter by date (epoch). This is how I ended up doing it:

from datetime import datetime, timedelta, timezone
from pymilvus import CollectionSchema, DataType, FieldSchema

field1 = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
field2 = FieldSchema(name="vector", dtype=DataType.SPARSE_FLOAT_VECTOR)
field3 = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535, enable_analyzer=True)
field4 = FieldSchema(name="date", dtype=DataType.INT64)  # epoch seconds
field5 = FieldSchema(name="host", dtype=DataType.VARCHAR, max_length=65535)

schema = CollectionSchema(fields=[field1, field2, field3, field4, field5])

# Get Datadog alerts from the last 12 hours.
# datetime.utcnow() returns a naive value that .timestamp() treats as local time;
# an aware UTC datetime gives the correct epoch.
current_time = datetime.now(timezone.utc)
time_12_hours_ago = current_time - timedelta(hours=12)
epoch_12_hours_ago = int(time_12_hours_ago.timestamp())
# Filter on the INT64 "date" field defined in the schema above
expression = f"date >= {epoch_12_hours_ago}"

milvus_client.query(
    collection_name=collection_name,
    filter=expression,
    limit=10,
    output_fields=["text"]
)
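The same filter can be wrapped in a small reusable helper. This is a sketch under assumptions: recent_filter is a hypothetical name, the target field stores epoch seconds as INT64 (like "date" above), and now can be injected so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone


def recent_filter(field, hours, now=None):
    """Build a Milvus boolean filter selecting rows whose epoch-seconds `field`
    falls within the last `hours` hours."""
    if now is None:
        now = datetime.now(timezone.utc)  # aware UTC time, so .timestamp() is exact
    cutoff = int((now - timedelta(hours=hours)).timestamp())
    return f"{field} >= {cutoff}"


# Usage (needs a running Milvus server):
# milvus_client.query(collection_name=collection_name,
#                     filter=recent_filter("date", 12),
#                     limit=10, output_fields=["text"])
```

For example, with now set to 2024-07-04 12:00 UTC and hours=12, the helper returns "date >= 1720051200" (midnight UTC that day).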
