[Bug]: When full-text search is enabled (or the schema contains the BM25 function), and dynamic fields are also enabled, inserting correct data will still result in an error. #36986

zhuwenxing · 2024-10-18T06:30:38Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: 20750c0-dev
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc96
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-10-18 14:24:32 - INFO - ci_test]: ################################################################################ (conftest.py:232)
[2024-10-18 14:24:32 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)
[2024-10-18 14:24:32 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:40)
[2024-10-18 14:24:32 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:46)
[2024-10-18 14:24:32 - INFO - ci_test]: pymilvus version: 2.5.0rc96 (client_base.py:47)
[2024-10-18 14:24:32 - INFO - ci_test]: [setup_method] Start setup test case test_insert_for_full_text_search_enable_dynamic_field. (client_base.py:49)
-------------------------------- live log call ---------------------------------
[2024-10-18 14:24:32 - INFO - ci_test]: server version: 20750c0-dev (client_base.py:165)
[2024-10-18 14:24:34 - ERROR - pymilvus.decorators]: RPC error: [insert_rows], <ParamError: (code=The data fields number is not match with schema., message=)>, <Time:{'RPC start': '2024-10-18 14:24:33.552853', 'RPC error': '2024-10-18 14:24:34.595584'}> (decorators.py:140)
[2024-10-18 14:24:34 - ERROR - ci_test]: Traceback (most recent call last):
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
    res = func(*args, **_kwargs)
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 63, in api_request
    return func(*arg, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 507, in insert
    return conn.insert_rows(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 493, in insert_rows
    request = self._prepare_row_insert_request(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 519, in _prepare_row_insert_request
    return Prepare.row_insert_param(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 587, in row_insert_param
    return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 481, in _parse_row_request
    raise ParamError(ExceptionsMessage.FieldsNumInconsistent)
pymilvus.exceptions.ParamError: <ParamError: (code=The data fields number is not match with schema., message=)>
 (api_request.py:45)
[2024-10-18 14:24:34 - ERROR - ci_test]: (api_response) : <ParamError: (code=The data fields number is not match with schema., message=)> (api_request.py:46)
FAILED
testcases/test_full_text_search.py:518 (TestInsertWithFullTextSearch.test_insert_for_full_text_search_enable_dynamic_field[default-en-False-True])
self = <test_full_text_search.TestInsertWithFullTextSearch object at 0x12e371be0>
tokenizer = 'default', text_lang = 'en', nullable = False
enable_dynamic_field = True

    @pytest.mark.tags(CaseLabel.L0)
    @pytest.mark.parametrize("enable_dynamic_field", [True])
    @pytest.mark.parametrize("nullable", [False])
    @pytest.mark.parametrize("text_lang", ["en"])
    @pytest.mark.parametrize("tokenizer", ["default"])
    def test_insert_for_full_text_search_enable_dynamic_field(self, tokenizer, text_lang, nullable, enable_dynamic_field):
        """
        target: test full text search
        method: 1. enable full text search and insert data with varchar
                2. search with text
                3. verify the result
        expected: full text search successfully and result is correct
        """
        tokenizer_params = {
            "tokenizer": tokenizer,
        }
        dim = 128
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
            FieldSchema(
                name="word",
                dtype=DataType.VARCHAR,
                max_length=65535,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
                is_partition_key=True,
            ),
            FieldSchema(
                name="sentence",
                dtype=DataType.VARCHAR,
                max_length=65535,
                nullable=nullable,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(
                name="paragraph",
                dtype=DataType.VARCHAR,
                max_length=65535,
                nullable=nullable,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(
                name="text",
                dtype=DataType.VARCHAR,
                max_length=65535,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=dim),
            FieldSchema(name="text_sparse_emb", dtype=DataType.SPARSE_FLOAT_VECTOR),
        ]
        schema = CollectionSchema(fields=fields, description="test collection", enable_dynamic_field=enable_dynamic_field)
        bm25_function = Function(
            name="text_bm25_emb",
            function_type=FunctionType.BM25,
            input_field_names=["text"],
            output_field_names=["text_sparse_emb"],
            params={},
        )
        schema.add_function(bm25_function)
        data_size = 5000
        collection_w = self.init_collection_wrap(
            name=cf.gen_unique_str(prefix), schema=schema
        )
        fake = fake_en
        if text_lang == "zh":
            fake = fake_zh
        elif text_lang == "de":
            fake = Faker("de_DE")
        elif text_lang == "hybrid":
            fake = Faker()
    
        if nullable:
            data = [
                {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower() if random.random() < 0.5 else None,
                    "paragraph": fake.paragraph().lower() if random.random() < 0.5 else None,
                    "text": fake.text().lower(),  # function input should not be None
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                for i in range(data_size)
            ]
        else:
            data = [
                {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower(),
                    "paragraph": fake.paragraph().lower(),
                    "text": fake.text().lower(),
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                for i in range(data_size)
            ]
        if text_lang == "hybrid":
            hybrid_data = []
            for i in range(data_size):
                fake = random.choice([fake_en, fake_zh, Faker("de_DE")])
                tmp = {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower(),
                    "paragraph": fake.paragraph().lower(),
                    "text": fake.text().lower(),
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                hybrid_data.append(tmp)
            data = hybrid_data + data
        # df = pd.DataFrame(data)
        # log.info(f"dataframe\n{df}")
        batch_size = 5000
        for i in range(0, len(data), batch_size):
>           collection_w.insert(
                data[i: i + batch_size]
                if i + batch_size < len(data)
                else data[i: len(data)]
            )

test_full_text_search.py:638:

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

This looks like a problem in pymilvus's client-side check of the insert data against the schema: the error is raised by pymilvus before the request is sent, not by the server.
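
For illustration, here is a minimal sketch of a name-based row check that would accept the rows from the failing test. It is not the code merged into pymilvus; `FieldSpec`, `check_row`, and the `is_function_output` flag are hypothetical names used only for this example. It assumes that the output field of a BM25 function (`text_sparse_emb` here) is filled in by Milvus rather than supplied by the user, and that unknown keys are allowed only when dynamic fields are enabled. Nullable fields are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a schema field (not a pymilvus class)."""
    name: str
    is_function_output: bool = False  # True for fields produced by a function such as BM25


def check_row(row: Dict[str, Any], fields: List[FieldSpec], enable_dynamic: bool) -> None:
    """Raise ValueError if the row does not match the schema; return None otherwise."""
    required = {f.name for f in fields if not f.is_function_output}
    generated = {f.name for f in fields if f.is_function_output}
    provided = set(row)

    # Every user-supplied schema field must be present (nullable fields not modeled here).
    missing = required - provided
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    # Function output fields must not be supplied by the caller.
    clash = provided & generated
    if clash:
        raise ValueError(f"function output fields must not be provided: {sorted(clash)}")

    # Keys outside the schema are acceptable only with dynamic fields enabled.
    extra = provided - required
    if extra and not enable_dynamic:
        raise ValueError(f"unknown fields without dynamic field enabled: {sorted(extra)}")


# Schema and row shaped like the failing test case above.
FIELDS = [
    FieldSpec("id"),
    FieldSpec("word"),
    FieldSpec("sentence"),
    FieldSpec("paragraph"),
    FieldSpec("text"),
    FieldSpec("emb"),
    FieldSpec("text_sparse_emb", is_function_output=True),
]

ROW = {
    "id": 0,
    "word": "w",
    "sentence": "s",
    "paragraph": "p",
    "text": "t",
    "emb": [0.0, 0.0, 0.0, 0.0],
    "dynamic_field_0": "dynamic_value_0",
}

check_row(ROW, FIELDS, enable_dynamic=True)  # passes: one dynamic key, no function output key
```

Comparing field names instead of field counts avoids the case where one extra dynamic key and one omitted function output field cancel out or mislead a count-based check.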

zhuwenxing · 2024-10-18T06:31:29Z

/assign @zhengbuqian
PTAL

zhengbuqian · 2024-10-21T03:02:30Z

milvus-io/pymilvus#2303 has been merged; please try pymilvus-2.5.0rc99.

/assign @zhuwenxing
/unassign

zhuwenxing · 2024-10-21T09:47:53Z

Verified: fixed in pymilvus 2.5.0rc101.
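
For anyone checking whether their installed client is new enough, here is a small sketch that compares pymilvus release-candidate version strings numerically. A plain string comparison would rank `2.5.0rc101` below `2.5.0rc99`. The helper names `parse_version` and `has_fix` are hypothetical, and the default threshold assumes that 2.5.0rc99 is the first release containing the fix, as the comment above suggests; only the `X.Y.Z` and `X.Y.ZrcN` forms are handled.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int, float]:
    """Parse 'X.Y.Z' or 'X.Y.ZrcN' into a tuple that sorts in release order."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after every release candidate of the same X.Y.Z.
    rc_rank = float(rc) if rc is not None else float("inf")
    return int(major), int(minor), int(patch), rc_rank


def has_fix(installed: str, first_fixed: str = "2.5.0rc99") -> bool:
    """Return True if `installed` is at or after `first_fixed` (assumed threshold)."""
    return parse_version(installed) >= parse_version(first_fixed)


print(has_fix("2.5.0rc96"))   # version from the original report
print(has_fix("2.5.0rc101"))  # version in which the fix was verified
```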

zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2024

zhuwenxing assigned yanliang567 Oct 18, 2024

zhuwenxing added the feature/full text search label Oct 18, 2024

sre-ci-robot assigned zhengbuqian Oct 18, 2024

zhuwenxing added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. labels Oct 18, 2024

zhengbuqian added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2024

zhengbuqian added this to the 2.5.0 milestone Oct 18, 2024

zhengbuqian unassigned yanliang567 Oct 18, 2024

zhengbuqian mentioned this issue Oct 18, 2024

fix: simplified the logic to check if the insert/request data matches the schema milvus-io/pymilvus#2303

Merged

sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue Oct 21, 2024

fix: simplified the logic to check if the insert/request data matches…

bb69c1d

… the schema (#2303) issue: milvus-io/milvus#36986 Signed-off-by: Buqian Zheng <[email protected]>

sre-ci-robot assigned zhuwenxing and unassigned zhengbuqian Oct 21, 2024

zhuwenxing closed this as completed Oct 21, 2024
