From 7c63c78aff6b8b382afb4701db77a017078b4c09 Mon Sep 17 00:00:00 2001
From: Milvus-doc-bot
Date: Thu, 28 Nov 2024 08:25:54 +0000
Subject: [PATCH] Release new docs to master

---
 v2.5.x/site/en/home/home.md | 2 +-
 v2.5.x/site/en/menuStructure/en.json | 2 +-
 v2.5.x/site/en/release_notes.md | 2 +-
 .../en/tutorials/hybrid_search_with_milvus.md | 2 +-
 .../collections/manage-collections.md | 2 +-
 .../schema/analyzer/analyzer-overview.md | 4 +-
 .../en/userGuide/search-query-get/boolean.md | 9 +++--
 .../search-query-get/keyword-match.md | 38 +++++++++----------
 .../search-query-get/multi-vector-search.md | 4 +-
 .../search-query-get/single-vector-search.md | 6 +--
 10 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/v2.5.x/site/en/home/home.md b/v2.5.x/site/en/home/home.md
index 51e07d1be..a10e9c46b 100644
--- a/v2.5.x/site/en/home/home.md
+++ b/v2.5.x/site/en/home/home.md
@@ -101,7 +101,7 @@ id: home.md
 _Nov 2024 - Milvus 2.5.0 release_
 - Added guidance on how to [conduct full text search](full-text-search.md).
-- Added guidance on how to [conduct keyword match](keyword-match.md).
+- Added guidance on how to [conduct text match](keyword-match.md).
 - Added guidance on how to [enable nullable and default values](nullable-and-default.md).
 - Added descriptions of [analyzers](analyzer-overview.md).
 - Added descriptions of [bitmap indexes](bitmap.md).
diff --git a/v2.5.x/site/en/menuStructure/en.json b/v2.5.x/site/en/menuStructure/en.json
index 07724ec02..9e30ca9c6 100644
--- a/v2.5.x/site/en/menuStructure/en.json
+++ b/v2.5.x/site/en/menuStructure/en.json
@@ -702,7 +702,7 @@
  "children": []
  },
  {
- "label": "Keyword Match",
+ "label": "Text Match",
  "id": "keyword-match.md",
  "order": 7,
  "children": []
diff --git a/v2.5.x/site/en/release_notes.md b/v2.5.x/site/en/release_notes.md
index af82bbcfa..f6410b737 100644
--- a/v2.5.x/site/en/release_notes.md
+++ b/v2.5.x/site/en/release_notes.md
@@ -38,7 +38,7 @@ Milvus 2.5 introduces a built-in Cluster Management WebUI, reducing system maint
 Milvus 2.5 leverages analyzers and indexing from Tantivy for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
-For details, refer to [Keyword Match](keyword-match.md).
+For details, refer to [Text Match](keyword-match.md).
 #### Bitmap Index
diff --git a/v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md b/v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md
index e0004004e..f56060e02 100644
--- a/v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md
+++ b/v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md
@@ -18,7 +18,7 @@ In this tutorial, we will demonstrate how to conduct hybrid search with [Milvus]
 Milvus supports Dense, Sparse, and Hybrid retrieval methods:
 - Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
-- Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms, equivalent to full-text search.
+- Sparse Retrieval: Emphasizes text matching to find results based on specific terms, equivalent to full-text search.
 - Hybrid Retrieval: Combines both Dense and Sparse approaches, capturing the full context and specific keywords for comprehensive search results.
 By integrating these methods, the Milvus Hybrid Search balances semantic and lexical similarities, improving the overall relevance of search outcomes. This notebook will walk through the process of setting up and using these retrieval strategies, highlighting their effectiveness in various search scenarios.
diff --git a/v2.5.x/site/en/userGuide/collections/manage-collections.md b/v2.5.x/site/en/userGuide/collections/manage-collections.md
index 0c13f408b..1ccb04450 100644
--- a/v2.5.x/site/en/userGuide/collections/manage-collections.md
+++ b/v2.5.x/site/en/userGuide/collections/manage-collections.md
@@ -85,7 +85,7 @@ For more information about searches and queries, refer to the articles in the [
 - [​Full-Text Search](full-text-search.md)​
-- [Keyword Match](keyword-match.md)​
+- [Text Match](keyword-match.md)​
 In addition, Milvus also provides enhancements to improve search performance and efficiency. They are disabled by default, and you can enable and use them according to your service requirements. They are​
diff --git a/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md b/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
index 4f565f107..a8982165e 100644
--- a/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
+++ b/v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
@@ -8,7 +8,7 @@ summary: "In text processing, an analyzer is a crucial component that converts r
 In text processing, an **analyzer** is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: **tokenizer** and **filter**. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval.​
-In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for keyword matching or converted into sparse embeddings for full text search. For more information, refer to [​Keyword Match](keyword-match.md) or [​Full Text Search](full-text-search.md).​
+In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to [Text Match](keyword-match.md) or [​Full Text Search](full-text-search.md).​
@@ -16,7 +16,7 @@ The use of analyzers may impact performance:​
 - **Full text search:** For full text search, DataNode and **QueryNode** channels consume data more slowly because they must wait for tokenization to complete. As a result, newly ingested data takes longer to become available for search.​
-- **Keyword match:** For keyword matching, index creation is also slower since tokenization needs to finish before an index can be built.​
+- **Text match:** For text matching, index creation is also slower since tokenization needs to finish before an index can be built.​
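The analyzer-overview.md hunks describe the analyzer as a two-stage pipeline. A tokenizer splits raw text into tokens, and filters then refine those tokens. The Python sketch below models that pipeline. It uses the behavior that the keyword-match.md context attributes to the `standard` analyzer: split on whitespace and punctuation, convert to lowercase, and drop tokens longer than 40 characters.

This is an illustration only. It assumes a regex tokenizer as an approximation, whereas Milvus itself relies on Tantivy. Every function name here is hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical types for illustration; not a Milvus or Tantivy API.
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_punct_tokenizer(text: str) -> List[str]:
    # Approximation: runs of word characters become tokens, so
    # whitespace and punctuation act as separators.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Normalize case so "Apple" and "apple" map to the same token.
    return [t.lower() for t in tokens]


def max_length_filter(limit: int) -> TokenFilter:
    # Build a filter that drops tokens longer than `limit` characters.
    def apply(tokens: List[str]) -> List[str]:
        return [t for t in tokens if len(t) <= limit]
    return apply


def make_analyzer(tokenizer: Tokenizer, filters: List[TokenFilter]) -> Callable[[str], List[str]]:
    # Tokenize first, then run each filter in order over the token list.
    def analyze(text: str) -> List[str]:
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


# Mimics the documented behavior of the `standard` analyzer.
standard_like = make_analyzer(
    whitespace_punct_tokenizer,
    [lowercase_filter, max_length_filter(40)],
)

print(standard_like("Milvus is a Vector-Database, built for AI!"))
# ['milvus', 'is', 'a', 'vector', 'database', 'built', 'for', 'ai']
```

The filters run in order over the token list. Swapping or adding a filter therefore changes which tokens reach the index, which is one reason the choice of analyzer affects text-match results.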
diff --git a/v2.5.x/site/en/userGuide/search-query-get/boolean.md b/v2.5.x/site/en/userGuide/search-query-get/boolean.md
index 8fa39d0fa..5a9cd414e 100644
--- a/v2.5.x/site/en/userGuide/search-query-get/boolean.md
+++ b/v2.5.x/site/en/userGuide/search-query-get/boolean.md
@@ -835,9 +835,10 @@ Match operators include:​
 - `like`: Match constants or prefixes (prefix%), infixes (%infix%), and suffixes (%suffix) within constants. It relies on a brute-force search mechanism using wildcards and does not involve text tokenization. While it can achieve exact matches, its query efficiency is relatively low, making it suitable for simple matching tasks or queries on smaller datasets.​
 - `TEXT_MATCH`: Match specific terms or keywords on VARCHAR fields, using tokenization and inverted index to enable efficient text search. Compared to `like`, `TEXT_MATCH` offers more advanced text tokenization and filtering capabilities. It is suited for large-scale datasets where higher query performance is required for complex text search scenarios.​
+
- To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [​Keyword Match](keyword-match.md).​
+ To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Text Match](keyword-match.md).​
@@ -1022,11 +1023,11 @@ The filtered results are as follows:​
 ```
-#### Example 3: Keyword match on VARCHAR fields​
+#### Example 3: Text match on VARCHAR fields​
-The `TEXT_MATCH` expression is used for keyword match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [​Keyword Match](keyword-match.md).​
+The `TEXT_MATCH` expression is used for text match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Text Match](keyword-match.md).​
-The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the keyword `"Apple"` or `"iPhone"`:​
+The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the term `"Apple"` or `"iPhone"`:​
 Python
diff --git a/v2.5.x/site/en/userGuide/search-query-get/keyword-match.md b/v2.5.x/site/en/userGuide/search-query-get/keyword-match.md
index 6c660eaff..6b7a36af7 100644
--- a/v2.5.x/site/en/userGuide/search-query-get/keyword-match.md
+++ b/v2.5.x/site/en/userGuide/search-query-get/keyword-match.md
@@ -1,38 +1,38 @@
 ---
 id: keyword-match.md
-summary: "Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​"
-title: Keyword Match​
+summary: "Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​"
+title: Text Match​
 ---
-# Keyword Match​
+# Text Match​
-Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
+Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
-Keyword match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [​Full Text Search](full-text-search.md).​
+Text match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [​Full Text Search](full-text-search.md).​
 ## Overview
-Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and keyword search. For each text entry, Milvus indexes it following the procedure:​
+Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and term-based text search. For each text entry, Milvus indexes it following the procedure:​
 1. [Analyzer](analyzer-overview.md): The analyzer processes input text by tokenizing it into individual words, or tokens, and then applying filters as needed. This allows Milvus to build an index based on these tokens.​
 2. [Indexing](index-scalar-fields.md): After text analysis, Milvus creates an inverted index that maps each unique token to the documents containing it.​
-When a user performs a keyword match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.​
+When a user performs a text match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.​
-![Keyword Match](../../../assets/keyword-match.png)
+![Text Match](../../../assets/keyword-match.png)
-## Enable keyword match
+## Enable text match
-Keyword match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable keyword match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.​
+Text match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable text match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.​
 ### Set `enable_analyzer` and `enable_match`​
-To enable keyword match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient keyword matches.​
+To enable text match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient text matches.​
 ```python
 from pymilvus import MilvusClient, DataType​
@@ -51,7 +51,7 @@ schema.add_field(​
 ### Optional: Configure an analyzer​
-The performance and accuracy of keyword matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.​
+The performance and accuracy of text matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.​
 By default, Milvus uses the `standard` analyzer, which tokenizes text based on whitespace and punctuation, removes tokens longer than 40 characters, and converts text to lowercase. No additional parameters are needed to apply this default setting. For more information, refer to [​Standard](standard-analyzer.md).​
@@ -75,9 +75,9 @@ schema.add_field(​
 Milvus also provides various other analyzers suited to different languages and scenarios. For more details, refer to [​Overview](analyzer-overview.md).​
-## Use keyword match
+## Use text match
-Once you have enabled keyword match for a VARCHAR field in your collection schema, you can perform keyword matches using the `TEXT_MATCH` expression.​
+Once you have enabled text match for a VARCHAR field in your collection schema, you can perform text matches using the `TEXT_MATCH` expression.​
 ### TEXT_MATCH expression syntax​
@@ -106,9 +106,9 @@ filter = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')"​
 ```
-### Search with keyword match​
+### Search with text match​
-Keyword match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using keyword match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.​
+Text match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using text match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.​
 In this example, the `filter` expression filters the search results to only include documents that match the specified keywords `keyword1` or `keyword2`. The vector similarity search is then performed on this filtered subset of documents.​
@@ -129,9 +129,9 @@ result = MilvusClient.search(​
 ```
-### Query with keyword match​
+### Query with text match​
-Keyword match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.​
+Text match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.​
 The example below retrieves documents where the `text` field contains both keywords `keyword1` and `keyword2`.​
@@ -149,6 +149,6 @@ result = MilvusClient.query(​
 ## Considerations
-- Enabling keyword matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.​
+- Enabling text matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.​
 - Once you've defined an analyzer in your schema, its settings become permanent for that collection. If you decide that a different analyzer would better suit your needs, you may consider dropping the existing collection and creating a new one with the desired analyzer configuration.​
\ No newline at end of file
diff --git a/v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md b/v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md
index 1aa8cbde1..b9925c1ea 100644
--- a/v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md
+++ b/v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md
@@ -19,9 +19,9 @@ Hybrid Search is suitable for the following two scenarios:​
 Different types of vectors can represent different information, and using various embedding models can more comprehensively represent different features and aspects of the data. For example, using different embedding models for the same sentence can generate a dense vector to represent the semantic meaning and a sparse vector to represent the word frequency in the sentence.​
-- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve keyword matching.​
+- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve text matching.​
-- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact keyword matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.​
+- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact text matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.​
 For more details, refer to [​Sparse Vector](sparse_vector.md) and [​Dense Vector](dense-vector.md).​
diff --git a/v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md b/v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md
index f7a6e8cc9..8b8fba48c 100644
--- a/v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md
+++ b/v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md
@@ -946,11 +946,11 @@ AUTOINDEX considerably flattens the learning curve of ANN searches. However, the
  For details on full-text search, refer to [​Full Text Search](full-text-search.md).​
-- Keyword Match​
+- Text Match​
- Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
+ Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
- For details on keyword match, refer to [​Keyword Match](keyword-match.md).​
+ For details on text match, refer to [Text Match](keyword-match.md).​
 - Use Partition Key​
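The keyword-match.md hunks describe a three-step text-match flow:

1. Analyze each text entry into tokens.
2. Build an inverted index that maps each token to the documents containing it.
3. Look up the query tokens in that index.

The plain-Python sketch below models this flow. It is a simplified model under stated assumptions, not Milvus's implementation, which builds the index with Tantivy. A lowercase word tokenizer stands in for a configured analyzer, and `analyze`, `build_inverted_index`, and `text_match` are hypothetical names.

The sketch follows the semantics documented in boolean.md. Terms inside a single `TEXT_MATCH` combine with OR, and joining two `TEXT_MATCH` expressions with `and` requires both.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    # Hypothetical stand-in for the field's analyzer: lowercase word tokens.
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Map each unique token to the set of document IDs that contain it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def text_match(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # OR semantics: a document matches if it contains any query token.
    result: Set[int] = set()
    for token in analyze(query):
        result |= index.get(token, set())
    return result


docs = {
    1: "Machine learning with Python",
    2: "Deep learning for vision",
    3: "Deep machine translation",
}
index = build_inverted_index(docs)

# Analogous to TEXT_MATCH(text, 'machine deep'): any term may match.
print(sorted(text_match(index, "machine deep")))  # [1, 2, 3]

# Analogous to TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep').
print(sorted(text_match(index, "machine") & text_match(index, "deep")))  # [3]
```

Taking the union of the per-token document sets is what gives a multi-term lookup its OR behavior. Intersecting two single-term results corresponds to the `TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')` filter that appears in one of the keyword-match.md hunk headers.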