Release new docs to master
Milvus-doc-bot authored and committed Nov 28, 2024
1 parent 26c0d22 commit 7c63c78
Showing 10 changed files with 36 additions and 35 deletions.
2 changes: 1 addition & 1 deletion v2.5.x/site/en/home/home.md
@@ -101,7 +101,7 @@ id: home.md
_Nov 2024 - Milvus 2.5.0 release_

- Added guidance on how to [conduct full text search](full-text-search.md).
- Added guidance on how to [conduct keyword match](keyword-match.md).
- Added guidance on how to [conduct text match](keyword-match.md).
- Added guidance on how to [enable nullable and default values](nullable-and-default.md).
- Added descriptions of [analyzers](analyzer-overview.md).
- Added descriptions of [bitmap indexes](bitmap.md).
2 changes: 1 addition & 1 deletion v2.5.x/site/en/menuStructure/en.json
@@ -702,7 +702,7 @@
"children": []
},
{
"label": "Keyword Match",
"label": "Text Match",
"id": "keyword-match.md",
"order": 7,
"children": []
2 changes: 1 addition & 1 deletion v2.5.x/site/en/release_notes.md
@@ -38,7 +38,7 @@ Milvus 2.5 introduces a built-in Cluster Management WebUI, reducing system maint

Milvus 2.5 leverages analyzers and indexing from Tantivy for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.

For details, refer to [Keyword Match](keyword-match.md).
For details, refer to [Text Match](keyword-match.md).

#### Bitmap Index

2 changes: 1 addition & 1 deletion v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md
@@ -18,7 +18,7 @@ In this tutorial, we will demonstrate how to conduct hybrid search with [Milvus]
Milvus supports Dense, Sparse, and Hybrid retrieval methods:

- Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
- Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms, equivalent to full-text search.
- Sparse Retrieval: Emphasizes text matching to find results based on specific terms, equivalent to full-text search.
- Hybrid Retrieval: Combines both Dense and Sparse approaches, capturing the full context and specific keywords for comprehensive search results.

By integrating these methods, the Milvus Hybrid Search balances semantic and lexical similarities, improving the overall relevance of search outcomes. This notebook will walk through the process of setting up and using these retrieval strategies, highlighting their effectiveness in various search scenarios.
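To make "combines both Dense and Sparse approaches" concrete, here is a minimal sketch that fuses a dense-retrieval ranking and a sparse-retrieval ranking with Reciprocal Rank Fusion (RRF). RRF is one common fusion method (pymilvus offers an `RRFRanker` for hybrid search), but this is plain Python, not Milvus's internal implementation; the document IDs, the two rankings, and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results: dense retrieval favors semantic matches,
# sparse retrieval favors documents that share exact terms with the query.
dense_ranking = ["d1", "d2", "d3"]
sparse_ranking = ["d3", "d1", "d4"]

print(rrf_fuse([dense_ranking, sparse_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents returned by both retrievers (`d1` and `d3`) rise to the top, which is the balancing effect hybrid search aims for.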
2 changes: 1 addition & 1 deletion v2.5.x/site/en/userGuide/collections/manage-collections.md
@@ -85,7 +85,7 @@ For more information about searches and queries, refer to the articles in the [

- [​Full-Text Search](full-text-search.md)

- [Keyword Match](keyword-match.md)
- [Text Match](keyword-match.md)

In addition, Milvus also provides enhancements to improve search performance and efficiency. They are disabled by default, and you can enable and use them according to your service requirements. They are​

4 changes: 2 additions & 2 deletions v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
@@ -8,15 +8,15 @@ summary: "In text processing, an analyzer is a crucial component that converts r

In text processing, an **analyzer** is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: **tokenizer** and **filter**. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval.​

In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for keyword matching or converted into sparse embeddings for full text search. For more information, refer to [​Keyword Match](keyword-match.md) or [​Full Text Search](full-text-search.md).​
In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to [Text Match](keyword-match.md) or [​Full Text Search](full-text-search.md).​

<div class="alert note">

The use of analyzers may impact performance:​

- **Full text search:** For full text search, DataNode and **QueryNode** channels consume data more slowly because they must wait for tokenization to complete. As a result, newly ingested data takes longer to become available for search.​

- **Keyword match:** For keyword matching, index creation is also slower since tokenization needs to finish before an index can be built.​
- **Text match:** For text matching, index creation is also slower since tokenization needs to finish before an index can be built.​

</div>
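As a rough illustration of the tokenizer-plus-filter structure of an analyzer, the sketch below chains a regex tokenizer with two filters that imitate the `standard` analyzer behavior described in the Text Match guide: split on whitespace and punctuation, drop tokens longer than 40 characters, and lowercase. It is an approximation in plain Python, not Milvus's Tantivy-based analyzer, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Approximate tokenizer: split on anything that is not a word character,
    # which covers whitespace and punctuation.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def length_filter(tokens, max_len=40):
    # Drop tokens longer than max_len characters.
    return [t for t in tokens if len(t) <= max_len]


def analyze(text, filters=(lowercase_filter, length_filter)):
    """Run the tokenizer, then apply each filter in order."""
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


print(analyze("Milvus is a Vector-Database!"))
# ['milvus', 'is', 'a', 'vector', 'database']
```

Keeping filters as an ordered sequence of functions mirrors the idea that the tokenizer runs once and each filter then refines its output.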

9 changes: 5 additions & 4 deletions v2.5.x/site/en/userGuide/search-query-get/boolean.md
@@ -835,9 +835,10 @@ Match operators include:​
- `like`: Match constants or prefixes (prefix%), infixes (%infix%), and suffixes (%suffix) within constants. It relies on a brute-force search mechanism using wildcards and does not involve text tokenization. While it can achieve exact matches, its query efficiency is relatively low, making it suitable for simple matching tasks or queries on smaller datasets.​

- `TEXT_MATCH`: Match specific terms or keywords on VARCHAR fields, using tokenization and inverted index to enable efficient text search. Compared to `like`, `TEXT_MATCH` offers more advanced text tokenization and filtering capabilities. It is suited for large-scale datasets where higher query performance is required for complex text search scenarios.​

<div class="alert note">

To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [​Keyword Match](keyword-match.md).​
To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Text Match](keyword-match.md).​

</div>
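To make the `like` versus `TEXT_MATCH` contrast concrete, the sketch below models both in plain Python: `like` as a case-sensitive, full-string wildcard match where `%` stands for any run of characters, and `TEXT_MATCH` as a lookup among lowercased word tokens. Both are simplified stand-ins with hypothetical function names, not Milvus's implementations; real `TEXT_MATCH` results depend on the analyzer configured for the field.

```python
import re


def like_match(pattern, value):
    """Simplified `like`: `%` matches any run of characters; everything else is literal.

    This is a full-string, case-sensitive pattern match with no tokenization.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def text_match(term, value):
    """Simplified token match: lowercase and split the value into word tokens,
    then check whether the lowercased term is one of them."""
    return term.lower() in re.findall(r"\w+", value.lower())


docs = ["Apple iPhone 15", "pineapple juice"]

# Pattern match: finds "apple" inside "pineapple", but misses "Apple"
# because this sketch compares case-sensitively.
print([i for i, d in enumerate(docs) if like_match("%apple%", d)])  # [1]

# Token match: finds only documents containing the whole token "apple".
print([i for i, d in enumerate(docs) if text_match("apple", d)])  # [0]
```

The pattern scan has to inspect every string, whereas a token lookup can be served by an inverted index, which is where the performance difference on large datasets comes from.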

@@ -1022,11 +1023,11 @@ The filtered results are as follows:​

```

#### Example 3: Keyword match on VARCHAR fields​
#### Example 3: Text match on VARCHAR fields​

The `TEXT_MATCH` expression is used for keyword match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [​Keyword Match](keyword-match.md).​
The `TEXT_MATCH` expression is used for text match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Text Match](keyword-match.md).​

The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the keyword `"Apple"` or `"iPhone"`:​
The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the term `"Apple"` or `"iPhone"`:​

<div class="multipleCode">
<a href="#python">Python </a>
38 changes: 19 additions & 19 deletions v2.5.x/site/en/userGuide/search-query-get/keyword-match.md
@@ -1,38 +1,38 @@
---
id: keyword-match.md
summary: "Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​"
title: Keyword Match​
summary: "Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​"
title: Text Match​
---

# Keyword Match​
# Text Match​

Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​

<div class="alert note">

Keyword match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [​Full Text Search](full-text-search.md).​
Text match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [​Full Text Search](full-text-search.md).​

</div>

## Overview

Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and keyword search. For each text entry, Milvus indexes it following the procedure:​
Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and term-based text search. For each text entry, Milvus indexes it following the procedure:​

1. [Analyzer](analyzer-overview.md): The analyzer processes input text by tokenizing it into individual words, or tokens, and then applying filters as needed. This allows Milvus to build an index based on these tokens.​

2. [Indexing](index-scalar-fields.md): After text analysis, Milvus creates an inverted index that maps each unique token to the documents containing it.​

When a user performs a keyword match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.​
When a user performs a text match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.​
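The two indexing steps and the fast lookup can be modeled with a dictionary-based inverted index. This is a simplified in-memory sketch, not Tantivy's on-disk index: the `analyze` stand-in only lowercases and splits on non-word characters, the sample documents are hypothetical, and combining query tokens with OR mirrors the default `TEXT_MATCH` logic.

```python
import re
from collections import defaultdict


def analyze(text):
    # Stand-in for a real analyzer: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each unique token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def lookup_any(index, query):
    """Return IDs of documents containing at least one query token (OR logic)."""
    result = set()
    for token in analyze(query):
        result |= index.get(token, set())
    return result


docs = {
    1: "Machine learning with Milvus",
    2: "Deep learning for vision",
    3: "Vector databases store embeddings",
}
index = build_inverted_index(docs)
print(sorted(lookup_any(index, "machine deep")))  # [1, 2]
```

A lookup touches only the posting sets for the query tokens, instead of re-reading every document.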

![Keyword Match](../../../assets/keyword-match.png)
![Text Match](../../../assets/keyword-match.png)

## Enable keyword match
## Enable text match

Keyword match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable keyword match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.​
Text match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable text match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.​

### Set `enable_analyzer` and `enable_match`

To enable keyword match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient keyword matches.​
To enable text match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient text matches.​

```python
from pymilvus import MilvusClient, DataType​
@@ -51,7 +51,7 @@ schema.add_field(​

### Optional: Configure an analyzer​

The performance and accuracy of keyword matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.​
The performance and accuracy of text matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.​

By default, Milvus uses the `standard` analyzer, which tokenizes text based on whitespace and punctuation, removes tokens longer than 40 characters, and converts text to lowercase. No additional parameters are needed to apply this default setting. For more information, refer to [​Standard](standard-analyzer.md).​

@@ -75,9 +75,9 @@ schema.add_field(​

Milvus also provides various other analyzers suited to different languages and scenarios. For more details, refer to [​Overview](analyzer-overview.md).​

## Use keyword match
## Use text match

Once you have enabled keyword match for a VARCHAR field in your collection schema, you can perform keyword matches using the `TEXT_MATCH` expression.​
Once you have enabled text match for a VARCHAR field in your collection schema, you can perform text matches using the `TEXT_MATCH` expression.​

### TEXT_MATCH expression syntax​

@@ -106,9 +106,9 @@ filter = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')"​

```
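The hunk context above shows the AND form of a `TEXT_MATCH` filter. The sketch below builds filter strings for both forms, assuming (as the Text Match docs describe) that space-separated terms inside a single `TEXT_MATCH` are combined with OR, while separate `TEXT_MATCH` calls joined with `and` require every term. The helper names are hypothetical; the resulting strings are what you would pass as the `filter` argument of a search or query.

```python
def _check_terms(terms):
    # This sketch does not escape quotes, so reject terms it cannot embed safely.
    if not terms:
        raise ValueError("at least one term is required")
    for t in terms:
        if not t or "'" in t or " " in t:
            raise ValueError(f"unsupported term: {t!r}")


def text_match_any(field, terms):
    """One TEXT_MATCH call; its space-separated terms are OR-ed."""
    _check_terms(terms)
    joined = " ".join(terms)
    return f"TEXT_MATCH({field}, '{joined}')"


def text_match_all(field, terms):
    """One TEXT_MATCH call per term, joined with `and`, so every term must match."""
    _check_terms(terms)
    return " and ".join(f"TEXT_MATCH({field}, '{t}')" for t in terms)


print(text_match_any("description", ["Apple", "iPhone"]))
# TEXT_MATCH(description, 'Apple iPhone')
print(text_match_all("text", ["machine", "deep"]))
# TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')
```

Escaping is deliberately left out: terms containing quotes or spaces are rejected rather than guessed at.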

### Search with keyword match​
### Search with text match​

Keyword match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using keyword match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.​
Text match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using text match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.​

In this example, the `filter` expression filters the search results to only include documents that match the specified keywords `keyword1` or `keyword2`. The vector similarity search is then performed on this filtered subset of documents.​

@@ -129,9 +129,9 @@ result = MilvusClient.search(​

```
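The filter-then-search flow in this example can be modeled as a brute-force sketch: first keep only rows whose text contains `keyword1` or `keyword2`, then rank that subset by cosine similarity to the query vector. The rows, vectors, and helper names are hypothetical, and Milvus uses its own indexes and execution plan rather than a Python loop.

```python
import math
import re


def tokens(text):
    # Minimal stand-in analyzer: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def filtered_search(rows, query_vec, any_terms, limit=2):
    """Keep rows whose text contains any of the terms, then rank only those by similarity."""
    wanted = {t.lower() for t in any_terms}
    candidates = [r for r in rows if tokens(r["text"]) & wanted]
    candidates.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:limit]]


rows = [
    {"id": 1, "text": "keyword1 appears here", "vector": [1.0, 0.0]},
    {"id": 2, "text": "no match in this one", "vector": [1.0, 0.1]},
    {"id": 3, "text": "mentions keyword2", "vector": [0.6, 0.8]},
    {"id": 4, "text": "keyword1 and keyword2", "vector": [0.0, 1.0]},
]
print(filtered_search(rows, [1.0, 0.0], ["keyword1", "keyword2"]))  # [1, 3]
```

Row 2 is very close to the query vector but never gets scored, because the text filter removes it before the similarity step; that smaller candidate set is where the speedup comes from.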

### Query with keyword match​
### Query with text match​

Keyword match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.​
Text match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.​

The example below retrieves documents where the `text` field contains both keywords `keyword1` and `keyword2`.​

@@ -149,6 +149,6 @@ result = MilvusClient.query(​

## Considerations

- Enabling keyword matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.​
- Enabling text matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.​

- Once you've defined an analyzer in your schema, its settings become permanent for that collection. If you decide that a different analyzer would better suit your needs, you may consider dropping the existing collection and creating a new one with the desired analyzer configuration.​
@@ -19,9 +19,9 @@ Hybrid Search is suitable for the following two scenarios:​

Different types of vectors can represent different information, and using various embedding models can more comprehensively represent different features and aspects of the data. For example, using different embedding models for the same sentence can generate a dense vector to represent the semantic meaning and a sparse vector to represent the word frequency in the sentence.​

- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve keyword matching.​
- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve text matching.​

- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact keyword matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.​
- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact text matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.​
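To illustrate why the sparse layout suits term matching, the sketch below stores sparse vectors as `{dimension: weight}` dictionaries and uses tokens as the dimension keys. That keying is an assumption for readability (real sparse embeddings use integer dimension indices), and the weights are made up. Only dimensions that are non-zero in both vectors contribute to the dot product, so a document sharing no tokens with the query scores zero.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dimension: weight} dicts.

    Only dimensions present in both vectors contribute.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[dim] for dim, w in a.items() if dim in b)


# Hypothetical sparse vectors whose dimensions stand for tokens.
query = {"milvus": 1.0, "index": 0.5}
doc_a = {"milvus": 0.75, "vector": 0.25, "index": 0.5}
doc_b = {"database": 0.75, "vector": 0.5}

print(sparse_dot(query, doc_a))  # 1.0
print(sparse_dot(query, doc_b))  # 0
```

Iterating over the smaller dictionary keeps the cost proportional to the number of non-zero entries, which is the practical benefit of storing only non-zero values.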

For more details, refer to [​Sparse Vector](sparse_vector.md) and [​Dense Vector](dense-vector.md).​

@@ -946,11 +946,11 @@ AUTOINDEX considerably flattens the learning curve of ANN searches. However, the

For details on full-text search, refer to [​Full Text Search](full-text-search.md).​

- Keyword Match​
- Text Match​

Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​
Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.​

For details on keyword match, refer to [​Keyword Match](keyword-match.md).​
For details on text match, refer to [Text Match](keyword-match.md).​

- Use Partition Key​

