
Commit

Align CN overview with EN. (#956)
YiyunNi authored Oct 26, 2021
1 parent 432c312 commit 5d60bb9
Showing 1 changed file with 92 additions and 125 deletions.
217 changes: 92 additions & 125 deletions site/zh-CN/about/overview.md
@@ -2,179 +2,146 @@
id: overview.md
---

# What is Milvus

Milvus is an open-source vector database that powers AI applications and vector similarity search.

Milvus is available in the following two editions:
- [Milvus Standalone](install_standalone-docker.md)
- [Milvus Cluster](install_cluster-docker.md)

Version compatibility:

<table class="version">
<thead>
<tr>
<th>Milvus version</th>
<th>Python SDK version</th>
<th>Java SDK version</th>
<th>Go SDK version</th>
<th>Node SDK version</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="install_standalone-docker.md">{{var.milvus_release_version}}</a></td>
<td><a href="example_code.md">{{var.milvus_python_sdk_version}}</a></td>
<td>Coming soon</td>
<td>Coming soon</td>
<td><a href="https://github.com/milvus-io/milvus-sdk-node">{{var.milvus_node_sdk_version}}</a></td>
</tr>
</tbody>
</table>

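Milvus {{var.milvus_release_version}} is a preview release of 2.0.0. This release builds the distributed system in Go and adopts a new cloud-native distributed design, which greatly improves system scalability and elasticity.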

## 系统架构

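Milvus 2.0 is a cloud-native vector database whose architecture separates storage from computation. All components in this refactored version are stateless, which greatly improves system elasticity and flexibility.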

The system is divided into four layers:

- Access layer
- Coordinator service
- Worker nodes
- Storage
---
id: overview.md
title: What is Milvus
related_key: Milvus Overview
summary: Milvus is an open-source vector database designed specifically for AI application development, embedding similarity search, and MLOps.
---

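**Access layer:** The front of the system, made up of a group of peer proxy nodes. The access layer is the unified endpoint exposed to users; it forwards requests and collects execution results.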
# Introduction

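**Coordinator service:** The brain of the system, responsible for assigning tasks to worker nodes. There are four coordinator roles: root coordinator, data coordinator, query coordinator, and index coordinator.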
This page gives you an overview of Milvus by answering several questions. After reading it, you will know what Milvus is and how it works, its key concepts, why to use it, the supported indexes and metrics, example applications, its architecture, and related tools.

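**Worker nodes:** The limbs of the system. Worker nodes only passively execute the read and write requests issued by the coordinator service. There are currently three types of worker nodes: data nodes, query nodes, and index nodes.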
## What is Milvus vector database?

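**Storage:** The skeleton of the system and the foundation on which all other functions are built. Milvus relies on three types of storage: metadata storage, message storage (log broker), and object storage.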
Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.

![System architecture](../../../assets/architecture_02.jpg)
As a database specifically designed to handle queries over input vectors, Milvus is capable of indexing vectors on a trillion scale. Unlike existing relational databases, which mainly deal with structured data that follows a predefined schema, Milvus is designed from the bottom up to handle embedding vectors converted from unstructured data.

For more on how the system works, see [Milvus 2.0 Architecture](architecture_overview.md).
As the internet grew and evolved, unstructured data such as emails, papers, IoT sensor data, Facebook photos, and protein structures became increasingly common. For computers to understand and process unstructured data, it is converted into vectors using embedding techniques. Milvus stores and indexes these vectors, and it analyzes the correlation between two vectors by calculating the similarity distance between them. If two embedding vectors are very similar, the original data sources are similar as well.

## Milvus components
## Key concepts

Milvus Standalone contains three components:
- Milvus
- etcd
- MinIO
If you are new to vector databases and similarity search, read the following explanations of key concepts to gain a better understanding.

Milvus Cluster contains eight microservice components and three third-party dependencies.
Learn more in the [Milvus glossary](glossary.md).

- Microservice components:
### Unstructured data

- Root coord
- Proxy
- Query coord
- Query node
- Index coord
- Index node
- Data coord
- Data node
Unstructured data, including images, video, audio, and natural language, is information that doesn't follow a predefined model or manner of organization. This data type accounts for ~80% of the world's data, and can be converted into vectors using various artificial intelligence (AI) and machine learning (ML) models.

- Third-party dependencies:
### Embedding vectors

- etcd
- MinIO
- Pulsar
An embedding vector is a feature abstraction of unstructured data, such as emails, IoT sensor data, Instagram photos, protein structures, and much more. Mathematically speaking, an embedding vector is an array of floating-point numbers or binary values. Modern embedding techniques are used to convert unstructured data into embedding vectors.

## Product highlights
### Vector similarity search

#### Millisecond search on trillion-scale vector datasets
Vector similarity search is the process of comparing a vector to a database to find vectors that are most similar to the query vector. Approximate nearest neighbor (ANN) search algorithms are used to accelerate the searching process. If the two embedding vectors are very similar, it means that the original data sources are similar as well.
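To make the comparison step concrete, here is a minimal sketch of exact (brute-force) similarity search in plain Python: the query is compared with every stored vector and the k closest are kept. This is the baseline that ANN algorithms aim to accelerate. The function names are hypothetical and not part of any Milvus SDK, and squared Euclidean distance is assumed as the metric.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks neighbors the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Compare the query with every stored vector and keep the k closest.

    Returns (row id, distance) pairs ordered from most to least similar.
    """
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    database = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
    print(brute_force_search([1.0, 1.0], database, k=2))
```

The cost grows linearly with the number of stored vectors, which is why approximate indexes become necessary at large scale.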

Average search latency across trillions of vectors is measured in milliseconds.
## Why Milvus?

#### Simplified unstructured data management
- High performance when conducting vector search on massive datasets.
- A developer-first community that offers multi-language support and a toolchain.
- Cloud scalability and high reliability even in the event of a disruption.
- Hybrid search achieved by pairing scalar filtering with vector similarity search.

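- A complete set of APIs designed for data science workflows.
- A consistent cross-platform user experience, whether on a laptop, a local cluster, or a cloud server.
- Real-time search and analytics in any scenario.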
## What indexes and metrics are supported?

#### Stable and reliable user experience
Indexes organize data for efficient retrieval. You must declare the index type and similarity metric before you can search or query inserted entities. **If you do not specify an index type, Milvus performs brute-force search by default.**

Milvus has failover and fault recovery mechanisms that ensure business continuity for data and applications even when service is interrupted.
### Index types

#### Highly scalable and elastic
Most of the vector index types supported by Milvus use approximate nearest neighbor search (ANNS), including:

Scalability at the component level allows precise, targeted scaling.
- **FLAT**: FLAT is best suited for scenarios that seek perfectly accurate and exact search results on a small, million-scale dataset.
- **IVF_FLAT**: IVF_FLAT is a quantization-based index and is best suited for scenarios that seek an ideal balance between accuracy and query speed.
- **IVF_SQ8**: IVF_SQ8 is a quantization-based index and is best suited for scenarios that seek a significant reduction in disk, CPU, and GPU memory consumption because these resources are very limited.
- **IVF_PQ**: IVF_PQ is a quantization-based index and is best suited for scenarios that seek high query speed even at the cost of accuracy.
- **HNSW**: HNSW is a graph-based index and is best suited for scenarios that have a high demand for search efficiency.
- **ANNOY**: ANNOY is a tree-based index and is best suited for scenarios that seek a high recall rate.
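The following is a minimal sketch of the idea behind an inverted-file (IVF) index with flat storage, assuming plain k-means as the coarse quantizer: vectors are grouped into `nlist` cells, and a query scans only the `nprobe` cells whose centroids are closest to it. The class and helper names are hypothetical; Milvus builds its indexes with its own engine, not with this code.

```python
import heapq
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Vector], nlist: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain k-means, used here as the coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(nlist)]
        for v in data:
            nearest = min(range(nlist), key=lambda c: sq_l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cell becomes empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


class IvfFlatSketch:
    """Toy IVF_FLAT: k-means cells plus exact (flat) search inside the probed cells."""

    def __init__(self, data: List[Vector], nlist: int) -> None:
        self.data = data
        self.centroids = kmeans(data, nlist)
        # Inverted lists: cell id -> row ids of the vectors assigned to that cell.
        self.lists: Dict[int, List[int]] = {c: [] for c in range(nlist)}
        for i, v in enumerate(data):
            self.lists[self._nearest_cells(v, 1)[0]].append(i)

    def _nearest_cells(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: sq_l2(v, self.centroids[c]))
        return order[:n]

    def search(self, query: Vector, k: int, nprobe: int) -> List[Tuple[int, float]]:
        # Only vectors in the nprobe closest cells are compared with the query.
        candidates = [i for c in self._nearest_cells(query, nprobe) for i in self.lists[c]]
        best = heapq.nsmallest(k, ((sq_l2(query, self.data[i]), i) for i in candidates))
        return [(i, d) for d, i in best]
```

Raising `nprobe` trades speed for recall; with `nprobe == nlist` the sketch scans every cell and returns the same result as exact search.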

#### Hybrid search
See [Selecting an Index Best Suited for Your Scenario](index_selection.md) for more details.
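IVF_SQ8 is listed above as a way to cut memory consumption. The sketch below shows generic 8-bit scalar quantization, assuming per-dimension min/max scaling: each float component is mapped to a single byte, roughly a quarter of the size of a float32. The helper names are hypothetical, and the exact scheme Milvus uses internally may differ.

```python
from typing import List, Sequence, Tuple


def train_sq8(vectors: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Learn the per-dimension min and max from training vectors."""
    dim = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dim)]
    hi = [max(v[d] for v in vectors) for d in range(dim)]
    return lo, hi


def encode_sq8(v: Sequence[float], lo: List[float], hi: List[float]) -> bytes:
    """Map each component to an integer code in [0, 255]."""
    codes = []
    for x, a, b in zip(v, lo, hi):
        span = (b - a) or 1.0  # avoid dividing by zero on constant dimensions
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp into [0, 1]
        codes.append(round(t * 255))
    return bytes(codes)


def decode_sq8(code: bytes, lo: List[float], hi: List[float]) -> List[float]:
    """Approximately reconstruct the original components from their byte codes."""
    return [a + (c / 255) * ((b - a) or 1.0) for c, a, b in zip(code, lo, hi)]
```

The reconstruction error per component is at most half a quantization step, (max - min) / 510, which is the accuracy traded for the smaller footprint.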

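In addition to vectors, Milvus supports data types such as Boolean, integer, and floating point. In Milvus, a collection can contain multiple fields that represent data features or attributes. Milvus also supports filtering on scalar fields during vector similarity search.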
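One simple way to combine scalar filtering with vector similarity search is to filter first and then rank the survivors. The sketch below assumes that pre-filtering strategy, uses a Python callable as a stand-in for the filter condition, and scores by inner product. The `Entity` fields and function names are hypothetical and do not reflect how Milvus evaluates filters internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Entity:
    pk: int
    price: float                # hypothetical floating-point scalar field
    in_stock: bool              # hypothetical Boolean field
    embedding: Sequence[float]  # vector field


def hybrid_search(
    entities: List[Entity],
    query: Sequence[float],
    k: int,
    predicate: Callable[[Entity], bool],
) -> List[Tuple[int, float]]:
    # Step 1: scalar filtering keeps only entities whose fields satisfy the predicate.
    candidates = [e for e in entities if predicate(e)]
    # Step 2: rank the survivors by inner product (larger = more similar).
    scored = [(sum(q * x for q, x in zip(query, e.embedding)), e.pk) for e in candidates]
    scored.sort(reverse=True)
    return [(pk, score) for score, pk in scored[:k]]


items = [
    Entity(1, 10.0, True, [1.0, 0.0]),
    Entity(2, 99.0, True, [0.9, 0.1]),
    Entity(3, 5.0, False, [1.0, 0.0]),
    Entity(4, 20.0, True, [0.0, 1.0]),
]
print(hybrid_search(items, [1.0, 0.0], k=2, predicate=lambda e: e.in_stock and e.price < 50))
```

Entity 2 is a closer vector match than entity 4, but it is excluded by the price condition before any vector comparison happens.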
### Similarity metrics

#### Lambda-architecture-based storage unifying streaming and batch data
In Milvus, similarity metrics are used to measure similarities among vectors. Choosing a good distance metric helps improve classification and clustering performance significantly. The metric that gives the best performance depends on the form of the input data.

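Milvus supports both stream processing and batch processing when storing data, combining the timeliness of stream processing with the efficiency of batch processing. A unified external interface makes vector similarity queries more convenient.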
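As a conceptual illustration only, the sketch below shows how freshly streamed rows and batched rows can sit behind one search interface: new inserts land in a growing buffer that is searchable immediately, and the buffer is sealed into an immutable batch once it reaches a size threshold. The class name and `seal_size` threshold are hypothetical; this is not Milvus's actual storage implementation.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]  # (primary key, vector)


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SegmentStoreSketch:
    """Toy store: a growing buffer for streamed inserts plus sealed batches."""

    def __init__(self, seal_size: int = 4) -> None:
        self.seal_size = seal_size          # hypothetical sealing threshold
        self.growing: List[Row] = []        # fresh rows, searchable right away
        self.sealed: List[List[Row]] = []   # immutable batches

    def insert(self, pk: int, vector: Sequence[float]) -> None:
        self.growing.append((pk, vector))
        if len(self.growing) >= self.seal_size:
            # Turn the buffered stream into an immutable batch.
            self.sealed.append(self.growing)
            self.growing = []

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        # One entry point scans both the sealed batches and the growing buffer.
        rows = [row for batch in self.sealed for row in batch] + self.growing
        best = heapq.nsmallest(k, ((squared_l2(query, v), pk) for pk, v in rows))
        return [(pk, d) for d, pk in best]
```

Callers never need to know whether a row is still in the stream buffer or already sealed, which is the point of a unified interface.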
The metrics that are widely used for floating point embeddings include:

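#### Broad community support and industry recognition
The Milvus project has over 6,000 stars on GitHub, more than 1,000 enterprise users, and an active open-source community. Milvus is backed by the LF AI & Data Foundation, where it is a graduated project.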
- **Euclidean distance (L2)**: This metric is generally used in the field of computer vision (CV).
- **Inner product (IP)**: This metric is generally used in the field of natural language processing (NLP).

The metrics that are widely used for binary embeddings include:

- **Hamming**: This metric is generally used in the field of natural language processing (NLP).
- **Jaccard**: This metric is generally used in the field of molecular similarity search.
- **Tanimoto**: This metric is generally used in the field of molecular similarity search.
- **Superstructure**: This metric is generally used to search for similar superstructures of a molecule.
- **Substructure**: This metric is generally used to search for similar substructure of a molecule.
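To make several of the listed metrics concrete, the sketch below implements their textbook definitions in plain Python. It assumes binary vectors are packed into Python integers (one bit per dimension), and the Tanimoto function uses the general coefficient formula, which reduces to Jaccard similarity on 0/1 vectors. These are not Milvus's internal implementations, and superstructure and substructure are omitted.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """IP: larger means more similar (cosine similarity when vectors are normalized)."""
    return sum(x * y for x, y in zip(a, b))


def popcount(x: int) -> int:
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two packed binary vectors differ."""
    return popcount(a ^ b)


def jaccard_distance(a: int, b: int) -> float:
    """1 - |A & B| / |A | B| for packed binary vectors."""
    union = popcount(a | b)
    return 0.0 if union == 0 else 1.0 - popcount(a & b) / union


def tanimoto_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """General Tanimoto coefficient; on 0/1 vectors it equals Jaccard similarity."""
    dot = inner_product(a, b)
    denom = inner_product(a, a) + inner_product(b, b) - dot
    return 0.0 if denom == 0 else dot / denom
```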

## Use cases
See [Similarity Metrics](metric.md#floating) for more information.

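#### Biopharma / healthcare
Virtual screening of drug molecules, virus structure analysis, protein property prediction, drug crystal form prediction, intelligent medical consultation, intelligent pathology analysis, and high-precision image retrieval.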
## Example applications

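#### E-commerce
Search by image, product-to-product search, personalized recommendation, content recommendation, and product deduplication.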
Milvus makes it easy to add similarity search to your applications. Example applications of Milvus include:

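#### Internet services
Personalized music recommendation, real estate listing search and recommendation, intelligent customer service, browser content search, app store search, similar text retrieval and news recommendation, video deduplication, video retrieval, video recommendation, and product search by image.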
- [Image similarity search](image-similarity-search.md): Make images searchable and instantly return the most similar images from a massive database.
- [Video similarity search](video-similarity-search.md): By converting key frames into vectors and then feeding the results into Milvus, billions of videos can be searched and recommended in near real time.
- [Audio similarity search](audio-similarity-search.md): Quickly query massive volumes of audio data such as speech, music, sound effects, and surface similar sounds.
- [Molecular similarity search](molecular-similarity-search.md): Blazing fast similarity search, substructure search, or superstructure search for a specified molecule.
- [Recommender system](recommendation-system.md): Recommend information or products based on user behaviors and needs.
- [Question answering system](question-answering-system.md): Interactive digital QA chatbot that automatically answers user questions.
- [DNA sequence classification](dna-sequence-classification.md): Accurately classify a gene in milliseconds by comparing similar DNA sequences.
- [Text search engine](text-search-engine): Help users find the information they are looking for by comparing keywords against a database of texts.

#### Computer software / hardware
Corpus and image analysis and recommendation, and intelligent product design.
See [Milvus tutorials](https://github.com/milvus-io/bootcamp/tree/master/solutions) and [Milvus Adopters](milvus_adopters.md) for more Milvus application scenarios.

#### Advertising / industrial design / manufacturing
Intelligent poster design, precision ad targeting, and product inventory management.
## How is Milvus designed?

## Milvus concepts
As a cloud-native vector database, Milvus 2.0 separates storage and computation by design. To enhance elasticity and flexibility, all components in Milvus 2.0 are stateless.

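#### Unstructured data
Unstructured data is data with an irregular structure and no unified, predefined data model, which cannot conveniently be represented in the two-dimensional logical tables of a database. It includes images, video, audio, natural language, and more, and accounts for 80% of all data. Unstructured data can be processed by converting it into vectors with various artificial intelligence (AI) or machine learning (ML) models.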
The system breaks down into four layers:

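#### Vectors
A vector, also called a vector embedding, is a feature abstraction of unstructured data such as videos, photos, and audio. Mathematically, a vector is an n-dimensional array of floating-point or binary values. Modern vectorization techniques, such as various AI or ML models, can abstract unstructured data into vectors in an n-dimensional feature space. The similarity between pieces of unstructured data can then be computed with approximate nearest neighbor (ANN) algorithms.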
- Access layer: The access layer is composed of a group of stateless proxies and serves as the front layer of the system and endpoint to users.
- Coordinator service: The coordinator service assigns tasks to the worker nodes and functions as the system's brain.
- Worker nodes: The worker nodes function as arms and legs and are dumb executors that follow instructions from the coordinator service and execute user-triggered DML/DDL commands.
- Storage: Storage is the bone of the system, and is responsible for data persistence. It comprises meta storage, log broker, and object storage.
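The access layer's job of forwarding requests and collecting results can be pictured as a scatter/gather step. The sketch below is a hypothetical illustration, not Milvus's proxy code: each worker is modeled as a callable that returns its local top-k hits already sorted by distance, and the proxy merges them into a global top-k.

```python
import heapq
from itertools import islice
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, primary key); smaller distance = more similar
QueryNode = Callable[[Sequence[float], int], List[Hit]]


def merge_topk(partials: List[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node result lists (each already sorted by distance) into a global top-k."""
    return list(islice(heapq.merge(*partials), k))


def proxy_search(nodes: List[QueryNode], query: Sequence[float], k: int) -> List[Hit]:
    # Scatter: forward the same request to every worker.
    partials = [node(query, k) for node in nodes]
    # Gather: collect the partial answers and reduce them to one result.
    return merge_topk(partials, k)
```

Because every partial list is already sorted, the merge only has to walk the heads of the lists instead of re-sorting all hits.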

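#### Vector similarity search (approximate nearest neighbor search)
Similarity search compares a target object with the data in a database and returns the most similar results. Likewise, vector similarity search returns the most similar vectors. Approximate nearest neighbor (ANN) search algorithms can [calculate the distance between vectors](metric.md).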
For more information, see [Architecture Overview](architecture_overview.md).

## Development tools

#### Milvus Insight
![Architecture](../../../assets/architecture_02.jpg)

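[Milvus Insight](https://github.com/milvus-io/milvus-insight) is a graphical management tool for Milvus, with practical features such as cluster status visualization, metadata management, and data queries. The Milvus Insight source code will also be open-sourced as a standalone project in the future.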
## Developer tools

#### Milvus CLI
Milvus is supported by rich APIs and tools to facilitate DevOps.

[Milvus CLI](https://github.com/milvus-io/milvus_cli#overview) is a command-line interface for Milvus based on [PyMilvus](https://github.com/milvus-io/pymilvus). It supports connecting to a server, data operations, and data import/export.
### API access

#### Milvus DM data migration tool
The [Milvus data migration tool](migrate_overview.md) is now available.
Milvus has client libraries wrapped on top of the Milvus API that can be used to insert, delete, and query data programmatically from application code:

## Join the developer community
- [PyMilvus](https://github.com/milvus-io/pymilvus)
- [Node.js SDK](https://github.com/milvus-io/milvus-sdk-node)
- [Go SDK](https://github.com/milvus-io/milvus-sdk-go)

If you have any suggestions, comments, or questions, join the Milvus [Slack](https://join.slack.com/t/milvusio/shared_invite/zt-e0u4qu3k-bI2GDNys3ZqX1YCJ9OM~GQ) community to talk with our engineering team.
We are working on enabling more new client libraries. If you would like to contribute, go to the corresponding repository of [the Milvus Project](https://github.com/milvus-io).

[![Milvus Slack Channel](../../../assets/slack.png)](https://join.slack.com/t/milvusio/shared_invite/zt-e0u4qu3k-bI2GDNys3ZqX1YCJ9OM~GQ)
### Milvus ecosystem tools

You can also visit the [FAQ](https://milvus.io/cn/docs/v1.1.0/performance_faq.md) page for answers to related questions.
The Milvus ecosystem provides helpful tools including:

Subscribe to the Milvus mailing lists:
- [Milvus CLI](https://github.com/milvus-io/milvus_cli#overview)
- [Milvus Insight](https://github.com/milvus-io/milvus-insight), a graphical management system for Milvus.
- [MilvusDM](https://milvus.io/docs/v2.0.0/migrate_overview.md) (Milvus Data Migration), an open-source tool designed specifically for importing and exporting data with Milvus.
- [Milvus sizing tool](https://zilliz.com/sizing-tool), which helps you estimate the raw file size, memory size, and stable disk size needed for a specified number of vectors with various index types.

- [Technical Steering Committee](https://lists.lfai.foundation/g/milvus-tsc)
- [Technical Discussions](https://lists.lfai.foundation/g/milvus-technical-discuss)
- [Announcement](https://lists.lfai.foundation/g/milvus-announce)
## What's next

Follow us on social media:
- Get started with a 3-minute tutorial:
  - [Hello Milvus](example_code.md)
- Install Milvus for your testing or production environment:
  - [Installation Prerequisites](prerequisite-docker.md)
  - [Install Milvus Standalone](install_standalone-docker.md)
  - [Install Milvus Cluster](install_cluster-docker.md)
- If you're interested in diving deep into the design details of Milvus:
  - Read about [Milvus architecture](architecture_overview.md)

- [Zhihu](zhihu.com/org/zilliz-11/columns)
- [CSDN](http://zilliz.blog.csdn.net)
- [Bilibili](http://space.bilibili.com/478166626)
- Zilliz technical discussion WeChat group
![wechat](../../../assets/wechat_qr_code.jpeg)
###### If the QR code has expired, add the Zilliz assistant on WeChat: zilliz-tech
