docs: add v2.2 roadmap (#45)
Signed-off-by: Gaius <[email protected]>
gaius-qi authored Dec 29, 2023
1 parent 795961f commit 1498708
Showing 2 changed files with 171 additions and 13 deletions.
94 changes: 88 additions & 6 deletions docs/others/Roadmap.md
@@ -3,9 +3,9 @@ id: Roadmap
title: Roadmap
---

## 2022 Roadmap {#2022-roadmap}
## v2.0

### Manager {#manager}
Manager:

- Console
- Refactor project.
@@ -21,24 +21,106 @@ title: Roadmap
- Application-level speed limit and other configurations.
- Add authentication to the open API interface.

### Scheduler {#scheduler}
Scheduler:

- Improve scheduling stability and collect metrics during scheduling.
- Scheduler integrates machine learning algorithms to improve scheduling capabilities.
- Allocate download peers based on peer bandwidth traffic.

### Dfdaemon {#wip-dfdaemon}
Client:

- Support seed peer feature.
- Improve task download efficiency and stability.
- Refactor to use gRPC bidirectional streams for passing piece information between peers.
- Support piece download priority.

### Document {#document}
Document:

- Refactor the d7y.io website and add Dragonfly 2.0 documentation.

### Others {#others}
Others:

- Provide performance testing solutions in perf-tests repo.
- Upgrade to Go 1.18 and refactor the project using generics.

## v2.1

Manager:

- Console [v1.0.0](https://github.com/dragonflyoss/console/tree/release-1.0.0) is released, providing
a new console for operating Dragonfly.
- Add the ability to control scheduler features from the manager. If the preheat feature is
not in the scheduler's feature flags, the scheduler stops providing preheating.
- Add personal access tokens to the manager. A personal access token
carries your security credentials for the RESTful open API.
- Add TLS config to the manager REST server.
- Add the cluster resource to the manager. A cluster contains a scheduler cluster and a seed peer cluster.
- Use unscoped delete when destroying the manager's resources.
- Add the uk_scheduler and uk_seed_peer indexes to the database tables.
- Remove the security domain and security features from the manager.
- Add advertise port config.

Scheduler:

- Add the network topology feature, which probes the network latency between peers to provide better scheduling.
- Add a database field to the scheduler config and move the Redis config into it.
- Fix filtering and evaluation in scheduling. Because the filter's final length was capped at
candidateParentLimit, the parents remaining after filtering were wrong.
- Fix storage being unable to write records to file when bufferSize is zero.
- Add advertise port config.
- Fix the FSM state change failing when registering a task.
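
One reading of the filtering fix above is that candidates should be filtered and ranked first, and only then cut down to the limit, so the limit never hides a better parent. The Go sketch below illustrates that ordering under this assumption. `Peer`, its `Usable` and `Score` fields, and `SelectParents` are hypothetical; the scheduler's real filter and evaluator are not described in this roadmap.

```go
package main

import (
	"fmt"
	"sort"
)

// Peer is a hypothetical candidate parent with a precomputed evaluation score.
type Peer struct {
	ID     string
	Score  float64
	Usable bool
}

// String renders a peer compactly for printing.
func (p Peer) String() string {
	return fmt.Sprintf("%s(%.1f)", p.ID, p.Score)
}

// SelectParents filters every candidate, ranks the survivors by score,
// and only then truncates the result to limit.
func SelectParents(peers []Peer, limit int) []Peer {
	if limit < 0 {
		limit = 0
	}

	// Step 1: filter all candidates without stopping early at the limit.
	filtered := make([]Peer, 0, len(peers))
	for _, p := range peers {
		if p.Usable {
			filtered = append(filtered, p)
		}
	}

	// Step 2: evaluate, highest score first.
	sort.SliceStable(filtered, func(i, j int) bool {
		return filtered[i].Score > filtered[j].Score
	})

	// Step 3: apply the limit last.
	if len(filtered) > limit {
		filtered = filtered[:limit]
	}
	return filtered
}

func main() {
	peers := []Peer{
		{ID: "a", Score: 0.2, Usable: true},
		{ID: "b", Score: 0.9, Usable: false},
		{ID: "c", Score: 0.8, Usable: true},
		{ID: "d", Score: 0.5, Usable: true},
	}
	fmt.Println(SelectParents(peers, 2))
}
```

If the limit were applied while filtering, peer `a` would occupy one of the two slots simply because it appears first, even though `c` and `d` score higher.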

Client:

- Dfstore adds GetObjectMetadatas and CopyObject to support using Dragonfly as the JuiceFS backend.
- Fix dfdaemon failing to start when there is no available scheduler address.
- Fix dfstore object downloads failing when concurrent download is enabled in dfdaemon.
- Replace net.Dial with the gRPC health check in dfdaemon.

Others:

- A third-party security audit was performed by Trail of Bits. See the full report [here](https://github.com/dragonflyoss/Dragonfly2/blob/main/docs/security/dragonfly-comprehensive-report-2023.pdf).
- Hide sensitive information in logs, such as the token in the header.

## v2.2

Manager:

- Peer features are configurable. For example, you can configure a peer so that it only downloads and does not upload.
- Configure the scheduling weights.
- Add clearing of the P2P task cache.
- Display P2P traffic distribution.
- Display peer information, including CPU, memory, etc.

Scheduler:

- Provide metadata storage to support file writing and seeding.
- Optimize scheduling algorithm and improve bandwidth utilization in the P2P network.

Client:

- Rewrite the client in Rust to reduce CPU and memory usage.
- Support RDMA for faster network transmission in the P2P network,
which better supports loading AI inference models into memory.
- Support file writing and seeding, so files can be accessed in the P2P cluster without being uploaded to other storage.
This helps AI models and AI datasets to be read and written faster in the P2P network.

Others:

- Define V2 of the P2P transfer protocol.

Document:

- Restructure the documentation to make it easier to use.
- Enhance the landing page UI.

AI Infrastructure:

- Support Triton Inference Server to accelerate model distribution; refer to [dragonfly-repository-agent](https://github.com/dragonflyoss/dragonfly-repository-agent).
- Support TorchServe to accelerate model distribution; refer to the [document](https://d7y.io/docs/next/setup/integration/torchserve).
- Support Hugging Face to accelerate model and dataset distribution; refer to the [document](https://d7y.io/docs/next/setup/integration/hugging-face).
- Support Git LFS to accelerate file distribution; refer to the [document](https://d7y.io/docs/next/setup/integration/git-lfs).
- Support JuiceFS to accelerate file downloads from object storage. JuiceFS read requests go through
the peer proxy and write requests go through the default object storage client.
- Support Fluid to accelerate model distribution.
- Support AI infrastructure in efficiently distributing models and datasets, and integrate with the AI ecosystem.
90 changes: 83 additions & 7 deletions i18n/zh/docusaurus-plugin-content-docs/current/others/Roadmap.md
@@ -3,11 +3,11 @@ id: Roadmap
title: Roadmap
---

## 2022 Roadmap
## v2.0

### Manager
Manager:

- 控制台
- Console
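- Refactor the project as a whole.
- Optimize the page UI and redesign the overall visual style of the pages.
- Unify permission management.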
@@ -21,25 +21,101 @@ title: Roadmap
- Add authentication to the Open API interface.
- Implement preheating for scenarios without a seed peer.

### Scheduler
Scheduler:

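- Improve scheduling stability and efficiency, and collect usable data.
- Implement and optimize a machine-learning-based, multi-scenario adaptive intelligent P2P node scheduling algorithm.
- Optimize the current load-based scheduling strategy by switching to scheduling based on peer bandwidth.
- Schedule based on the piece download priority feature value.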

### Dfdaemon
Client:

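- Support the seed peer feature.
- Improve task download efficiency and stability.
- Use gRPC bidirectional streams to pass piece information between peers.
- Support piece download priority.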

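### Document
Document:

- Refactor the d7y.io website and complete the Dragonfly 2.0 project documentation.

### Others
Others:

- Provide a stress-testing solution in the perf-tests repo.
- Upgrade to Go 1.18 and refactor existing code using generics.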

## v2.1

Manager:

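- Console [v1.0.0](https://github.com/dragonflyoss/console/tree/release-1.0.0) has been released. It is a brand-new visual console that makes it easy for users to operate the P2P cluster.
- Provide control over the services a scheduler offers. For example, if the manager is configured so that the scheduler does not provide preheating, scheduler instances reject preheat requests.
- Add the personal access tokens feature. Users can create their own personal access tokens for authentication when calling the Open API.
- Add the cluster resource unit. A cluster represents one P2P cluster and contains exactly one scheduler cluster and one seed peer cluster, which are associated with each other.
- Provide TLS config for the manager REST server.
- Stop using soft delete when deleting resources such as schedulers and seed peers in the manager.
- Add the uk_scheduler index to the scheduler database table and the uk_seed_peer index to the seed peer database table.
- Remove the security domain and security features because their purpose was not clearly defined in the initial design.
- Add advertise port config so that users can configure a different advertise port.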

Scheduler:

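- Add the virtual network topology discovery feature. It probes the network latency between nodes while the P2P network is running and builds a virtual network topology for scheduling to use.
- Add database config to the scheduler and move the previous Redis config into it, while remaining compatible with older versions.
- Fix an issue where candidateParentLimit could affect the scheduling result during scheduler filtering and evaluation.
- Fix the scheduler's storage being unable to write download records when bufferSize is 0.
- Add advertise port config so that users can configure a different advertise port.
- Fix an incorrect state machine state change during task registration.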

Client:

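- Dfstore provides the GetObjectMetadatas and CopyObject interfaces to support using Dragonfly as the backend storage for JuiceFS.
- Fix dfdaemon failing to start when there is no available scheduler address.
- Fix object storage downloads that could fail in dfstore when dfdaemon downloads concurrently.
- Use the gRPC health check instead of net.Dial in dfdaemon.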

Others:

- Complete the Trail of Bits security audit. For the report, refer to the [document](https://github.com/dragonflyoss/Dragonfly2/blob/main/docs/security/dragonfly-comprehensive-report-2023.pdf).
- Hide sensitive information in logs, such as token information in headers.

## v2.2

Manager:

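- Client features can be configured dynamically. For example, a client can be made to stop uploading.
- The scheduler's scheduling weights can be configured dynamically.
- Add cache clearing, which clears all cache for a specific task.
- Visualize P2P traffic trends.
- Display client node information, such as CPU and memory.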

Scheduler:

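- Implement scheduling based on the V2 protocol.
- Provide metadata storage so that files can be written into the cluster and seeded through the client.
- Optimize scheduling and improve the bandwidth utilization of nodes in the P2P network.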

Client:

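- Rewrite in Rust to reduce CPU load and memory usage.
- Support RDMA to increase system throughput and reduce network communication latency. This better supports AI inference scenarios in which models are loaded from a remote source into memory.
- Support file writing and seeding without depending on other storage for persistence, improving file read and write performance within the cluster. This helps models and datasets in AI scenarios to be read and written faster.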

Others:

- Define V2 of the P2P transfer protocol.

Document:

- Restructure the documentation to improve the user experience.
- Improve the home page UI.

AI Infrastructure:

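- Support Triton Inference Server to accelerate model distribution; refer to [Triton Inference Server](https://github.com/dragonflyoss/dragonfly-repository-agent).
- Support TorchServe to accelerate model distribution; refer to [TorchServe](https://d7y.io/docs/next/setup/integration/torchserve).
- Support Hugging Face to accelerate model and dataset distribution; refer to [Hugging Face](https://d7y.io/docs/next/setup/integration/hugging-face).
- Support Git LFS to accelerate file distribution; refer to [Git LFS](https://d7y.io/docs/next/setup/integration/git-lfs).
- Support JuiceFS to accelerate file distribution, with Dragonfly acting as a cache layer between JuiceFS and object storage.
- Support Fluid to accelerate model distribution.
- Support more AI infrastructure in efficiently distributing models and datasets, and integrate with the AI ecosystem.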
