From 149870828dbab6d9cd38a78f2c286d9554e17947 Mon Sep 17 00:00:00 2001
From: Gaius
Date: Fri, 29 Dec 2023 13:48:21 +0800
Subject: [PATCH] docs: add v2.2 roadmap (#45)

Signed-off-by: Gaius
---
 docs/others/Roadmap.md                        | 94 +++++++++++++++++--
 .../current/others/Roadmap.md                 | 90 ++++++++++++++++--
 2 files changed, 171 insertions(+), 13 deletions(-)

diff --git a/docs/others/Roadmap.md b/docs/others/Roadmap.md
index 640ad515..e5dacf13 100644
--- a/docs/others/Roadmap.md
+++ b/docs/others/Roadmap.md
@@ -3,9 +3,9 @@ id: Roadmap
 title: Roadmap
 ---
 
-## 2022 Roadmap {#2022-roadmap}
+## v2.0
 
-### Manager {#manager}
+Manager:
 
 - Console
   - Refactor project.
@@ -21,24 +21,106 @@ title: Roadmap
 - Application-level speed limit and other configurations.
 - Added open api interface authentication.
 
-### Scheduler {#scheduler}
+Scheduler:
 
 - Improve scheduling stability and collect metrics during scheduling.
 - Scheduler integrates machine learning algorithms to improve scheduling capabilities.
 - Allocate download peers based on peer bandwidth traffic.
 
-### Dfdaemon {#wip-dfdaemon}
+Client:
 
 - Support seed peer feature.
 - Improve task download efficiency and stability.
 - Refactoring to use GRPC bidirectional stream for piece information passing between peers.
 - Support piece download priority.
 
-### Document {#document}
+Document:
 
 - Refactored d7y.io website and added dragonfly 2.0 documentation.
 
-### Others {#others}
+Others:
 
 - Provide performance testing solutions in perf-tests repo.
 - Upgrade golang 1.18, refactor the project using the generic feature.
+
+## v2.1
+
+Manager:
+
+- Console [v1.0.0](https://github.com/dragonflyoss/console/tree/release-1.0.0) is released and it provides
+  a new console for users to operate Dragonfly.
+- Provide the ability to control scheduler features in the manager. If the scheduler preheat feature is
+  not in the feature flags, the scheduler stops providing preheating.
+- Add the personal access token feature in the manager. A personal access token
+  contains your security credentials for the RESTful open API.
+- Add TLS config to the manager REST server.
+- Add the cluster resource in the manager. A cluster contains a scheduler cluster and a seed peer cluster.
+- Use unscoped delete when destroying the manager's resources.
+- Add the uk_scheduler index and the uk_seed_peer index to the database tables.
+- Remove the security domain feature and the security feature in the manager.
+- Add advertise port config.
+
+Scheduler:
+
+- Add the network topology feature, which probes the network latency between peers to provide better scheduling capabilities.
+- Add the database field to the scheduler config and move the redis config into the database field.
+- Fix filtering and evaluation in scheduling. Because the final length of the filtered list was
+  the candidateParentLimit used, the parents remaining after filtering were wrong.
+- Fix storage failing to write records to file when bufferSize is zero.
+- Add advertise port config.
+- Fix the FSM failing to change state when registering a task.
+
+Client:
+
+- Dfstore adds GetObjectMetadatas and CopyObject to support using Dragonfly as the JuiceFS backend.
+- Fix dfdaemon failing to start when there is no available scheduler address.
+- Fix object downloads failing in dfstore when dfdaemon enables concurrent downloads.
+- Replace net.Dial with grpc health check in dfdaemon.
+
+Others:
+
+- A third-party security audit was performed by Trail of Bits. See the full report [here](https://github.com/dragonflyoss/Dragonfly2/blob/main/docs/security/dragonfly-comprehensive-report-2023.pdf).
+- Hide sensitive information in logs, such as the token in the header.
+
+## v2.2
+
+Manager:
+
+- Peer features are configurable. For example, you can disable uploading on a peer so that it can only download.
+- Configure the scheduling weights.
+- Add clearing of the P2P task cache.
+- Display P2P traffic distribution.
+- Display peer information, including CPU, memory, etc.
+
+Scheduler:
+
+- Provide metadata storage to support file writing and seeding.
+- Optimize the scheduling algorithm and improve bandwidth utilization in the P2P network.
+
+Client:
+
+- Rewrite the client in Rust to reduce CPU usage and memory usage.
+- Support RDMA for faster network transmission in the P2P network.
+  It can better support the loading of AI inference models into memory.
+- Support file writing and seeding, so files can be accessed in the P2P cluster without uploading to other storage.
+  This helps AI models and AI datasets to be read and written faster in the P2P network.
+
+Others:
+
+- Define V2 of the P2P transfer protocol.
+
+Document:
+
+- Restructure the document to make it easier for users to use.
+- Enhance the landing page UI.
+
+AI Infrastructure:
+
+- Supports Triton Inference Server to accelerate model distribution, refer to [dragonfly-repository-agent](https://github.com/dragonflyoss/dragonfly-repository-agent).
+- Supports TorchServe to accelerate model distribution, refer to [document](https://d7y.io/docs/next/setup/integration/torchserve).
+- Supports HuggingFace to accelerate model distribution and dataset distribution, refer to [document](https://d7y.io/docs/next/setup/integration/hugging-face).
+- Supports Git LFS to accelerate file distribution, refer to [document](https://d7y.io/docs/next/setup/integration/git-lfs).
+- Supports JuiceFS to accelerate file downloads from object storage. JuiceFS read requests go through the
+  peer proxy and write requests go through the default client of object storage.
+- Supports Fluid to accelerate model distribution.
+- Support AI infrastructure to efficiently distribute models and datasets, and integrate with the AI ecosystem.
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/others/Roadmap.md b/i18n/zh/docusaurus-plugin-content-docs/current/others/Roadmap.md
index ce3e78f9..7a2835d1 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/others/Roadmap.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/others/Roadmap.md
@@ -3,11 +3,11 @@ id: Roadmap
 title: Roadmap
 ---
 
-## 2022 Roadmap
+## v2.0
 
-### Manager
+Manager:
 
-- 控制台
+- Console
   - 项目整体重构。
   - 优化页面 UI, 整体页面视觉改版。
   - 统一权限管理。
@@ -21,25 +21,101 @@ title: Roadmap
 - 增加 Open API 接口鉴权。
 - 无 Seed Peer 场景下,预热功能实现。
 
-### Scheduler
+Scheduler:
 
 - 提高调度的稳定性以及效率,并收集可用数据。
 - 实现基于机器学习的多场景自适应智能 P2P 节点调度算法及优化。
 - 优化当前基于负载进行的调度策略,改为基于 Peer 带宽进行调度。
 - 针对 Piece 下载优先级特征值进行调度。
 
-### Dfdaemon
+Client:
 
 - 支持 Seed Peer 功能。
 - 提升任务下载效率以及稳定性。
 - 使用 GRPC 双向流传递 Peer 间 Piece 信息。
 - 支持下载 Piece 优先级。
 
-### 文档
+Document:
 
 - 重构 d7y.io 官网,并完成 Dragonfly 2.0 项目文档。
 
-### 其他
+Others:
 
 - perf-tests 仓库中提供压测解决方案。
 - 升级 Golang 1.18 版本,基于泛型重构已有代码。
+
+## v2.1
+
+Manager:
+
+- Console [v1.0.0](https://github.com/dragonflyoss/console/tree/release-1.0.0) 已经发布,它是一个全新的可视化控制台方便用户操作 P2P 集群。
+- 提供控制 Scheduler 可以提供的服务,例如在 Manager 中设置 Scheduler 不提供预热功能,那么 Scheduler 实例就会拒绝预热请求。
+- 新增 Personal Access Tokens 功能,用户可以创建自己的 Personal Access Tokens 在调用 Open API 的时候鉴权使用。
+- 新增 Cluster 资源单位,Cluster 代表一个 P2P 集群,其只包含一个 Scheduler Cluster 和一个 Seed Peer Cluster,并且二者关联。
+- Manager REST 服务提供 TLS 配置。
+- Manager 中 Scheduler、Seed Peer 等资源删除过程中,不再使用软删除。
+- Scheduler 数据库表中新增 uk_scheduler 索引,Seed Peer 数据库表中新增 uk_seed_peer 索引。
+- 由于初期功能设计定位不清晰的原因,删除 Security Domain 和 Security 的功能。
+- 新增 Advertise Port 配置,方便用户配置不同的 Advertise Port。
+
+Scheduler:
+
+- 新增虚拟网络拓扑探索功能,能够在 P2P 运行时探测节点之间的网络延迟,从而构建一个虚拟网络拓扑结构提供调度使用。
+- Scheduler 新增 Database 配置,并且把之前 Redis 的配置信息移入到 Database 配置中,并且兼容老版本。
+- 修复调度器过滤以及评估过程中 candidateParentLimit 可能影响到调度结果的问题。
+- 修复 Scheduler 中的 Storage 在 bufferSize 为 0 的时候,导致的无法写入下载记录的问题。
+- 新增 Advertise Port 配置,方便用户配置不同的 Advertise Port。
+- 修复 Task 注册阶段状态机状态变更错误的问题。
+
+Client:
+
+- Dfstore 提供 GetObjectMetadatas and CopyObject 接口,支持 Dragonfly 作为 JuiceFS 的后端存储。
+- 修复当 Dfdaemon 没有可用的 Scheduler 地址时启动失败的现象。
+- 修复 Dfstore 在 Dfdaemon 并发下载时,可能导致的对象存储下载失败。
+- 在 Dfdaemon 中使用 GRPC 健康检查代替 net.Dial。
+
+Others:
+
+- 完成 Trail of Bits 的安全审计,报告可以参考[文档](https://github.com/dragonflyoss/Dragonfly2/blob/main/docs/security/dragonfly-comprehensive-report-2023.pdf)。
+- 日志中隐藏敏感信息,例如 Header 中的一些 Token 信息等。
+
+## v2.2
+
+Manager:
+
+- 可动态配置 Client 功能,例如可以让 Client 停止上传功能。
+- 可动态配置 Scheduler 的调度计算权重。
+- 增加缓存清理功能,清理特定任务的所有 Cache。
+- P2P 流量走势可视化。
+- 展示 Client 节点信息,例如 CPU、内存等。
+
+Scheduler:
+
+- 实现基于 V2 版本协议的调度功能。
+- 提供元信息存储功能,方便通过 Client 向集群内写入文件和做种。
+- 优化调度,提升 P2P 网络种节点带宽利用率。
+
+Client:
+
+- 使用 Rust 重写,减少 CPU 负载和 Memory 占用。
+- 支持 RDMA 提高系统吞吐量、降低系统的网络通信延迟。可以更好的支持 AI 推理场景将模型从远端加载到内存。
+- 支持文件写入和做种,可以不依赖其他存储持久化,提高集群内文件读写性能。帮助 AI 场景模型和数据集更快速读写。
+
+Others:
+
+- 定义 V2 版本的 P2P 传输协议。
+
+Document:
+
+- 重构文档内容,提升用户体验。
+- 优化首页的 UI。
+
+AI Infrastructure:
+
+- 支持 Triton Inference Server 加速模型分发,参考文档 [Triton Inference Server](https://github.com/dragonflyoss/dragonfly-repository-agent)。
+- 支持 TorchServer 加速模型分发,参考文档 [TorchServe](https://d7y.io/docs/next/setup/integration/torchserve)。
+- 支持 HuggingFace 加速模型和数据集分发,参考文档 [Hugging Face](https://d7y.io/docs/next/setup/integration/hugging-face)。
+- 支持 Git LFS 加速文件分发,参考文档 [Git LFS](https://d7y.io/docs/next/setup/integration/git-lfs)。
+- 支持 JuiceFS 加速文件分发,Dragonfly 作为 JuiceFS 和对象存储中间 Cache 层。
+- 支持 Fluid 加速分发模型。
+- 支持更多 AI 基础设施高效分发模型以及数据集,与 AI 生态融合。