updated readme & detailed intro en doc, added detailed intro zh doc
kayla050 committed Nov 12, 2024
1 parent e05c54a commit 95dfa82
Showing 6 changed files with 148 additions and 21 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -27,6 +27,11 @@ Additionally, CSGHub features microservice submodules and standardized OpenAPIs

For further information, please see the [detailed introduction](./docs/detailed_intro_en.md) of CSGHub.

### Demo Video

To help users get up to speed with CSGHub, we have created a demo video that highlights its key features and how it works. You can watch it below or on [YouTube](https://www.youtube.com/watch?v=6LwGQ07qBxU)/[Bilibili](https://www.bilibili.com/video/BV1ynmxY3EXz/).
<video width="658" height="432" src="https://github.com/user-attachments/assets/04f9fa17-9294-44c1-8c4a-4d7b9a5c66fa"></video>

### Quick Start

- For a quick look, try CSGHub's free SaaS version on the [OpenCSG website](https://opencsg.com). Refer to this [brief quick start guide](./docs/csghub_saas_en.md) to manage LLMs and datasets and deploy LLM applications with the CSGHub SaaS interface.
5 changes: 5 additions & 0 deletions README_jp.md
@@ -27,6 +27,11 @@ CSGHubは、大規模言語モデル(LLM)の資産管理のために設計

For details, see the [detailed introduction](./docs/detailed_intro_en.md) to CSGHub.

### Demo Video

To help you quickly understand CSGHub's key features and how to use it, we have created a demo video. You can watch it below or on [YouTube](https://www.youtube.com/watch?v=6LwGQ07qBxU)/[Bilibili](https://www.bilibili.com/video/BV1ynmxY3EXz/).
<video width="658" height="432" src="https://github.com/user-attachments/assets/04f9fa17-9294-44c1-8c4a-4d7b9a5c66fa"></video>

### Quick Start

- **For those who want to explore**: Try the free SaaS version of CSGHub on the [OpenCSG website](https://opencsg.com). Refer to the [quick start guide](./docs/csghub_saas_en.md) to manage LLMs and datasets and deploy LLM applications with the CSGHub SaaS interface.
5 changes: 5 additions & 0 deletions README_kr.md
@@ -27,6 +27,11 @@ CSGHub는 대규모 언어 모델(LLM) 자산을 관리하기 위해 설계된

For more information about CSGHub, see the [detailed introduction](./docs/detailed_intro_en.md).

### Demo Video

To help you get up to speed with CSGHub, we have prepared a demo video that introduces its key features and how to use it. You can watch it below or on [YouTube](https://www.youtube.com/watch?v=6LwGQ07qBxU)/[Bilibili](https://www.bilibili.com/video/BV1ynmxY3EXz/).
<video width="658" height="432" src="https://github.com/user-attachments/assets/04f9fa17-9294-44c1-8c4a-4d7b9a5c66fa"></video>

### Quick Start

- **For users who want to explore**: Try the free SaaS version of CSGHub on the [OpenCSG website](https://opencsg.com). Refer to the [brief quick start guide](./docs/csghub_saas_en.md) to manage LLMs and datasets and deploy LLM applications with the CSGHub SaaS interface.
7 changes: 6 additions & 1 deletion README_zh.md
@@ -25,7 +25,12 @@ CSGHub 是一个开源平台,专为管理大语言模型(LLM)资产而设
- One-stop data processing and intelligent annotation system
- High-availability and disaster recovery design

For details, see the [CSGHub detailed introduction](./docs/detailed_intro_en.md)
For details, see the [CSGHub detailed introduction](./docs/detailed_intro_zh.md)

### Demo Video

To help users get familiar with CSGHub quickly, we have made a demo video that highlights its main features. You can also watch it on [YouTube](https://www.youtube.com/watch?v=6LwGQ07qBxU) or [Bilibili](https://www.bilibili.com/video/BV1ynmxY3EXz/).
<video width="658" height="432" src="https://github.com/user-attachments/assets/04f9fa17-9294-44c1-8c4a-4d7b9a5c66fa"></video>

### Quick Start

90 changes: 70 additions & 20 deletions docs/detailed_intro_en.md
@@ -1,27 +1,77 @@
## CSGHub Core Functions
In the era of LLM, data and models are increasingly becoming the most important digital assets for businesses and individual users. However, there are currently issues such as fragmented management tools, limited management methods, and localization, which not only pose potential threats to secure operations but also might hinder the updating and iteration of enterprise-scale models. If you believe that large models will become a major driving force in the upcoming revolution, you may also be considering how to manage core assets — models, data, and large model application code — more efficiently and securely. `CSGHub` is an open-source project designed to address these issues.
# CSGHub: In-Depth Overview of Features and Architecture

### CSGHub Tech Design
The technical design of `CSGHub` is as follows:
- `CSGHub` integrates multiple technologies including Git Servers, Git LFS (Large File Storage) protocol, and Object Storage Service (OSS), providing a reliable data storage layer, a flexible infrastructure access layer, and extensive support for development tools.
- Utilizing a service-oriented architecture, `CSGHub` offers backend services through `CSGHub` Server and a management interface via `CSGHub` Web Service. Ordinary users can quickly initiate services using Docker compose or Kubernetes Helm Chart for enterprise-level asset management. Users with in-house development capabilities can utilize `CSGHub` Server for secondary development to integrate management functions into external systems or to customize advanced features.
- Leveraging outstanding open-source projects like Apache Arrow and DuckDB, `CSGHub` supports previewing of Parquet data file formats, facilitating localized dataset management for researchers and common users.
- `CSGHub` provides an intuitive web interface and permission design for enterprise organization structure. Users can realize version control management, online browsing and downloading through the web UI, as well as set the visibility scope of datasets and model files to realize data security isolation, and can also initiate topic discussions on models and datasets.
As Large Language Models (LLMs) become increasingly central to digital transformation, organizations face growing challenges in managing LLM assets. They often struggle with scattered tools, limited management options, and security concerns. CSGHub is an open-source platform designed to address these challenges, making LLM asset management straightforward and secure.

Our R&D team has been focusing on AI + DevOps for a long time, and we hope to solve the pain points in the development process of large models through the `CSGHub` project. We encourage everyone to contribute high-quality development and operation and maintenance documents, and work together to improve the platform, so that large models assets can be more traceable and efficient.
## CSGHub Key Features

### CSGHub Demo Video
In order to help users to quickly understand the features and usage of `CSGHub`, we have recorded a demo video. You can watch this video to get a quick understanding of the main features and operation procedures of this program.
- The `CSGHub` demo video is below. You can also watch it on [YouTube](https://www.youtube.com/watch?v=SFDISpqowXs) or [Bilibili](https://www.bilibili.com/video/BV1wk4y1X7G7/).
<video width="658" height="432" src="https://github-production-user-asset-6210df.s3.amazonaws.com/3232817/296556812-205d07f2-de9d-4a7f-b3f5-83514a71453e.mp4"></video>
- **Unified LLM Asset Management:**
- Full lifecycle management for models, datasets, and code
- Support for large file operations and web-based collaboration
- Integrated version control and asset tracking

### Architecture
`CSGHub` consists of two main parts: Portal and Server. This repo corresponds to `CSGHub` Portal, while `CSGHub` Server is a separate high-performance backend project implemented in Golang.
- **Extensible Development Framework:**
- Full support for HTTPS and SSH protocols
- Seamless integration with popular SDKs (Gradio, Streamlit)
- Automated environment optimization for model deployment
- One-click inference and fine-tuning capabilities

If you want to dive into the details of `CSGHub` Server or integrate the Server with your own frontend system, see the [`CSGHub` Server open-source project](https://github.com/OpenCSGs/csghub-server).
- **Advanced Model Capabilities and Optimization:**
- Keep track of all model versions automatically
- Built-in model format conversion and data processing utilities
- Support for converting between common data formats (CSV, JSON, Parquet)
- Web-based data preview capabilities

#### CSGHub Portal Architecture
<img src="/docs/images/portal_tech_graph.png" width='800'>
- **Space and Asset Management Assistant (Copilot):**
- Quickly build and showcase AI applications
- Flexible asset management through Copilot assistant
- Enterprise-ready on-premises deployment option

#### CSGHub Server Architecture
<img src="/docs/images/server_tech_graph.png" width='800'>
- **Multi-Source Data Synchronization and Recommendation:**
- Integration with the OpenCSG community
- Support for synchronizing models and datasets in the community
- Scenario-based solution suggestions

- **Enterprise-Level Security and Access Control:**
- Support for integration with enterprise user systems
- Asset visibility settings
- License tracking and validation

- **On-Premises Deployment Solutions:**
- One-click on-premises deployment
- Cloud-independent operation
- Full control over data

- **E2E Data Processing and Intelligent Annotation System:**
- Customizable data processing pipelines
- Speed up processing with parallel computing
- Collaborate on data annotation tasks

- **Resilient High-Availability Architecture:**
- Support for high-availability architecture
- Support for load balancing and resource scheduling to ensure stability
- Support for disaster recovery to ensure business continuity
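
The feature list above mentions conversion between data formats such as CSV and JSON. As a rough illustration of what such a conversion involves, here is a minimal sketch using only the Python standard library. It is not CSGHub's implementation; the function names `csv_to_json_lines` and `json_lines_to_csv` are hypothetical, and Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON Lines text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One JSON object per CSV row; values stay strings because CSV carries no types.
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in reader)


def json_lines_to_csv(jsonl_text: str) -> str:
    """Convert JSON Lines text back to CSV, using the first record's keys as the header."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,text\n1,hello\n2,world\n"
    jsonl = csv_to_json_lines(sample)
    print(jsonl)
    print(json_lines_to_csv(jsonl))
```

Going through JSON Lines (one object per line) rather than a single JSON array keeps the conversion streamable for large datasets, which is one reason the format is common for LLM training data.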

## CSGHub Tech Design

We have built CSGHub on proven technologies to make it both powerful and reliable:

- CSGHub integrates a Git server with the Git LFS protocol and an Object Storage Service to create a reliable storage system. This integration provides flexible access for development tools while maintaining comprehensive data management capabilities.
- Built on a service-oriented architecture, CSGHub features a backend server (CSGHub Server) and web interface (CSGHub Portal) for seamless management. Users can quickly deploy services with Docker Compose or Kubernetes Helm Chart, while developers can extend functionality through the CSGHub Server for custom integrations and advanced features.
- By leveraging Apache Arrow and DuckDB, CSGHub makes dataset management straightforward for both researchers and common users, offering convenient Parquet file preview functionality for local datasets.
- The platform features a clean web interface with an enterprise-ready permission system. Users can easily manage version control, browse files online, configure access settings, and participate in discussions about models and datasets.
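
To make the Git plus Git LFS plus object storage combination more concrete, the sketch below shows the general Git LFS idea: the Git repository stores a small text pointer, while the large file itself lives elsewhere, addressed by its SHA-256 digest. The pointer format follows the public Git LFS v1 specification. The `object_key` layout is purely an assumption for illustration and is not taken from CSGHub.

```python
import hashlib

LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> str:
    """Build the pointer text that Git LFS commits in place of a large file."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC_VERSION}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_lfs_pointer(pointer: str) -> dict:
    """Parse pointer text into algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in pointer.splitlines() if line)
    if fields.get("version") != LFS_SPEC_VERSION:
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def object_key(pointer: str) -> str:
    """Hypothetical object-storage key: shard objects by the leading digest characters."""
    digest = parse_lfs_pointer(pointer)["digest"]
    return f"lfs/{digest[:2]}/{digest[2:4]}/{digest}"


if __name__ == "__main__":
    pointer = make_lfs_pointer(b"model weights would go here")
    print(pointer)
    print(object_key(pointer))
```

Because the pointer is content-addressed, identical files uploaded to different repositories map to the same digest, so a storage backend can deduplicate them.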

Our R&D team focuses on AI + DevOps, aiming to address the challenges in LLM development through the CSGHub project. We welcome your contributions to enrich the documentation and strengthen the platform's capabilities.

## CSGHub Architecture

This repository contains the CSGHub Portal code, while CSGHub Server is developed as a separate high-performance backend project in Golang.

To understand the details of CSGHub Server or learn how to integrate it with your own frontend system, check out the [CSGHub Server open-source project](https://github.com/OpenCSGs/csghub-server).

### CSGHub Portal Architecture

<img src="images/portal_tech_graph.png" width='800'>

### CSGHub Server Architecture

<img src="images/server_tech_graph.png" width='800'>
57 changes: 57 additions & 0 deletions docs/detailed_intro_zh.md
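@@ -0,0 +1,57 @@
# CSGHub: Detailed Guide to Features and Architecture

In the era of large language models (LLMs), data and models have become core digital assets for businesses and individual users. However, fragmented management tools, limited management methods, and localization issues not only threaten secure operations but can also hinder the updating and iteration of enterprise-scale models. If you believe large models will be a key driver of future change, then how to manage core assets such as models, data, and large-model application code efficiently and securely may be exactly what you care about. CSGHub is an open-source project designed to solve these problems.

## CSGHub Core Features

- **Unified management of large-model assets**
Provides one-stop management of models, datasets, and code, with support for storage, version control, modification, and querying. Supports uploading and downloading very large files, as well as online editing and preview on the web, improving development collaboration.

- **Flexible and compatible development ecosystem**
Supports the HTTPS and SSH protocols, so users can work through Git commands or the web interface. The platform integrates popular SDKs such as Gradio and Streamlit to simplify AI application development, and provides one-click model inference and fine-tuning services with automatic environment optimization to ensure efficient deployment and operation.

- **Extended large-model capabilities**
Supports comprehensive version management, model format conversion, and automatic data processing, along with conversion tools for common data formats such as CSV and JSON, and previews Parquet datasets on the web so users can quickly inspect data.

- **Application Space and asset management assistant (Copilot)**
Users can showcase model capabilities through Application Spaces, build application prototypes, and flexibly create, read, update, and delete them. The Copilot assistant simplifies asset management, and a private edition is available to support on-premises enterprise deployment.

- **Multi-source data synchronization and recommendation**
Integrates with the OpenCSG community, supports synchronizing models and datasets from the community, and provides personalized recommendations based on business scenarios, helping enterprises develop AI solutions suited to their scenarios.

- **Comprehensive permission and security controls**
Supports integration with enterprise user systems, asset visibility settings, and authentication for external and internal interfaces, and uses license compliance checks and provenance tracing to ensure models meet legal requirements.

- **On-premises deployment support**
Enables one-click on-premises deployment without relying on cloud services, keeping enterprise data secure and under the enterprise's own control.

- **One-stop data processing and intelligent annotation system**
Provides customizable data processing pipelines that support complex data cleaning and transformation and use parallel processing to speed up tasks. Also includes an intelligent annotation system that supports multi-user collaboration and review to ensure data quality.

- **High-availability and disaster recovery design**
Uses a high-availability system architecture with load balancing and resource scheduling to ensure stability under high concurrency. Achieves disaster recovery through redundant backups and snapshots, ensuring business continuity.

## CSGHub Technical Design

The technical design of CSGHub includes the following:

- CSGHub integrates a Git server, the Git LFS (Large File Storage) protocol, and an Object Storage Service (OSS), providing a solid data storage layer, a flexible infrastructure access layer, and comprehensive support for development tools.
- CSGHub uses a service-oriented architecture, providing backend services through CSGHub Server and a management interface through CSGHub Web Service. Ordinary users can start services quickly with Docker Compose or a Kubernetes Helm Chart for enterprise-level asset management. Users with development capabilities can build on CSGHub Server to integrate management functions into external systems or to customize more advanced features.
- Using excellent open-source projects such as Apache Arrow and DuckDB, CSGHub supports previewing the Parquet data file format, making it easier for researchers and ordinary users to manage localized datasets.
- CSGHub provides a clean, intuitive web interface and a permission design that matches enterprise organizational structures. Through the web UI, users can perform version control, browse and download files online, set the visibility of datasets and model files to isolate data securely, and start topic discussions around models and datasets.

Our R&D team focuses on AI + DevOps and hopes to solve the pain points of large-model development through the CSGHub project. We encourage everyone to contribute high-quality development and operations documentation and to help improve the platform together, making large-model assets richer and more efficient.

## CSGHub Architecture

CSGHub consists of two core parts: the Portal and the Server. This repository corresponds to CSGHub Portal, while CSGHub Server is a high-performance backend project implemented in Golang.

If you want to dig into the architecture of CSGHub Server, or wish to integrate the Server with your own frontend system, see the [CSGHub Server open-source project](https://github.com/OpenCSGs/csghub-server)

### CSGHub Portal Architecture

<img src="images/portal_tech_graph.png" width='800'>

### CSGHub Server Architecture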

<img src="images/server_tech_graph.png" width='800'>
