SingVisio release (#141)

Release the first visualization tool of Amphion, SingVisio
open-mmlab · Feb 23, 2024 · d37d8f1 · d37d8f1
1 parent 1c4a9af
commit d37d8f1
Showing 21 changed files with 3,327 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -33,7 +33,7 @@ Here is the Amphion v0.1 demo, whose voice, audio effects, and singing voice are
 )
 
 ## 🚀 News
-
+- **2024/02/22**: The first Amphion visualization tool, **SingVisio**, released. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2402.12660) [![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/SingVisio) [![Video](https://img.shields.io/badge/Video-Demo-orange)](https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/visualization/SingVisio/README.md)
 - **2023/12/18**: Amphion v0.1 release. [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2312.09911) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Amphion-pink)](https://huggingface.co/amphion) [![youtube](https://img.shields.io/badge/YouTube-Demo-red)](https://www.youtube.com/watch?v=1aw0HhcggvQ) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](https://github.com/open-mmlab/Amphion/pull/39)
 - **2023/11/28**: Amphion alpha release. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](https://github.com/open-mmlab/Amphion/pull/2)
 
@@ -79,6 +79,13 @@ Amphion provides a comprehensive objective evaluation of the generated audio. Th
 
 Amphion unifies the data preprocess of the open-source datasets including [AudioCaps](https://audiocaps.github.io/), [LibriTTS](https://www.openslr.org/60/), [LJSpeech](https://keithito.com/LJ-Speech-Dataset/), [M4Singer](https://github.com/M4Singer/M4Singer), [Opencpop](https://wenet.org.cn/opencpop/), [OpenSinger](https://github.com/Multi-Singer/Multi-Singer.github.io), [SVCC](http://vc-challenge.org/), [VCTK](https://datashare.ed.ac.uk/handle/10283/3443), and more. The supported dataset list can be seen [here](egs/datasets/README.md) (updating).
 
+### Visualization
+
+Amphion provides visualization tools that interactively illustrate the internal processing mechanisms of classic models. They serve as a resource for education and for making research easier to understand.
+
+Currently, Amphion supports [SingVisio](egs/visualization/SingVisio/README.md), a visualization tool of the diffusion model for singing voice conversion. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2402.12660) [![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/SingVisio) [![Video](https://img.shields.io/badge/Video-Demo-orange)](https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view)
+
+
 ## 📀 Installation
 
 Amphion can be installed through either Setup Installer or Docker Image.
@@ -121,6 +128,7 @@ We detail the instructions of different tasks in the following recipes:
 - [Text to Audio (TTA)](egs/tta/README.md)
 - [Vocoder](egs/vocoder/README.md)
 - [Evaluation](egs/metrics/README.md)
+- [Visualization](egs/visualization/README.md)
 
 ## 👨‍💻 Contributing
 We appreciate all contributions to improve Amphion. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.
@@ -146,9 +154,9 @@ Amphion is under the [MIT License](LICENSE). It is free for both research and co
 ```bibtex
 @article{zhang2023amphion,
       title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit}, 
-      author={Xueyao Zhang and Liumeng Xue and Yuancheng Wang and Yicheng Gu and Xi Chen and Zihao Fang and Haopeng Chen and Lexiao Zou and Chaoren Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
+      author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Haorui He and Chaoren Wang and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
       journal={arXiv},
-      year={2023},
+      year={2024},
       volume={abs/2312.09911}
 }
 ```
diff --git a/egs/visualization/README.md b/egs/visualization/README.md
@@ -0,0 +1,19 @@
+# Amphion Visualization Recipe
+
+## Quick Start
+
+We provide a **[beginner recipe](SingVisio/)** that demonstrates how to implement interactive visualization for classic audio, music, and speech generative models. It is also the official implementation of the paper "[SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion](https://arxiv.org/pdf/2402.12660.pdf)". **SingVisio** can be tried [here](https://openxlab.org.cn/apps/detail/Amphion/SingVisio).
+
+## Supported Models
+
+As a unique feature of Amphion, visualization offers interactive visual analysis of classic models for educational purposes, helping newcomers understand their inner workings.
+
+Amphion currently supports visualization tools for the following models:
+
+- **SVC**:
+    - **[MultipleContentsSVC](../svc/MultipleContentsSVC)**: A diffusion-based model for singing voice conversion.
+- **TTS**:
+    - **[FastSpeech 2](../tts/FastSpeech2/)** (👨‍💻 developing): A typical transformer-based TTS model.
+    - **[VITS](../tts/VITS/)** (👨‍💻 developing): A typical flow-based end-to-end TTS model.
+
+
diff --git a/egs/visualization/SingVisio/README.md b/egs/visualization/SingVisio/README.md
@@ -0,0 +1,71 @@
+# SingVisio: Visual Analytics of the Diffusion Model for Singing Voice Conversion
+
+[![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2402.12660)
+[![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/SingVisio)
+[![Video](https://img.shields.io/badge/Video-Demo-orange)](https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view)
+
+<div align="center">
+<img src="../../../imgs/visualization/SingVisio_system.png" width="85%">
+</div>
+
+This is the official implementation of the paper "[SingVisio: Visual Analytics of the Diffusion Model for Singing Voice Conversion](https://arxiv.org/abs/2402.12660)." The **SingVisio** system can be tried [here](https://openxlab.org.cn/apps/detail/Amphion/SingVisio).
+
+The **SingVisio** system comprises two main components: a web-based front-end user interface and a back-end generation model.
+
+- The web-based user interface was developed using [D3.js](https://d3-graph-gallery.com/index.html), a JavaScript library designed for creating dynamic and interactive data visualizations. The code can be accessed [here](../../../visualization/SingVisio/webpage/).
+- The core generative model, [MultipleContentsSVC](https://arxiv.org/abs/2310.11160), is a diffusion-based model tailored for singing voice conversion (SVC). The code for this model is available in Amphion, with the recipe accessible [here](../../svc/MultipleContentsSVC/).
+
+## Development Workflow for Visualization Systems
+
+The process of developing a visualization system encompasses seven key steps:
+
+1. **Identify the Model for Visualization**: Begin by selecting the model you wish to visualize. 
+
+2. **Task Analysis**: Analyze the specific tasks that the visualization system needs to support through discussions with experts, model builders, and potential users. This means determining what you want to visualize, such as the classical denoising generation process in diffusion models.
+
+3. **Data and Feature Generation**: Produce the data and features necessary for visualization based on the selected model. Alternatively, generate and visualize them in real time.
+
+4. **Design the User Interface**: Design and develop the user interface to effectively display the model structure, data, and features. 
+
+5. **Iterative Refinement**: Iteratively refine the user interface design for a better visualization experience. 
+
+6. **User Study Preparation**: Design questionnaires for a user study to evaluate the system in terms of system design, functionality, explainability, and user-friendliness.
+
+7. **Evaluation and Improvement**: Conduct comprehensive evaluations through a user study, case study, and expert study to evaluate, analyze, and improve the system.
+
+
+## Tasks Supported in SingVisio
+
+The **SingVisio** system supports five tasks.
+- To investigate the evolution and quality of the converted SVC results from each step in the diffusion generation process, **SingVisio** supports the following two tasks:
+    - **T1: Step-wise Diffusion Generation Comparison:** Investigate the evolution and quality of results converted at each step of the diffusion process.
+    - **T2: Step-wise Metric Comparison:** Examine changes in metrics throughout the diffusion steps.
+
+- To explore how various factors (content, melody, singer timbre) influence the SVC results, **SingVisio** supports the following three tasks:
+    - **T3: Pair-wise SVC Comparison with Different <u>Target Singers</u>**
+    - **T4: Pair-wise SVC Comparison with Different <u>Source Singers</u>**
+    - **T5: Pair-wise SVC Comparison with Different <u>Songs</u>**
+
+## View Design in SingVisio
+
+The user interface of **SingVisio** comprises five views:
+- **A: Control Panel:** Enables users to adjust the display mode and select data for visual analysis.
+- **B: Step View:** Offers an overview of the diffusion generation process.
+- **C: Comparison View:** Facilitates easy comparison of conversion results under different conditions.
+- **D: Projection View:** Assists in observing the diffusion steps' trajectory with or without conditions.
+- **E: Metric View:** Displays objective metrics evaluated on the diffusion-based SVC model, allowing for interactive examination of metric trends across diffusion steps.
+
+## Detailed System Introduction of SingVisio
+
+For a detailed introduction to **SingVisio** and user instructions, please refer to [this online document](https://x8gvg3n7v3.feishu.cn/docx/IMhUdqIFVo0ZjaxlBf6cpjTEnvf?from=from_copylink) (with animation) or [offline document](../../../visualization/SingVisio/System_Introduction_of_SingVisio.pdf) (without animation).
+
+Additionally, explore the SingVisio demo to see the system's functionalities and usage in action.
+
+<a href="https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view?usp=sharing">
+   <img src="../../../imgs/visualization/SingVisio_demo.png" alt="Watch the video" style="width:100%;">
+</a>
+
+
+## User Study of SingVisio
+
+Participate in the [user study](https://www.wjx.cn/vm/wkIH372.aspx#) of **SingVisio** if you're interested. We encourage you to conduct the study after experiencing the **SingVisio** system. Your valuable feedback is greatly appreciated.
diff --git a/imgs/visualization/SingVisio_demo.png b/imgs/visualization/SingVisio_demo.png
diff --git a/imgs/visualization/SingVisio_system.png b/imgs/visualization/SingVisio_system.png
diff --git a/visualization/SingVisio/System_Introduction_of_SingVisio.pdf b/visualization/SingVisio/System_Introduction_of_SingVisio.pdf
diff --git a/visualization/SingVisio/webpage/Dockerfile b/visualization/SingVisio/webpage/Dockerfile
@@ -0,0 +1,17 @@
+# Copyright (c) 2023 Amphion.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+FROM python:3.10
+
+WORKDIR /app
+
+COPY . .
+
+EXPOSE 8000
+
+ENTRYPOINT [ "python", "-m", "http.server", "8000" ]
+
+# docker build -t diffsvc .
+# docker run -v $(pwd)/data:/app/data -p 8000:8000 diffsvc
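
Note on the `ENTRYPOINT` form: Docker's exec form (a JSON array) runs the command without a shell and passes each array element as one argument, so the program name and each of its arguments need to be separate elements. Python's `subprocess` follows the same convention for list arguments, which makes the behaviour easy to reproduce locally. The sketch below is only an illustration and is not part of this commit; the helper name `run_exec_form` is hypothetical, and the `-c "print('ok')"` command is a stand-in for the real server command.

```python
import subprocess
import sys


def run_exec_form(argv):
    """Run argv without a shell, similar to how an exec-form ENTRYPOINT is executed."""
    return subprocess.run(argv, capture_output=True, text=True, check=True)


# One element per argument: the interpreter is found and its arguments are parsed.
ok = run_exec_form([sys.executable, "-c", "print('ok')"])
print(ok.stdout.strip())

# A single space-separated string is treated as the *name* of the executable,
# so the lookup fails before anything runs.
try:
    run_exec_form([f"{sys.executable} -c print('ok')"])
    single_string_worked = True
except OSError:
    single_string_worked = False
print(single_string_worked)
```

The same reasoning applies to any exec-form instruction: if a shell is not involved, nothing splits the string on spaces.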
diff --git a/visualization/SingVisio/webpage/README.md b/visualization/SingVisio/webpage/README.md
@@ -0,0 +1,105 @@
+## SingVisio Webpage
+
+This is the source code of the SingVisio Webpage. This README file will introduce the project and provide an installation guide.
+
+### Tech stack
+
+- [Tailwind CSS](https://tailwindcss.com/)
+- [Flowbite](https://flowbite.com/)
+- [D3.js](https://d3js.org/)
+- [Driver.js](https://driverjs.com/)
+
+
+### Structure
+
+- `index.html`: entry point file
+- `config`: JSON file loaded in `index.html`
+- `img`: image files
+- `resources`: CSS style and JavaScript files
+    - `init.js`: load config and initialize variables
+    - `function.js`: functions used in this project
+    - `event.js`: bind webpage mouse and keyboard events to function
+
+
+### Configuration
+
+Before installation, the data path must be configured in the file `config/default.json`. 
+
+```json
+{
+    "pathData": {
+        "<mode_name>": { // support multiple modes
+            "multi": ["<id>"], // song_id, sourcesinger_id, or target_id; enables multiple choice for the configured checkbox. Set to false to disable.
+            "curve": true, // set to true if the metric curve is needed
+            "referenceMap": { // reference paths used when multiple choice is enabled
+                "<sourcesinger_id>": [ // e.g. m4singer_Tenor-6
+                    "<path_to_wav>", // e.g. Tenor-6_寂寞沙洲冷_0002
+                ]
+            },
+            "data": [
+                { // support multiple datasets
+                    "dataset": "<dataset_name>",
+                    "basePath": "<path_to_the_processed_data>",
+                    "pathMap": {
+                        "<sourcesinger_id>": {
+                            "songs": [
+                                "<song_id>" // set song id, support multiple ids
+                            ],
+                            "targets": [
+                                "<target_id>" // set target singer id, support multiple ids
+                            ]
+                        }
+                    }
+                }
+            ]
+        }
+    },
+    "mapToName": {
+        "<map_from>": "<map_to>"
+    },
+    "mapToSong": {
+        "<map_from>": "<map_to>"
+    },
+    "mapToSpace": {
+        "<map_from>": "<map_to>"
+    },
+    "picTypes": [
+        "<pic_type>" // support multiple types
+    ],
+    "evaluation_data": [
+        { // support multiple data
+            "target": "<target_id>",
+            "sourcesinger": "<sourcesinger_id>",
+            "song": "<song_id>",
+            "best": [
+                "<best_metric>" // activate this when click which metric
+            ]
+        },
+    ],
+    "colorList": [
+        "<color_hex_code>" // support multiple colors
+    ],
+    "histogramData": [
+        { // displayed at top left graph
+            "type": "high", // high or low. high: the higher, the better.
+            "name": "<metric_name>",
+            "value": <metric_value>
+        }
+    ]
+}
+```
+
+
+### Installation
+
+This project does not need to be built. There are multiple ways to run this project. Here, we will introduce the simplest way:
+
+1. Install Python and run the following code to start the HTTP server:
+
+```bash
+cd webpage
+python -m http.server 8080
+```
+
+2. After starting the web server, open [http://localhost:8080/](http://localhost:8080/) in your browser.
+
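
Note on the configuration template: it uses `//` comments and `<...>` placeholders for explanation. A strict JSON parser rejects comments, so a real `config/default.json` would presumably need them removed (this depends on how `init.js` loads the file, which is not shown in this diff). The Python sketch below is not part of this commit. It writes a minimal comment-free config to a temporary directory and checks that the top-level keys from the template are present. The helper name `missing_keys`, the assumption that all eight keys are required, and every value in `example_config` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Top-level keys that appear in the configuration template.
# Treating all of them as required is an assumption of this sketch.
EXPECTED_KEYS = [
    "pathData", "mapToName", "mapToSong", "mapToSpace",
    "picTypes", "evaluation_data", "colorList", "histogramData",
]


def missing_keys(config):
    """Return the expected top-level keys that are absent from config."""
    return [key for key in EXPECTED_KEYS if key not in config]


# Hypothetical minimal config; every value here is a placeholder.
example_config = {
    "pathData": {},
    "mapToName": {},
    "mapToSong": {},
    "mapToSpace": {},
    "picTypes": [],
    "evaluation_data": [],
    "colorList": ["#1f77b4"],
    "histogramData": [{"type": "high", "name": "example_metric", "value": 0.5}],
}

# Write the config as plain JSON and read it back with the strict parser.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config" / "default.json"
    config_path.parent.mkdir(parents=True)
    config_path.write_text(json.dumps(example_config, indent=4), encoding="utf-8")
    loaded = json.loads(config_path.read_text(encoding="utf-8"))

print(missing_keys(loaded))

# Python's strict JSON parser rejects the // comments used in the template.
try:
    json.loads('{"curve": true // set true if need the metric curve\n}')
    comments_accepted = True
except json.JSONDecodeError:
    comments_accepted = False
print(comments_accepted)
```

Running a check like this before starting the HTTP server can surface a missing section early instead of through a blank panel in the browser.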