Release the first visualization tool of Amphion, SingVisio
Showing 21 changed files with 3,327 additions and 3 deletions.
@@ -0,0 +1,19 @@
# Amphion Visualization Recipe

## Quick Start

We provide a **[beginner recipe](SingVisio/)** to demonstrate how to implement interactive visualization for classic audio, music, and speech generative models. It is also the official implementation of the paper "[SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion](https://arxiv.org/pdf/2402.12660.pdf)". **SingVisio** can be experienced [here](https://openxlab.org.cn/apps/detail/Amphion/SingVisio).

## Supported Models

As a unique feature of Amphion, visualization provides interactive visual analysis of some classical models for educational purposes, helping newcomers understand their inner workings.

Amphion currently supports, or is developing, visualization tools for the following models:

- **SVC**:
  - **[MultipleContentsSVC](../svc/MultipleContentsSVC)**: A diffusion-based model for singing voice conversion.
- **TTS**:
  - **[FastSpeech 2](../tts/FastSpeech2/)** (👨‍💻 developing): A typical transformer-based TTS model.
  - **[VITS](../tts/VITS/)** (👨‍💻 developing): A typical flow-based end-to-end TTS model.

@@ -0,0 +1,71 @@
# SingVisio: Visual Analytics of the Diffusion Model for Singing Voice Conversion

[![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2402.12660)
[![openxlab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Amphion/SingVisio)
[![Video](https://img.shields.io/badge/Video-Demo-orange)](https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view)

<div align="center">
<img src="../../../imgs/visualization/SingVisio_system.png" width="85%">
</div>

This is the official implementation of the paper "[SingVisio: Visual Analytics of the Diffusion Model for Singing Voice Conversion](https://arxiv.org/abs/2402.12660)". The **SingVisio** system can be experienced [here](https://openxlab.org.cn/apps/detail/Amphion/SingVisio).

The **SingVisio** system comprises two main components: a web-based front-end user interface and a back-end generation model.

- The web-based user interface was developed using [D3.js](https://d3-graph-gallery.com/index.html), a JavaScript library designed for creating dynamic and interactive data visualizations. The code can be accessed [here](../../../visualization/SingVisio/webpage/).
- The core generative model, [MultipleContentsSVC](https://arxiv.org/abs/2310.11160), is a diffusion-based model tailored for singing voice conversion (SVC). The code for this model is available in Amphion, with the recipe accessible [here](../../svc/MultipleContentsSVC/).

## Development Workflow for Visualization Systems

Developing a visualization system involves seven key steps:

1. **Identify the Model for Visualization**: Begin by selecting the model you wish to visualize.

2. **Task Analysis**: Analyze the specific tasks that the visualization system needs to support through discussions with experts, model builders, and potential users. In other words, determine what you want to visualize, such as the classical denoising generation process in diffusion models.

3. **Data and Feature Generation**: Produce the data and features necessary for visualization based on the selected model. Alternatively, you can generate and visualize them in real time.

4. **Design the User Interface**: Design and develop the user interface to effectively display the model structure, data, and features.

5. **Iterative Refinement**: Iteratively refine the user interface design for a better visualization experience.

6. **User Study Preparation**: Design questionnaires for a user study to evaluate the system in terms of system design, functionality, explainability, and user-friendliness.

7. **Evaluation and Improvement**: Conduct comprehensive evaluations through a user study, case study, and expert study to evaluate, analyze, and improve the system.

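Step 3 of the workflow (data and feature generation) can be sketched in Python as follows. This is only a toy illustration under stated assumptions: `toy_denoise_step` and `generate_stepwise` are hypothetical names, and the update rule is a simple interpolation toward a known target rather than the real MultipleContentsSVC sampler. The point is the bookkeeping: every intermediate result is stored under its step index so that a front end can later display or compare individual steps.

```python
import random


def toy_denoise_step(x, target, t):
    """Hypothetical stand-in for one reverse-diffusion update.

    Moves each value 1/t of the way toward the target, so the sample
    reaches the target when t == 1. A real model would predict this
    update with a neural network instead.
    """
    return [xi + (ti - xi) / t for xi, ti in zip(x, target)]


def generate_stepwise(target, num_steps, seed=0):
    """Run the toy process and keep a copy of the sample at every step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    snapshots = {num_steps: list(x)}  # key = number of steps remaining
    for t in range(num_steps, 0, -1):
        x = toy_denoise_step(x, target, t)
        snapshots[t - 1] = list(x)
    return snapshots


snapshots = generate_stepwise(target=[0.5, -0.25, 1.0], num_steps=10)
```

Here `snapshots[10]` holds the initial noise and `snapshots[0]` the final result; in a real pipeline each entry would be written to disk (for example as audio or a spectrogram image) for the web page to load.
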
## Tasks Supported in SingVisio

The **SingVisio** system supports five tasks.

- To investigate the evolution and quality of the converted SVC results from each step in the diffusion generation process, **SingVisio** supports the following two tasks:
  - **T1: Step-wise Diffusion Generation Comparison:** Investigate the evolution and quality of results converted at each step of the diffusion process.
  - **T2: Step-wise Metric Comparison:** Examine changes in metrics throughout the diffusion steps.

- To explore how various factors (content, melody, singer timbre) influence the SVC results, **SingVisio** supports the following three tasks:
  - **T3: Pair-wise SVC Comparison with Different <u>Target Singers</u>**
  - **T4: Pair-wise SVC Comparison with Different <u>Source Singers</u>**
  - **T5: Pair-wise SVC Comparison with Different <u>Songs</u>**

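For T2, a step-wise metric curve can be sketched as below. The per-step outputs and the reference are made-up numbers, and mean absolute error is an illustrative lower-is-better metric chosen for simplicity; it is not claimed to be one of the metrics that SingVisio reports. `metric_curve` is a hypothetical helper name.

```python
def mean_absolute_error(sample, reference):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(s - r) for s, r in zip(sample, reference)) / len(reference)


def metric_curve(step_outputs, reference):
    """Return (step, metric) pairs ordered from the noisiest step to the final one."""
    return [
        (step, mean_absolute_error(step_outputs[step], reference))
        for step in sorted(step_outputs, reverse=True)
    ]


# Made-up outputs keyed by the number of diffusion steps remaining.
step_outputs = {
    3: [0.9, -0.8, 0.1],
    2: [0.6, -0.5, 0.4],
    1: [0.5, -0.3, 0.8],
    0: [0.5, -0.25, 1.0],
}
reference = [0.5, -0.25, 1.0]
curve = metric_curve(step_outputs, reference)
```

Plotting the second element of each pair against the first gives the kind of trend line a metric view can display across diffusion steps.
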
## View Design in SingVisio

The user interface of **SingVisio** comprises five views:

- **A: Control Panel:** Enables users to adjust the display mode and select data for visual analysis.
- **B: Step View:** Offers an overview of the diffusion generation process.
- **C: Comparison View:** Facilitates easy comparison of conversion results under different conditions.
- **D: Projection View:** Assists in observing the diffusion steps' trajectory with or without conditions.
- **E: Metric View:** Displays objective metrics evaluated on the diffusion-based SVC model, allowing for interactive examination of metric trends across diffusion steps.

## Detailed System Introduction of SingVisio

For a detailed introduction to **SingVisio** and user instructions, please refer to [this online document](https://x8gvg3n7v3.feishu.cn/docx/IMhUdqIFVo0ZjaxlBf6cpjTEnvf?from=from_copylink) (with animation) or the [offline document](../../../visualization/SingVisio/System_Introduction_of_SingVisio.pdf) (without animation).

Additionally, explore the SingVisio demo to see the system's functionalities and usage in action.

<a href="https://drive.google.com/file/d/1w5xgsfaLxBcUvzq3rgejZ6jfgu6hwC0c/view?usp=sharing">
<img src="../../../imgs/visualization/SingVisio_demo.png" alt="Watch the video" style="width:100%;">
</a>

## User Study of SingVisio

If you are interested, please participate in the [user study](https://www.wjx.cn/vm/wkIH372.aspx#) of **SingVisio**. We encourage you to take the study after trying the **SingVisio** system. Your feedback is greatly appreciated.
Binary file not shown.
@@ -0,0 +1,17 @@
# Copyright (c) 2023 Amphion.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

FROM python:3.10

WORKDIR /app

COPY . .

EXPOSE 8000

ENTRYPOINT ["python", "-m", "http.server", "8000"]

# docker build -t diffsvc .
# docker run -v $(pwd)/data:/app/data -p 8000:8000 diffsvc
@@ -0,0 +1,105 @@
## SingVisio Webpage

This is the source code of the SingVisio webpage. This README introduces the project and provides an installation guide.

### Tech stack

- [Tailwind CSS](https://tailwindcss.com/)
- [Flowbite](https://flowbite.com/)
- [D3.js](https://d3js.org/)
- [Driver.js](https://driverjs.com/)

### Structure

- `index.html`: entry point file
- `config`: JSON configuration file loaded in `index.html`
- `img`: image files
- `resources`: CSS style and JavaScript files
  - `init.js`: loads the config and initializes variables
  - `function.js`: functions used in this project
  - `event.js`: binds webpage mouse and keyboard events to functions

### Configuration

Before installation, the data path must be configured in the file `config/default.json`.

```json
{
  "pathData": {
    "<mode_name>": { // multiple modes are supported
      "multi": ["<id>"], // song_id, sourcesinger_id, or target_id: enables multiple choice for the configured checkbox. Set to false to disable.
      "curve": true, // set to true if the metric curve is needed
      "referenceMap": { // reference paths used when multiple choice is enabled
        "<sourcesinger_id>": [ // e.g. m4singer_Tenor-6
          "<path_to_wav>" // e.g. Tenor-6_寂寞沙洲冷_0002
        ]
      },
      "data": [
        { // multiple datasets are supported
          "dataset": "<dataset_name>",
          "basePath": "<path_to_the_processed_data>",
          "pathMap": {
            "<sourcesinger_id>": {
              "songs": [
                "<song_id>" // song id; multiple ids are supported
              ],
              "targets": [
                "<target_id>" // target singer id; multiple ids are supported
              ]
            }
          }
        }
      ]
    }
  },
  "mapToName": {
    "<map_from>": "<map_to>"
  },
  "mapToSong": {
    "<map_from>": "<map_to>"
  },
  "mapToSpace": {
    "<map_from>": "<map_to>"
  },
  "picTypes": [
    "<pic_type>" // multiple types are supported
  ],
  "evaluation_data": [
    { // multiple entries are supported
      "target": "<target_id>",
      "sourcesinger": "<sourcesinger_id>",
      "song": "<song_id>",
      "best": [
        "<best_metric>" // metric(s) that activate this entry when clicked
      ]
    }
  ],
  "colorList": [
    "<color_hex_code>" // multiple colors are supported
  ],
  "histogramData": [
    { // displayed in the top-left graph
      "type": "high", // high or low. high: the higher, the better.
      "name": "<metric_name>",
      "value": <metric_value>
    }
  ]
}
```
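
Before starting the server, it can help to sanity-check the configuration. The sketch below is a helper written for illustration, not part of the SingVisio code: `check_config` is a hypothetical name, the list of required keys is simply taken from the template above, and the real `config/default.json` is assumed to be plain JSON without the `//` annotations (Python's `json` module rejects comments).

```python
import json

# Top-level keys that appear in the template above.
REQUIRED_KEYS = [
    "pathData", "mapToName", "mapToSong", "mapToSpace",
    "picTypes", "evaluation_data", "colorList", "histogramData",
]


def check_config(text):
    """Parse a config string and return a list of human-readable problems."""
    config = json.loads(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for entry in config.get("histogramData", []):
        if entry.get("type") not in ("high", "low"):
            problems.append(
                f"histogramData type must be 'high' or 'low': {entry.get('name')}"
            )
    return problems


# A deliberately wrong example: "medium" is not a valid histogram type.
example = json.dumps({
    "pathData": {}, "mapToName": {}, "mapToSong": {}, "mapToSpace": {},
    "picTypes": [], "evaluation_data": [], "colorList": [],
    "histogramData": [{"type": "medium", "name": "demo_metric", "value": 1.0}],
})
problems = check_config(example)
```

In practice the text would be read from `config/default.json`, and an empty list would mean that none of these basic checks failed.
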

### Installation

This project does not need to be built. There are multiple ways to run it; the simplest is described here:

1. Install Python and run the following commands to start an HTTP server:

```bash
cd webpage
python -m http.server 8080
```

2. After the web server starts, open [http://localhost:8080/](http://localhost:8080/) in a browser.
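
The same static server can also be started from a script, which is convenient for tests or for choosing the served directory explicitly. This sketch uses only the standard library's `http.server` module; `start_server` is a hypothetical helper name, and it binds to `127.0.0.1` only, whereas `python -m http.server` listens on all interfaces by default.

```python
import functools
import http.server
import threading


def start_server(directory, port=8080):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server("webpage")` serves the page at http://localhost:8080/ until `server.shutdown()` is called or the process exits. Because the serving thread is a daemon, a standalone script must keep its main thread alive for as long as the page should stay reachable.
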