A novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently.
In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Our model combines the expressiveness of mel spectrograms, the generative capabilities of diffusion models, and the vocoding capabilities of neural vocoders. We demonstrate the effectiveness of Msanii by synthesizing 190 seconds of stereo music at a high sample rate (44.1 kHz) without the use of concatenative synthesis, cascading architectures, or compression techniques. To the best of our knowledge, this is the first work to successfully employ a diffusion-based model for synthesizing such long music samples at high sample rates. Our demo can be found here and our code here.
This is a work in progress and has not been finalized. The results and approach presented are subject to change and should not be considered final.
See more here.
Midnight Melodies | Echoes of Yesterday |
---|---|
Rainy Day Reflections | Starlight Sonatas |
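Set up a virtual environment using conda or venv, then install Msanii either directly from GitHub or, for development, from a local clone in editable mode.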
pip install -q git+https://github.com/Kinyugo/msanii.git
git clone https://github.com/Kinyugo/msanii.git
cd msanii
pip install -q -r requirements.txt
pip install -e .
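After either install route, a quick check that the package is importable can save a confusing failure later. The following is a minimal sketch, assuming only that the import name is `msanii` (as the `python -m msanii...` commands in this README suggest). The helper name `is_installed` is illustrative and not part of Msanii.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "msanii" is assumed to be the import name of the installed package.
    name = "msanii"
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If the module is reported as not found, the usual cause is that the install ran in a different environment from the one currently active.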
To train via the CLI, you need to define a config file. Sample config files are in the `conf` directory.
wandb login
python -m msanii.scripts.training <path-to-your-config.yml-file>
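The CLI commands in this README take a path to a config file, and the samples live under `conf`. Since the layout of that directory is not described here, the sketch below simply walks a directory for YAML files. The `.yml`/`.yaml` extensions and the helper name `find_configs` are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def find_configs(conf_dir: str = "conf") -> List[Path]:
    """Return every .yml/.yaml file under conf_dir, sorted by path."""
    root = Path(conf_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )


if __name__ == "__main__":
    # Run from the repository root so that "conf" resolves correctly.
    for path in find_configs():
        print(path)
```

Any path it prints can be copied, edited, and passed as the config argument to the training command.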
Msanii supports the following inference tasks:
- sampling
- audio2audio
- interpolation
- inpainting
- outpainting
Each task requires its own config file. See the `conf` directory for samples.
gdown 1G9kF0r5vxYXPSdSuv4t3GR-sBO8xGFCe # model checkpoint
python -m msanii.scripts.inference <task> <path-to-your-config.yml-file>
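When inference is launched from a script rather than typed by hand, it can help to validate the task name before starting a run. The following is a minimal sketch that wraps the command shown above. The task names come from the list in this README, while the helper names `build_inference_command` and `run_inference` are hypothetical and not part of Msanii.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# The five inference tasks listed in this README.
TASKS = ("sampling", "audio2audio", "interpolation", "inpainting", "outpainting")


def build_inference_command(
    task: str, config_path: str, python: str = sys.executable
) -> List[str]:
    """Build the argument list for `python -m msanii.scripts.inference`."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return [python, "-m", "msanii.scripts.inference", task, str(config_path)]


def run_inference(task: str, config_path: str) -> subprocess.CompletedProcess:
    """Check that the config file exists, then run the inference CLI."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(config_path)
    return subprocess.run(build_inference_command(task, config_path), check=True)


if __name__ == "__main__":
    # Placeholder path; substitute a real config for the chosen task.
    print(" ".join(build_inference_command("sampling", "path/to/config.yml")))
```

Building the command as a list rather than a single string avoids shell quoting problems when the config path contains spaces.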
To run the demo via the CLI, you need to define a config file. Sample config files are in the `conf` directory.
gdown 1G9kF0r5vxYXPSdSuv4t3GR-sBO8xGFCe # model checkpoint
python -m msanii.demo.demo <path-to-your-config.yml-file>
We are always looking for ways to improve and expand our project, and we welcome contributions from the community. Here are a few ways you can get involved:
- Bug Fixes and Feature Requests: If you find any issues with the project, please open a GitHub issue or submit a pull request with a fix.
- Data Collection: We are always in need of more data to improve the performance of our models. If you have any relevant data that you would like to share, please let us know.
- Feedback: We value feedback from our users and would love to hear your thoughts on the project. Please feel free to reach out to us with any suggestions or comments.
- Funding: If you find our project helpful, consider supporting us through GitHub Sponsors. Your support will help us continue to maintain and improve the project.
- Computational Resources: If you have access to computational resources such as GPU clusters, you can help us by providing access to these resources to run experiments and improve the project.
- Code Contributions: If you are a developer and want to contribute to the codebase, feel free to open a pull request.
- Documentation: If you have experience with documentation and want to help improve the project's documentation, please let us know.
- Promotion: Help increase the project's visibility and attract more contributors by sharing it with your friends and colleagues and on social media.
- Educational Material: If you are an educator or content creator, you can help by creating tutorials, guides, or other educational material that helps others understand the project better.
- Discussing and Sharing Ideas: Even if you don't have the time or technical skills to contribute directly to the code or documentation, you can still help by sharing and discussing ideas with the community. This can help identify new features or use cases, or find ways to improve existing ones.
- Ethical Review: Help us ensure that the project follows ethical standards by reviewing data and models for potential infringements. Additionally, please do not use the project or its models to train on or generate copyrighted works without proper authorization.