## Introduction

MeloTTS is a **high-quality multi-lingual** text-to-speech library by [MIT](https://www.mit.edu/) and [MyShell.ai](https://myshell.ai). Supported languages include:

| Language | Example |
| --- | --- |
| English (American) | Link |
| English (British) | Link |
| English (Indian) | Link |
| English (Australian) | Link |
| English (Default) | Link |
| Spanish | Link |
| French | Link |
| Chinese (mix EN) | Link |
| Japanese | Link |
| Korean | Link |

Some other features include:

- The Chinese speaker supports mixed Chinese and English.
- Fast enough for real-time inference on CPU.

## Usage

The Python API and model cards can be found in this repo or on HuggingFace.

## Join the Community

### Discord

Join our Discord community and select the Developer role upon joining to gain exclusive access to our developer-only channel! Don't miss out on valuable discussions and collaboration opportunities.

## Contributing

If you find this work useful, please consider contributing to this repo.

- Many thanks to @fakerybakery for adding the Web UI and CLI part.

## Authors

## Citation

```bibtex
@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}
```
    

## License

This library is under the MIT License, which means it is free for both commercial and non-commercial use.

## Acknowledgements

This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.

## Update Notes

### 2024/08/21

1. The MeloTTS model supports OpenVINO™ to accelerate inference. Currently verified only on Linux.

### 2024/08/28

1. The TTS and BERT models support int8 quantization.
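Int8 quantization maps floating-point weights to 8-bit integers through a scale and zero-point. NNCF performs this per layer on the real models; the arithmetic behind it can be sketched in plain Python (the function names here are illustrative, not part of NNCF's API):

```python
def quantize_int8(values, qmin=-128, qmax=127):
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats: x ~ (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(weights)
approx = dequantize_int8(q, scale, zp)
# Per-element round-trip error is bounded by scale / 2.
```

The payoff is that int8 tensors take a quarter of the memory of float32 and enable faster integer kernels, at the cost of this bounded rounding error.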

## Install MeloTTS with OpenVINO™

```bash
pip install -r requirements.txt
pip install openvino nncf
python setup.py develop  # or: pip install -e .
python -m unidic download
```

Convert the MeloTTS model to OpenVINO™ IR (Intermediate Representation) and run a test:

```bash
python3 test_tts.py
```

## Todo

1. Currently the input text is split into chunks that are processed serially. This could be optimized with OpenVINO asynchronous inference, for example:

```python
...
# Create two infer requests on the same compiled model so that two
# chunks can be in flight at once.
self.tts_model = self.core.read_model(Path(ov_model_path))
self.tts_compiled_model = self.core.compile_model(self.tts_model, "CPU")
self.tts_request_0 = self.tts_compiled_model.create_infer_request()
self.tts_request_1 = self.tts_compiled_model.create_infer_request()
...
for index, t in enumerate(texts):
    ...
    if index == 0:
        self.tts_request_0.start_async(inputs_dict, share_inputs=True)
    elif index == 1:
        self.tts_request_1.start_async(inputs_dict, share_inputs=True)
    ...
# Block until both asynchronous requests have finished.
self.tts_request_0.wait()
self.tts_request_1.wait()
...
```
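The two-request pattern above is double-buffered pipelining: submit chunk N+1 while chunk N is still in flight, then collect results in order. The idea can be sketched library-agnostically with stdlib futures; here `synthesize` is a hypothetical stand-in for one infer request, and `Future.result()` plays the role of `request.wait()`:

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(chunk):
    # Stand-in for one model call; a real infer request would run here.
    return chunk.upper()

def run_pipelined(texts, num_requests=2):
    """Keep up to `num_requests` chunks in flight, mirroring two infer requests."""
    results = [None] * len(texts)
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        # Submit everything; the pool caps concurrency at num_requests.
        futures = {pool.submit(synthesize, t): i for i, t in enumerate(texts)}
        for fut, i in futures.items():
            results[i] = fut.result()  # blocks until that chunk is done
    return results

audio = run_pipelined(["hello", "world", "melo"])
# → ['HELLO', 'WORLD', 'MELO']
```

With real inference this overlaps compute across chunks instead of idling between them; OpenVINO's own `AsyncInferQueue` offers the same pattern without managing requests by hand.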