# Per repo loading + final cosmetic and doc changes (#40)
* fix
* missing
* wip
* adapting rust
* plop
* model list
* improvs
* fix rust
* fixing bug in vq and streaming
Showing 15 changed files with 206 additions and 96 deletions.
There are three separate versions of the moshi inference stack in this repo.
- The python version using PyTorch is in the [`moshi/`](moshi/) directory.
- The python version using MLX for M-series Macs is in the [`moshi_mlx/`](moshi_mlx/) directory.
- The Rust version used in production is in the [`rust/`](rust/) directory.
  This contains, in particular, a Mimi implementation in Rust, with Python bindings available as `rustymimi`.

Finally, the code for the live demo is provided in the [`client/`](client/) directory.
## Models

We release three models:
- our speech codec Mimi,
- Moshi fine-tuned on a male synthetic voice (Moshiko),
- Moshi fine-tuned on a female synthetic voice (Moshika).

Depending on the backend, the available file formats and quantization levels vary. Here is the list of Hugging Face repositories for each model. Mimi is bundled in all of them and always uses the same checkpoint format.
- Moshika for PyTorch (bf16): [kmhf/moshika-pytorch-bf16](https://huggingface.co/kmhf/moshika-pytorch-bf16).
- Moshiko for PyTorch (bf16): [kmhf/moshiko-pytorch-bf16](https://huggingface.co/kmhf/moshiko-pytorch-bf16).
- Moshika for MLX (int4, int8, bf16): [kmhf/moshika-mlx-q4](https://huggingface.co/kmhf/moshika-mlx-q4), [kmhf/moshika-mlx-q8](https://huggingface.co/kmhf/moshika-mlx-q8), [kmhf/moshika-mlx-bf16](https://huggingface.co/kmhf/moshika-mlx-bf16).
- Moshiko for MLX (int4, int8, bf16): [kmhf/moshiko-mlx-q4](https://huggingface.co/kmhf/moshiko-mlx-q4), [kmhf/moshiko-mlx-q8](https://huggingface.co/kmhf/moshiko-mlx-q8), [kmhf/moshiko-mlx-bf16](https://huggingface.co/kmhf/moshiko-mlx-bf16).
- Moshika for Rust/Candle (int8, bf16): [kmhf/moshika-candle-q8](https://huggingface.co/kmhf/moshika-candle-q8), [kmhf/moshika-candle-bf16](https://huggingface.co/kmhf/moshika-candle-bf16).
- Moshiko for Rust/Candle (int8, bf16): [kmhf/moshiko-candle-q8](https://huggingface.co/kmhf/moshiko-candle-q8), [kmhf/moshiko-candle-bf16](https://huggingface.co/kmhf/moshiko-candle-bf16).
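All of the repositories above follow the same `kmhf/<voice>-<backend>-<quant>` naming pattern, so the repo id for a given configuration can be assembled mechanically. A small sketch (`moshi_repo` is a hypothetical helper of ours, not part of the `moshi` package):

```python
# Sketch: assemble a Hugging Face repo id following the naming pattern of the
# checkpoints listed above. `moshi_repo` is a hypothetical helper, not an API
# provided by the moshi package.
def moshi_repo(voice: str, backend: str, quant: str) -> str:
    # voice: "moshika" or "moshiko"; backend: "pytorch", "mlx" or "candle";
    # quant: "bf16", "q4" or "q8" (availability depends on the backend).
    return f"kmhf/{voice}-{backend}-{quant}"

print(moshi_repo("moshika", "mlx", "q4"))        # kmhf/moshika-mlx-q4
print(moshi_repo("moshiko", "pytorch", "bf16"))  # kmhf/moshiko-pytorch-bf16
```

Note that not every combination exists as a repo: for example, PyTorch checkpoints are only released in bf16.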
## Requirements

You will need at least Python 3.10. For specific requirements, please check the individual backend directories. You can install the PyTorch and MLX clients with the following:

```bash
pip install moshi_mlx  # moshi MLX, from PyPI
# Or the bleeding-edge versions of Moshi and Moshi-MLX:
pip install -e "git+https://git@github.com/kyutai-labs/moshi.git#egg=moshi&subdirectory=moshi"
pip install -e "git+https://git@github.com/kyutai-labs/moshi.git#egg=moshi_mlx&subdirectory=moshi_mlx"

pip install rustymimi  # mimi, Rust implementation with Python bindings, from PyPI
```
While we hope that the present codebase will work on Windows, we do not provide official support for it. We have tested the MLX version on a MacBook Pro M3. At the moment, we do not support quantization for the PyTorch version, so you will need a GPU with a significant amount of memory (24GB).

To use the Rust backend, you will need a recent version of the [Rust toolchain](https://rustup.rs/). To compile GPU support, you will also need [CUDA](https://developer.nvidia.com/cuda-toolkit) properly installed for your GPU, in particular with `nvcc`.
## Development

If you wish to install from a clone of this repository, for example to further develop Moshi, you can do the following:

```bash
# From the root of the clone of the repo
pip install -e 'moshi[dev]'
pip install -e 'moshi_mlx[dev]'
pre-commit install
```

If you wish to build `rustymimi` locally (assuming you have Rust properly installed):

```bash
pip install maturin
maturin dev -r -m rust/mimi-pyo3/Cargo.toml
```
## Python (PyTorch)

The PyTorch-based API can be found in the `moshi` directory. It provides a streaming version of the audio tokenizer (Mimi) and the language model (Moshi).

To run in interactive mode, you need to start a server which will run the model; you can then use either the web UI or a command-line client.

Start the server with:

```bash
python -m moshi.server [--gradio-tunnel] [--hf-repo kmhf/moshika-pytorch-bf16]
```
Then access the web UI at [localhost:8998](http://localhost:8998). If your GPU is on a distant machine with no direct access, `--gradio-tunnel` will create a tunnel with a URL accessible from anywhere. Keep in mind that this tunnel goes through the US and can add significant latency (up to 500ms from Europe). You can use `--gradio-tunnel-token` to set a fixed secret and reuse the same address over time. Alternatively, you might want to use SSH to redirect your connection.
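For the SSH route, a standard local port forward works; this is a sketch, and `user@remote-gpu-host` is a placeholder for your own machine, not something defined by this repo:

```shell
# Sketch: forward the server's port 8998 to your machine over SSH instead of
# using --gradio-tunnel. "user@remote-gpu-host" is a placeholder.
remote_host="user@remote-gpu-host"
tunnel_cmd="ssh -N -L 8998:localhost:8998 ${remote_host}"
echo "${tunnel_cmd}"  # run this, then browse to http://localhost:8998 locally
```

`-N` keeps the connection open without running a remote command; `-L 8998:localhost:8998` maps your local port 8998 to port 8998 on the remote host.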
You can use `--hf-repo` to select a different pretrained model by setting the proper Hugging Face repository.

Accessing a server that is not localhost via http may cause issues with using the microphone in the web UI (in some browsers this is only allowed over https).
You can also use the command-line client:

```bash
python -m moshi.client [--url URL_TO_GRADIO]
```

However, note that unlike the web browser, this client is bare-bones. It doesn't do any echo cancellation, nor does it try to compensate for a growing lag by skipping frames.

For more information, in particular on how to use the API directly, please check out [moshi/README.md](moshi/README.md).
## Python (MLX) for local inference on macOS

Once you have installed `moshi_mlx`, you can run:

```bash
python -m moshi_mlx.local -q 4  # weights quantized to 4 bits
python -m moshi_mlx.local -q 8  # weights quantized to 8 bits
# And using a different pretrained model:
python -m moshi_mlx.local -q 4 --hf-repo kmhf/moshika-mlx-q4
python -m moshi_mlx.local -q 8 --hf-repo kmhf/moshika-mlx-q8
# Be careful to always match the `-q` and `--hf-repo` flags.
```

This uses a bare-bones command-line interface. It doesn't do any echo cancellation, nor does it try to compensate for a growing lag by skipping frames.
```bash
cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config.json
```

When using macOS, you can replace `--features cuda` with `--features metal`.

Alternatively, you can use `config-q8.json` rather than `config.json` to use the quantized q8 model. You can select a different pretrained model, e.g. Moshika, by changing the `"hf_repo"` key in either file.
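Since the backend configs are plain JSON, that switch can also be scripted. A minimal sketch using only the standard library; the `set_hf_repo` helper and the one-key sample config are ours (real config files contain more keys), only the `"hf_repo"` key comes from the text above:

```python
# Sketch: rewrite the "hf_repo" key of a backend config given as JSON text.
# `set_hf_repo` is a hypothetical helper; the sample below is a stand-in for
# the real config.json / config-q8.json files.
import json

def set_hf_repo(config_text: str, repo: str) -> str:
    cfg = json.loads(config_text)
    cfg["hf_repo"] = repo  # the key that selects the pretrained model
    return json.dumps(cfg, indent=2)

sample = '{"hf_repo": "kmhf/moshiko-candle-q8"}'
print(set_hf_repo(sample, "kmhf/moshika-candle-q8"))
```

In practice you would read the file, pass its contents through a helper like this, and write the result back.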
Once the server has printed 'standalone worker listening', you can use the web | ||
UI. By default the rust version uses https so it will be at | ||
|
```bash
cargo run --bin moshi-cli -r -- tui --host localhost
```

### Python with PyTorch

```bash
python -m moshi.client
```

### WebUI
If you use either Mimi or Moshi, please cite the following paper:

```
@article{defossez2024moshi,
  title={Moshi: a speech-text foundation model for real-time dialogue},
  author={Alexandre Défossez and Laurent Mazaré and Manu Orsini and Amélie Royer and
          Patrick Pérez and Hervé Jégou and Edouard Grave and Neil Zeghidour},
  journal={arXiv:TBC},
  year={2024},
}
```