Update candle. #96

Merged · 1 commit · Sep 21, 2024
10 changes: 5 additions & 5 deletions README.md
@@ -8,7 +8,7 @@
[Moshi][moshi] is a speech-text foundation model and **full-duplex** spoken dialogue framework.
It uses [Mimi][moshi], a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz representation
with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80ms, the frame size),
-yet performs better than existing, non-streaming, codec like
+yet performs better than existing, non-streaming, codecs like
[SpeechTokenizer](https://github.com/ZhangXInFD/SpeechTokenizer) (50 Hz, 4kbps), or [SemantiCodec](https://github.com/haoheliu/SemantiCodec-inference) (50 Hz, 1.3kbps).

Moshi models **two streams of audio**: one corresponds to Moshi, and the other one to the user.
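As an aside, the figures quoted in this hunk (24 kHz input, 12.5 Hz representation, 1.1 kbps, 80 ms latency) are mutually consistent; a quick sanity check, using only numbers from the README text above (the codebook split in the last comment is an assumption, not stated here):

```python
# Sanity-check of Mimi's advertised numbers (all taken from the README text).
sample_rate_hz = 24_000            # input audio sample rate
frame_rate_hz = 12.5               # rate of the latent representation
samples_per_frame = sample_rate_hz / frame_rate_hz
latency_ms = 1000 / frame_rate_hz  # one frame of audio per step
bits_per_frame = 1.1e3 / frame_rate_hz

print(samples_per_frame)  # 1920.0 samples, i.e. 80 ms of audio per frame
print(latency_ms)         # 80.0 ms, matching the stated latency
print(bits_per_frame)     # 88.0 bits of codes per frame at 1.1 kbps
# 88 bits/frame would fit e.g. 8 RVQ codebooks of 2048 entries (11 bits each);
# that exact split is an assumption, not something the README states.
```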
@@ -38,7 +38,7 @@ subjective quality despite its low bitrate.

<p align="center">
<img src="./mimi.png" alt="Schema representing the structure of Mimi, our proposed neural codec. Mimi contains a Transformer
-in both its encoder and decoded, and achieves a frame rate closer to that of text tokens. This allows us to reduce
+in both its encoder and decoder, and achieves a frame rate closer to that of text tokens. This allows us to reduce
the number of auto-regressive steps taken by Moshi, thus reducing the latency of the model."
width="800px"></p>

@@ -91,7 +91,7 @@ pip install rustymimi # mimi, rust implementation with Python bindings from PyP
```

If you are not using Python 3.12, you might get an error when installing
-`moshi_mlx` or `rustymimi` (which `moshi_mlx` depends on). Then,you will need to install the [Rust toolchain](https://rustup.rs/), or switch to Python 3.12.
+`moshi_mlx` or `rustymimi` (which `moshi_mlx` depends on). Then, you will need to install the [Rust toolchain](https://rustup.rs/), or switch to Python 3.12.

While we hope that the present codebase will work on Windows, we do not provide official support for it.
We have tested the MLX version on a MacBook Pro M3. At the moment, we do not support quantization
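The guidance in this hunk (prebuilt wheels for Python 3.12, Rust toolchain otherwise) could be scripted as a small guard. This is a sketch, not the project's official installer; it only reports what the README says to do for each case:

```shell
# Sketch: check the interpreter version before installing, since the README
# says prebuilt rustymimi wheels target Python 3.12.
pyver=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
if [ "$pyver" = "3.12" ]; then
    echo "Python $pyver: prebuilt wheels apply -> pip install moshi_mlx rustymimi"
else
    echo "Python $pyver: install the Rust toolchain (https://rustup.rs/) or switch to 3.12"
fi
```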
@@ -129,7 +129,7 @@ A local client is also available, as
```bash
python -m moshi.client [--url URL_TO_GRADIO]
```
-However note that, unlike the web browser, this client is barebone: It does not perform any echo cancellation,
+However note that, unlike the web browser, this client is barebone: it does not perform any echo cancellation,
nor does it try to compensate for a growing lag by skipping frames.

For more information, in particular on how to use the API directly, please
@@ -179,7 +179,7 @@ site" or "Proceed to localhost (unsafe)".
## Clients

We recommend using the web UI as it provides additional echo cancellation that helps
-the overall model quality. Note that most command will directly serve this UI
+the overall model quality. Note that most commands will directly serve this UI
in the provided URL, and there is in general nothing more to do.

Alternatively, we provide command line interfaces
10 changes: 5 additions & 5 deletions rust/Cargo.toml
@@ -8,7 +8,7 @@ members = [
resolver = "2"

[workspace.package]
-version = "0.2.1"
+version = "0.2.2"
edition = "2021"
license = "MIT/Apache-2.0"
description = "moshi, a real-time voice AI"
@@ -18,10 +18,10 @@ categories = ["science"]


[workspace.dependencies]
-candle = { version = "0.6.0", package = "candle-core" }
-candle-nn = "0.6.0"
-candle-transformers = "0.6.0"
-candle-flash-attn = "0.6.0"
+candle = { version = "0.7.0", package = "candle-core" }
+candle-nn = "0.7.0"
+candle-transformers = "0.7.0"
+candle-flash-attn = "0.7.0"

[profile.release]
debug = true
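Because the candle versions are pinned once under `[workspace.dependencies]`, the member crates listed in `members = [` pick them up by reference rather than repeating "0.7.0" locally. A sketch of what a member crate's manifest would look like (the crate name and optional flag are illustrative assumptions, not taken from this diff):

```toml
# Hypothetical member crate of this workspace: inherit the pinned candle
# versions from [workspace.dependencies] instead of duplicating "0.7.0".
[dependencies]
candle = { workspace = true }
candle-nn = { workspace = true }
candle-flash-attn = { workspace = true, optional = true }
```

With this layout, a future candle bump only touches the root `rust/Cargo.toml`, exactly as this one-commit PR does.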