Update README.md
manukyutai authored Sep 18, 2024
1 parent 5dab209 commit 5e5f498
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@
[SpeechTokenizer](https://github.com/ZhangXInFD/SpeechTokenizer) (50 Hz, 4 kbps), or [SemantiCodec](https://github.com/haoheliu/SemantiCodec-inference) (50 Hz, 1kbps).

Moshi models **two streams of audio**: one corresponds to Moshi, and one to the user.
- During inference, the stream from the user is taken from the audio input,
+ At inference, the stream from the user is taken from the audio input,
and the one for Moshi is sampled from. Along that, Moshi predicts text tokens corresponding to its own speech, its **inner monologue**,
which greatly improves the quality of its generation. A small depth transformer models inter codebook dependencies for a given time step,
while a large, 7B parameter Transformer models the temporal dependencies. Moshi achieves a theoretical latency