Here are the answers to a number of frequently asked questions.
We will release some training / fine-tuning code, but we do not have any timeline yet. Please be patient.
We will not release the pre-training dataset.
At the moment, Moshi only speaks English. It has some basic support for translating some sentences or words to other languages, but you shouldn't expect to use it fully in any language other than English.
Adding full support for another language would require fine-tuning, which is not currently supported.
Sadly, we do not think shrinking Moshi's memory footprint much further is currently possible: quantizing beyond 4 bits leads to a dramatic decrease in quality, see PR #58. While we keep those limitations in mind for future versions, there is no immediate solution.
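To get an intuition for why 4 bits is roughly the floor, here is a minimal sketch of symmetric round-to-nearest weight quantization (purely illustrative; `quantize_dequantize` is a made-up helper, not Moshi's actual quantization code). The round-trip error roughly doubles for every bit removed, so going below 4 bits quickly becomes destructive:

```python
import torch

def quantize_dequantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor round-to-nearest quantization round trip."""
    qmax = 2 ** (bits - 1) - 1       # e.g. 127 for int8, 7 for int4
    scale = w.abs().max() / qmax     # map the weight range onto the integer grid
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

w = torch.randn(1024, 1024)          # stand-in for a weight matrix
for bits in (8, 6, 4, 3, 2):
    err = (w - quantize_dequantize(w, bits)).pow(2).mean().sqrt().item()
    print(f"{bits}-bit RMS round-trip error: {err:.4f}")
```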
Quantization is not yet supported in the PyTorch implementation; we might look into adding this feature when we get the time. In the meantime, it is possible to use the Rust backend, which should run in int8 with CUDA.
A conversation ending after a few minutes is expected with the MLX and Rust implementations: they use a fixed buffer and do not discard past entries, so once the buffer fills up the session must stop. The PyTorch version should work for unlimited durations, although this is mostly untested and we expect the quality to degrade after a while (we have no attention sink or other mechanism to improve the streaming beyond the finite context used at training).
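As a toy illustration of the two behaviours (not the actual cache code of any backend; `FixedBuffer` is a made-up class), a fixed buffer that never discards entries has to stop once it is full, while a sliding window keeps streaming at the cost of forgetting the oldest steps:

```python
from collections import deque

class FixedBuffer:
    """Toy model of a cache that never discards past entries."""
    def __init__(self, capacity: int):
        self.entries: list[int] = []
        self.capacity = capacity

    def push(self, item: int) -> bool:
        if len(self.entries) >= self.capacity:
            return False             # buffer full: the session has to end
        self.entries.append(item)
        return True

fixed = FixedBuffer(capacity=4)
served = sum(fixed.push(step) for step in range(10))
print(f"fixed buffer served {served} steps, then stopped")

# A sliding window instead evicts the oldest entry and never stops,
# but everything beyond its span is forgotten:
window = deque(maxlen=4)
for step in range(10):
    window.append(step)
print(f"sliding window is still running and remembers {list(window)}")
```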
To diagnose issues with the web UI, check your browser console for any reported errors. If you see an error like the following:
Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'addModule')
it is likely caused by the server being accessed remotely over plain http: browsers only enable audio features (such as audio worklets) in a secure context, and plain http qualifies only on localhost. To get around this, tunnel port 8998 from the remote server to your local machine via ssh, then access http://localhost:8998 as usual; see the example below.
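As a concrete sketch (with `user` and `remote-host` as placeholders for your own server), the tunnel can be opened with `ssh -N -L 8998:localhost:8998 user@remote-host`, after which the UI is reachable at http://localhost:8998. The `-N` flag just keeps the tunnel open without running a remote command, and localhost is treated as a secure origin even over plain http.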