Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. This is a BentoML example project, showing you how to serve and deploy Moshi with BentoML. Specifically, it creates a real-time voice chat application by implementing a WebSocket endpoint for bi-directional audio streaming.
Here is the workflow after you start the server:
- You speak into your microphone. The client records the audio and sends it to the server in real-time via a WebSocket connection.
- The server uses the Mimi model to process the audio and the Moshi language model to generate both text and audio responses.
- The server sends the generated text and audio back to the client.
- The client plays the audio through your speakers and displays the text in the terminal.
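Since the exchange above interleaves audio and text messages on one WebSocket, a duplex stream like this typically tags each message with its kind. Here is a minimal, stdlib-only sketch of such tagged framing; the one-byte kind values below are illustrative assumptions, not Moshi's documented wire format (see `bentomoshi/client.py` for the real protocol handling):

```python
# Hypothetical tagged framing for a duplex audio/text stream.
# The kind byte values are illustrative assumptions, not Moshi's
# actual wire format.

KIND_AUDIO = 1  # binary audio chunk (e.g. compressed or PCM bytes)
KIND_TEXT = 2   # UTF-8 text tokens streamed by the model

def frame_message(kind: int, payload: bytes) -> bytes:
    """Prefix a payload with a one-byte message kind."""
    return bytes([kind]) + payload

def parse_message(data: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (kind, payload)."""
    return data[0], data[1:]

# Round-trip an audio chunk through the framing.
kind, payload = parse_message(frame_message(KIND_AUDIO, b"\x00\x01\x02"))
assert (kind, payload) == (KIND_AUDIO, b"\x00\x01\x02")
```

The kind prefix lets the receiver dispatch each message without extra negotiation: binary audio goes to the speaker path, text goes to the terminal.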
Check out the full list of example projects to explore more BentoML use cases.
If you want to test the Service locally, we recommend an NVIDIA GPU with at least 24 GB of VRAM.
- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Clone the repository:

  ```bash
  git clone https://github.com/bentoml/BentoMoshi.git && cd BentoMoshi
  ```
- Try local serving:

  ```bash
  # option 1: bentoml serve [RECOMMENDED]
  uvx --with-editable . bentoml serve . --debug

  # option 2: script
  uvx --from . server
  ```
- The server will be running at http://localhost:3000. To connect to the WebSocket endpoint, run the client:

  ```bash
  URL=http://localhost:3000 uvx --from . client
  ```
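Internally, a full-duplex client like this runs two concurrent loops over a single connection: one uploads microphone chunks while the other plays back server responses. Below is a minimal asyncio sketch of that structure, with in-memory queues standing in for the WebSocket and an echo task standing in for the Moshi server; the real client in `bentomoshi/client.py` handles the actual audio I/O and connection:

```python
import asyncio

async def send_mic_audio(outgoing: asyncio.Queue, chunks: list[bytes]) -> None:
    # The real client reads from the microphone; here we drain a fixed
    # list of chunks to keep the sketch self-contained.
    for chunk in chunks:
        await outgoing.put(chunk)
    await outgoing.put(None)  # signal end of stream

async def play_responses(incoming: asyncio.Queue, received: list[bytes]) -> None:
    # The real client plays audio through the speakers and prints text.
    while (msg := await incoming.get()) is not None:
        received.append(msg)

async def fake_server(outgoing: asyncio.Queue, incoming: asyncio.Queue) -> None:
    # Stand-in for the Moshi server: echo each chunk back with a prefix.
    while (chunk := await outgoing.get()) is not None:
        await incoming.put(b"resp:" + chunk)
    await incoming.put(None)

async def main() -> list[bytes]:
    outgoing, incoming = asyncio.Queue(), asyncio.Queue()
    received: list[bytes] = []
    # Both directions run concurrently, which is what makes the
    # conversation full-duplex rather than turn-based.
    await asyncio.gather(
        send_mic_audio(outgoing, [b"a", b"b"]),
        fake_server(outgoing, incoming),
        play_responses(incoming, received),
    )
    return received

print(asyncio.run(main()))  # → [b'resp:a', b'resp:b']
```

The key point is that sending and receiving are independent tasks: the client never waits for a full response before sending the next audio chunk.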
You can deploy this project to BentoCloud for better management and scalability. Sign up for a BentoCloud account if you don't have one.
Make sure you have logged in to BentoCloud:

```bash
bentoml cloud login
```
Deploy it to BentoCloud:

```bash
uvx --with-editable . bentoml deploy .
```
After deployment, set `URL` to your Deployment's endpoint on BentoCloud and run the client:

```bash
# option 1: uvx [RECOMMENDED]
URL=<bentocloud-endpoint> uv run --with-editable . bentomoshi/client.py

# option 2: using python
URL=<bentocloud-endpoint> python bentomoshi/client.py
```
Note: For custom deployment in your own infrastructure, you can use BentoML to generate an OCI-compliant image.