Self-host Moshi with BentoML

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. This is a BentoML example project, showing you how to serve and deploy Moshi with BentoML. Specifically, it creates a real-time voice chat application by implementing a WebSocket endpoint for bi-directional audio streaming.

Here is the workflow after you start the server:

  1. You speak into your microphone. The client records the audio and streams it to the server in real time over a WebSocket connection.
  2. The server uses the Mimi model to process the audio and the Moshi language model to generate both text and audio responses.
  3. The server sends the generated text and audio back to the client.
  4. The client plays the audio through your speakers and displays the text in the terminal.
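The full-duplex pattern in the steps above can be sketched with two concurrent tasks, one sending microphone chunks and one receiving replies. This is a minimal sketch, not the project's client: it replaces the WebSocket with in-memory `asyncio.Queue` objects, and `fake_server`, `send_microphone`, `receive_replies`, and `run_session` are hypothetical names. The stand-in server does not run Mimi or Moshi; it only returns a placeholder text and audio pair for each chunk.

```python
import asyncio


async def fake_server(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for the server: one (text, audio) reply per chunk."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # end-of-stream sentinel from the client
            await outbound.put(None)
            return
        # The real service would run Mimi and Moshi here; this only fakes a reply.
        await outbound.put((f"heard {len(chunk)} bytes", chunk[::-1]))


async def send_microphone(chunks, inbound: asyncio.Queue) -> None:
    """Send audio chunks as they become available (step 1)."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so sending and receiving can interleave
    await inbound.put(None)


async def receive_replies(outbound: asyncio.Queue):
    """Collect text for display and audio for playback (steps 3 and 4)."""
    texts, audio = [], bytearray()
    while True:
        reply = await outbound.get()
        if reply is None:
            return texts, bytes(audio)
        text, pcm = reply
        texts.append(text)
        audio.extend(pcm)


async def run_session(chunks):
    """Run sender and receiver concurrently against the stand-in server."""
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    server = asyncio.create_task(fake_server(inbound, outbound))
    _, result = await asyncio.gather(
        send_microphone(chunks, inbound),
        receive_replies(outbound),
    )
    await server
    return result


if __name__ == "__main__":
    texts, audio = asyncio.run(run_session([b"ab", b"cde"]))
    print(texts, audio)
```

Running the sender and receiver as separate tasks is what lets the client keep streaming audio while replies arrive, rather than waiting for a full request and response cycle.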

Check out the full list of example projects to explore more BentoML use cases.

Prerequisites

To test the Service locally, we recommend an NVIDIA GPU with at least 24 GB of VRAM.
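One way to check the VRAM recommendation before serving is to read the total memory that `nvidia-smi` reports. The sketch below assumes `nvidia-smi` is on your PATH. The helper names and the 24,000 MiB default threshold are assumptions for illustration (reported totals may differ slightly from a card's nominal size, so the threshold is a parameter).

```python
import subprocess


def parse_memory_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one total-memory value (in MiB) per GPU, one GPU per line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def has_enough_vram(totals_mib: list[int], required_mib: int = 24_000) -> bool:
    """Return True if any GPU meets the (assumed) threshold."""
    return any(total >= required_mib for total in totals_mib)


def query_gpu_memory_mib() -> list[int]:
    """Ask nvidia-smi for each GPU's total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_mib(result.stdout)


if __name__ == "__main__":
    totals = query_gpu_memory_mib()
    print("GPU memory (MiB):", totals)
    print("Meets recommendation:", has_enough_vram(totals))
```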

Instructions

  1. Install uv.

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Clone the project directory.

    git clone https://github.com/bentoml/BentoMoshi.git && cd BentoMoshi
  3. Try local serving:

    # option 1: bentoml serve [RECOMMENDED]
    uvx --with-editable . bentoml serve . --debug
    
    # option 2: script
    uvx --from . server
  4. The server will be running at http://localhost:3000. To connect to the WebSocket endpoint, use the following:

    URL=http://localhost:3000 uvx --from . client
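The client command above takes an `http://` base URL even though the connection itself is a WebSocket. How the project's client derives the WebSocket address is not shown here. As a minimal sketch under that caveat, a client could map the scheme as follows, where `to_websocket_url` and the `/ws` path are hypothetical placeholders and not the project's actual endpoint.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(base_url: str, path: str = "/ws") -> str:
    """Map http -> ws and https -> wss, then attach a (hypothetical) endpoint path."""
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Read the same URL environment variable the client command sets.
    base = os.environ.get("URL", "http://localhost:3000")
    print(to_websocket_url(base))
```

Mapping `https` to `wss` keeps the connection encrypted when the same code points at a remote deployment instead of localhost.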

Deploy to BentoCloud

You can deploy this project to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.

Make sure you have logged in to BentoCloud.

    bentoml cloud login

Deploy it to BentoCloud.

    uvx --with-editable . bentoml deploy .

After deployment, set URL to the endpoint of your Deployment on BentoCloud and run the client:

    # option 1: uv run [RECOMMENDED]
    URL=<bentocloud-endpoint> uv run --with-editable . bentomoshi/client.py

    # option 2: python
    URL=<bentocloud-endpoint> python bentomoshi/client.py

Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.
