
Webllm Conversation

A conversational user-interface harness wrapped around an LLM service.

The goal is to incrementally build a stack of services, from basic text conversations up to embodied 3D puppets. There is an emphasis on clearly signaling when the LLM is done. The hope is also to do this with 'no strings' - entirely client side.

Note that you will need a powerful laptop or rig (circa 2024) to run this demo with any reasonable performance.

See the example at https://orbitalfoundation.github.io/webllm-conversation/

Running

Build and run like so:

gh repo clone orbitalfoundation/webllm-conversation
cd webllm-conversation
npm install
npx http-server -c-1 &
npm run build

Notes:

A bundler is used because of the third-party component dependencies, although this could be improved. The bundler also does not handle web workers very well, so those are built separately.
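For reference, a two-pass esbuild script along these lines is a minimal sketch of that setup; the entry points, output paths, and worker file name are assumptions and may differ from the repo's actual esbuild-make.mjs:

import * as esbuild from 'esbuild'

// Bundle the main app (entry point and output folder are illustrative, not the repo's actual paths).
await esbuild.build({
  entryPoints: ['src/index.js'],
  bundle: true,
  format: 'esm',
  outdir: 'dist',
})

// Build the worker separately, since bundlers often miss code loaded via new Worker(...).
await esbuild.build({
  entryPoints: ['src/worker.js'],
  bundle: true,
  format: 'esm',
  outfile: 'dist/worker.js',
})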

There are many choices of static web server, such as 'npm install -g tiny-server' or 'npm install -g live-server'; in this case http-server is used.

Changelog

Version 0: Text-based conversation and foundations

In the first version of a conversational interface to LLMs we want to:

  1. handle user input using a text chat window with limited history
  2. pass user prompts to an LLM and get back a response
  3. break the response down into "breath" segments of about a single breath each (see the sketch after this list)
  4. make responses visible to the user
  5. allow interruption and stopping of the LLM at any time
  6. display whether the LLM is thinking or speaking versus ready for input
  7. disambiguate bot 'actions' (smiling, gesturing, pointing) from speaking
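As an illustration of item 3, here is a minimal sketch (not the project's actual code) of splitting streamed LLM output into breath-sized segments; the size limit and function names are assumptions:

// Accumulate streamed fragments and emit a 'breath' at each sentence boundary,
// or at a soft length limit when a single sentence runs very long.
const MAX_BREATH_CHARS = 120   // rough upper bound on one spoken breath; tunable

function makeBreathSplitter(onBreath) {
  let pending = ''
  return function absorb(fragment) {
    pending += fragment
    // Emit complete sentences as soon as they appear.
    const parts = pending.split(/(?<=[.!?])\s+/)
    pending = parts.pop()                      // the tail may be an incomplete sentence
    for (const sentence of parts) onBreath(sentence)
    // If one sentence runs too long, cut at a comma (or hard-wrap) instead.
    if (pending.length > MAX_BREATH_CHARS) {
      const comma = pending.lastIndexOf(',', MAX_BREATH_CHARS)
      const cut = comma > 0 ? comma + 1 : MAX_BREATH_CHARS
      onBreath(pending.slice(0, cut).trim())
      pending = pending.slice(cut)
    }
  }
}

// Any text left in `pending` should be flushed as a final breath when the LLM reports completion.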

We have a few tool choices:

  1. Bundling ... ESBuild will do for now, although it does not have introspection inside dynamically fetched imports, so imports have to be specified explicitly (see 'npm run build' and esbuild-make.mjs). The main reason to use esbuild instead of Vite is that Vite has opaque configuration options, for example for GitHub Pages. See https://esbuild.github.io/ .

  2. Let's also use web-llm as our client-side LLM. See https://github.com/mlc-ai/web-llm/ - effectively this project is close to the examples they provide, but with a bit more emphasis on reporting when the LLM is busy or not (a sketch follows this list).

  3. Bring in Aider ... Aider has become a very helpful tool for scaffolding apps quickly and conversationally. All of the HTML for this app was built by Aider. See https://aider.chat/

  4. Stage on GitHub and publish a static GitHub Pages site. For now the /index.html points at the dist folder generated by esbuild.
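As an illustration of item 2, a rough sketch of driving web-llm with explicit busy/ready reporting and interruption; the model name and the setStatus hook are placeholders, not the repo's actual code:

import { CreateMLCEngine } from "@mlc-ai/web-llm"

const setStatus = (s) => console.log("status:", s)   // hypothetical UI hook

// Loading the model is slow; report progress so the user knows the LLM is not ready yet.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => setStatus(`loading ${Math.round(report.progress * 100)}%`),
})
setStatus("ready")

async function ask(prompt) {
  setStatus("thinking")
  const stream = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  })
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content || ""
    // feed `text` into the breath splitter / chat window here
  }
  setStatus("ready")
}

// Stop generation immediately when the user interrupts.
function stop() { engine.interruptGenerate() }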

Version 1: Voice input

Voice input has been added, using the browser's built-in voice recognition (which is quite good, for English at least). The interface visually signals when the bot is unable to let you speak - one defect of the built-in voice recognition is a lack of noise cancellation. WebRTC and other services do have loopback detection, and other more exotic schemes are possible, but this is not supported yet.
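A minimal sketch of the built-in recognition (the Web Speech API); the gating of when listening is allowed is illustrative rather than the repo's actual code:

const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition

const recognizer = new Recognition()
recognizer.lang = "en-US"
recognizer.continuous = true
recognizer.interimResults = false

recognizer.onresult = (event) => {
  const text = event.results[event.results.length - 1][0].transcript
  // pass `text` to the LLM as a user prompt
}

// Only listen while the bot is idle, so the bot does not hear itself
// (the built-in recognizer has no noise or echo cancellation).
function setListening(allowed) {
  if (allowed) recognizer.start()
  else recognizer.stop()
}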

Version 2: Voice output

TBD.

The plan is to use the breath segments: perform background processing on each breath segment to lower the time delay until the first response, using a client-side version of Whisper running as a background worker.
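Since this version is still TBD, the following is only a hypothetical sketch of the per-breath idea: each breath segment is posted to a background worker as soon as it is produced, and the resulting audio is played back in order, so the first segment can start before the whole response is generated. The worker path and message shapes are invented for illustration:

const synthWorker = new Worker("dist/voice-worker.js", { type: "module" })  // illustrative path

const ctx = new AudioContext()
const queue = []          // audio buffers waiting to be played, in breath order
let playing = false

function speakBreath(segment) {
  synthWorker.postMessage({ text: segment })     // hand one breath segment to the worker
}

synthWorker.onmessage = ({ data }) => {
  queue.push(data.audio)                         // ArrayBuffer of audio for one segment
  if (!playing) playNext()
}

async function playNext() {
  if (queue.length === 0) { playing = false; return }
  playing = true
  const buffer = await ctx.decodeAudioData(queue.shift())
  const source = ctx.createBufferSource()
  source.buffer = buffer
  source.connect(ctx.destination)
  source.onended = playNext
  source.start()
}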

Version 3: Rigged and animated puppet

TBD

Issues / Bugs

I do notice a few issues:

  • it reports ready too soon
  • service workers should be used
  • the input box is not always foregrounded
  • the packaging (esbuild and so on) could be improved - for example web-llm is loaded twice
