Skip to content

Latest commit

 

History

History
111 lines (86 loc) · 5.24 KB

README.md

File metadata and controls

111 lines (86 loc) · 5.24 KB

alts

( 🎙️ listens | 💭 thinks | 🔊 speaks )


💬 about

100% free, local and offline assistant with speech recognition and talk-back functionalities.

🤖 default usage

ALTS runs in the background and waits for you to press cmd+esc (or win+esc).

  • 🎙️ While holding the hotkey, your voice will be recorded (saves in the project root).
  • 💭 On release, the recording stops and a transcript is sent to the LLM (the recording is deleted).
  • 🔊 The LLM responses then get synthesized and played back to you (also shown as desktop notifications).

You can modify the hotkey combination and other settings in your config.yaml.

ALL processes are local and NONE of your recordings or queries leave your environment; the recordings are deleted as soon as they are used; it's ALL PRIVATE by default

⚙️ pre-requisites

  • python

    (tested on) version >=3.11 on macOS and version >=3.8 on windows

  • llm

    By default, the project is configured to work with Ollama, running the stablelm2 model (a very tiny and quick model). This setup makes the whole system completely free to run locally and great for low resource machines.

    However, we use LiteLLM in order to be provider agnostic, so you have full freedom to pick and choose your own combinations. Take a look at the supported Models/Providers for more details on LLM configuration.

    See .env.template and config-template.yaml for customizing your setup

  • stt

    We use openAI's whisper to transcribe your voice queries. It's a general-purpose speech recognition model.

    You will need to have ffmepg installed in your environment, you can download it from the official site.

    Make sure to check out their setup docs, for any other requirement.

    if you stumble into errors, one reason could be the model not downloading automatically. If that's the case you can run a whisper example transcription in your terminal (see examples) or manually download it and place the model-file in the correct folder

  • tts

    We use coqui-TTS for ALTS to talk-back to you. It's a library for advanced Text-to-Speech generation.

    You will need to install eSpeak-ng in your environment:

    • macOS – brew install espeak
    • linux – sudo apt-get install espeak -y
    • windows – download the executable from their repo

      on windows you'll also need Desktop development with C++ and .NET desktop build tools. Download the Microsoft C++ Build Tools and install these dependencies.

    Make sure to check out their setup docs, for any other requirement.

    if you don't have the configured model already downloaded it should download automatically during startup, however if you encounter any problems, the default model can be pre-downloaded by running the following:

    tts --text "this is a setup test" --out_path test_output.wav --model_name tts_models/en/vctk/vits --speaker_idx p364
    

    The default model has several "speakers" to choose from; running the following command will serve a demo site where you can test the different voices available:

    tts-server --model_name tts_models/en/vctk/vits
    

✅ get it running

clone the repo

git clone https://github.com/alxpez/alts.git

go to the main folder

cd alts/

install the project dependencies

pip install -r requirements.txt

see the pre-requisites section, to make sure your machine is ready to start the ALTS

duplicate and rename the needed config files

cp config-template.yaml config.yaml
cp .env.template .env

modify the default configuration to your needs

start up ALTS

sudo python alts.py

the keyboard package requires to be run as admin (in macOS and Linux), it's not the case on Windows