From 46192490c9ab03e0b56afeafff76fecc0a262dc4 Mon Sep 17 00:00:00 2001
From: sallyom <somalley@redhat.com>
Date: Mon, 18 Mar 2024 01:48:18 -0400
Subject: [PATCH] add whisper quadlet & update docs

---
 audio-to-text/README.md                       | 109 ++++++++++++++++++
 .../client/Containerfile                      |   0
 .../client/requirements.txt                   |   0
 .../client/whisper_client.py                  |   0
 audio-to-text/quadlet/README.md               |  30 +++++
 audio-to-text/quadlet/audio-text.image        |   7 ++
 audio-to-text/quadlet/audio-text.kube.example |  16 +++
 audio-to-text/quadlet/audio-text.yaml         |  45 ++++++++
 .../whispercpp}/Containerfile                 |   0
 model_servers/whispercpp/README.md            |  46 ++++++++
 model_servers/whispercpp/run.sh               |   4 +
 models/Containerfile                          |   1 +
 playground/README.md                          |   2 +-
 whisper-playground/README.md                  |  77 -------------
 whisper-playground/run.sh                     |   3 -
 15 files changed, 259 insertions(+), 81 deletions(-)
 create mode 100644 audio-to-text/README.md
 rename {whisper-playground => audio-to-text}/client/Containerfile (100%)
 rename {whisper-playground => audio-to-text}/client/requirements.txt (100%)
 rename {whisper-playground => audio-to-text}/client/whisper_client.py (100%)
 create mode 100644 audio-to-text/quadlet/README.md
 create mode 100644 audio-to-text/quadlet/audio-text.image
 create mode 100644 audio-to-text/quadlet/audio-text.kube.example
 create mode 100644 audio-to-text/quadlet/audio-text.yaml
 rename {whisper-playground => model_servers/whispercpp}/Containerfile (100%)
 create mode 100644 model_servers/whispercpp/README.md
 create mode 100644 model_servers/whispercpp/run.sh
 delete mode 100644 whisper-playground/README.md
 delete mode 100644 whisper-playground/run.sh

diff --git a/audio-to-text/README.md b/audio-to-text/README.md
new file mode 100644
index 000000000..c1647f750
--- /dev/null
+++ b/audio-to-text/README.md
@@ -0,0 +1,109 @@
+# Audio to Text Application
+
+This sample application is a simple recipe to transcribe an audio file, meant to help developers
+start building out their own custom LLM-enabled audio-to-text applications. It consists of two
+main components: the Model Service and the AI Application.
+
+There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)
+and its OpenAI-compatible Model Service. A Containerfile for building this Model Service is provided in the repo:
+[`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).
+
+Our AI Application will connect to our Model Service via its OpenAI-compatible API.
+
+![](/assets/whisper.png)
+
+# Build the Application
+
+In order to build this application we will need a model, a Model Service and an AI Application.
+
+* [Download a model](#download-a-model)
+* [Build the Model Service](#build-the-model-service)
+* [Deploy the Model Service](#deploy-the-model-service)
+* [Build the AI Application](#build-the-ai-application)
+* [Deploy the AI Application](#deploy-the-ai-application)
+* [Interact with the AI Application](#interact-with-the-ai-application)
+    * [Input audio files](#input-audio-files)
+
+### Download a model
+
+If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).
+This is a performant mid-sized model with an Apache-2.0 license.
+It's simple to download a pre-converted whisper model from [huggingface.co](https://huggingface.co)
+here: https://huggingface.co/ggerganov/whisper.cpp.
+There are a number of options, but we recommend starting with `ggml-small.bin`.
+
+The recommended model can be downloaded using the code snippet below:
+
+```bash
+cd models
+wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
+cd ../
+```
+
+_A full list of supported open models is forthcoming._
+
+### Build the Model Service
+
+The Model Service can be built from the root directory with the following code snippet:
+
+```bash
+cd model_servers/whispercpp
+podman build -t whispercppserver .
+```
+
+### Deploy the Model Service
+
+The local Model Service relies on a volume mount to the localhost to access the model files. You can start your local Model Service using the following podman command:
+
+```bash
+podman run --rm -it \
+    -p 8001:8001 \
+    -v /local/path/to/locallm/models:/locallm/models \
+    -e MODEL_PATH=models/<model-filename> \
+    -e HOST=0.0.0.0 \
+    -e PORT=8001 \
+    whispercppserver
+```
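+
+Before building the AI Application, you can smoke-test the Model Service directly from the host.
+The request below follows the upstream whisper.cpp server example; the multipart field names are
+assumptions to verify against your build of the service:
+
+```bash
+# <sample.wav> must be a 16-bit WAV file (see "Input audio files" below)
+curl http://localhost:8001/inference \
+    -H "Content-Type: multipart/form-data" \
+    -F file="@<sample.wav>" \
+    -F response_format="json"
+```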
+
+### Build the AI Application
+
+Now that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application
+image from the `audio-to-text/` directory.
+
+```bash
+cd audio-to-text
+podman build -t audio-to-text . -f builds/Containerfile
+```
+
+### Deploy the AI Application
+
+Make sure the Model Service is up and running before starting this container image.
+When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.
+This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI-compatible API.
+The following podman command can be used to run your AI Application:
+
+```bash
+podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text
+```
+
+### Interact with the AI Application
+
+Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`.
+From here, you can upload audio files from your local machine and transcribe them as shown below.
+
+Everything should now be up and running, with the application available at [`http://localhost:8501`](http://localhost:8501).
+By using this recipe and getting this starting point established,
+users should now have an easier time customizing and building their own LLM-enabled audio-to-text applications.
+
+#### Input audio files
+
+Whisper.cpp requires 16-bit WAV audio files as input.
+To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:
+
+```bash
+ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
+```
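+
+For scripted use, the same transcription can be requested without the UI. Below is a minimal
+sketch in Python; the endpoint matches the deployment above, while the multipart field name
+`file` is an assumption based on the whisper.cpp server example (this is not the bundled
+`client/whisper_client.py`):
+
+```python
+import requests
+
+# Model Service started in "Deploy the Model Service" above
+ENDPOINT = "http://localhost:8001/inference"
+
+# "output.wav" stands in for a 16-bit WAV file produced by the ffmpeg command above
+with open("output.wav", "rb") as f:
+    resp = requests.post(ENDPOINT, files={"file": f})
+
+resp.raise_for_status()
+print(resp.json())  # transcription is returned in the JSON payload
+```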
+
+<p align="center">
+<img src="../assets/whisper.png" width="70%">
+</p>
+
diff --git a/whisper-playground/client/Containerfile b/audio-to-text/client/Containerfile
similarity index 100%
rename from whisper-playground/client/Containerfile
rename to audio-to-text/client/Containerfile
diff --git a/whisper-playground/client/requirements.txt b/audio-to-text/client/requirements.txt
similarity index 100%
rename from whisper-playground/client/requirements.txt
rename to audio-to-text/client/requirements.txt
diff --git a/whisper-playground/client/whisper_client.py b/audio-to-text/client/whisper_client.py
similarity index 100%
rename from whisper-playground/client/whisper_client.py
rename to audio-to-text/client/whisper_client.py
diff --git a/audio-to-text/quadlet/README.md b/audio-to-text/quadlet/README.md
new file mode 100644
index 000000000..2bebaef42
--- /dev/null
+++ b/audio-to-text/quadlet/README.md
@@ -0,0 +1,30 @@
+### Run audio-text locally as a podman pod
+
+There are pre-built images and a pod definition to run this audio-to-text example application.
+This sample converts an audio waveform (.wav) file to text.
+
+To run locally,
+
+```bash
+podman kube play ./quadlet/audio-text.yaml
+```
+
+To monitor locally,
+
+```bash
+podman pod list
+podman ps
+podman logs <name of container from the above>
+```
+
+The application should be accessible at `http://localhost:8501`. It will take a few minutes for the model to load.
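+
+When you are finished, the same YAML definition tears the pod down again (this assumes a Podman
+version that provides `podman kube down`; `podman play kube --down` is the older spelling):
+
+```bash
+podman kube down ./quadlet/audio-text.yaml
+```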
+
+### Run audio-text as a systemd service
+
+```bash
+cp audio-text.yaml /etc/containers/systemd/audio-text.yaml
+cp audio-text.kube.example /etc/containers/systemd/audio-text.kube
+cp audio-text.image /etc/containers/systemd/audio-text.image
+/usr/libexec/podman/quadlet --dryrun # optional: preview the generated units
+systemctl daemon-reload
+systemctl start audio-text
+```
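+
+Once started, the service can be inspected with the usual systemd tooling:
+
+```bash
+systemctl status audio-text
+journalctl -u audio-text -f   # follow the pod logs while the model loads
+```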
diff --git a/audio-to-text/quadlet/audio-text.image b/audio-to-text/quadlet/audio-text.image
new file mode 100644
index 000000000..19d5fcc30
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.image
@@ -0,0 +1,7 @@
+[Install]
+WantedBy=audio-text.service
+
+[Image]
+Image=quay.io/redhat-et/locallm-whisper-ggml-small:latest
+Image=quay.io/redhat-et/locallm-whisper-service:latest
+Image=quay.io/redhat-et/locallm-audio-to-text:latest
diff --git a/audio-to-text/quadlet/audio-text.kube.example b/audio-to-text/quadlet/audio-text.kube.example
new file mode 100644
index 000000000..391408f31
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.kube.example
@@ -0,0 +1,16 @@
+[Unit]
+Description=Audio-to-text pod with Whisper model service
+Documentation=man:podman-generate-systemd(1)
+Wants=network-online.target
+After=network-online.target
+RequiresMountsFor=%t/containers
+
+[Kube]
+# Point to the yaml file in the same directory
+Yaml=audio-text.yaml
+
+[Service]
+Restart=always
+
+[Install]
+WantedBy=default.target
diff --git a/audio-to-text/quadlet/audio-text.yaml b/audio-to-text/quadlet/audio-text.yaml
new file mode 100644
index 000000000..2307c4788
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.yaml
@@ -0,0 +1,45 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  labels:
+    app: audio-to-text
+  name: audio-to-text
+spec:
+  initContainers:
+  - name: model-file
+    image: quay.io/redhat-et/locallm-whisper-ggml-small:latest
+    command: ['/usr/bin/install', "/model/ggml-small.bin", "/shared/"]
+    volumeMounts:
+    - name: model-file
+      mountPath: /shared
+  containers:
+  - env:
+    - name: MODEL_SERVICE_ENDPOINT
+      value: http://0.0.0.0:8001/inference
+    image: quay.io/redhat-et/locallm-audio-to-text:latest
+    name: audio-to-text
+    ports:
+    - containerPort: 8501
+      hostPort: 8501
+    securityContext:
+      runAsNonRoot: true
+  - env:
+    - name: HOST
+      value: "0.0.0.0"
+    - name: PORT
+      value: "8001"
+    - name: MODEL_PATH
+      value: /model/ggml-small.bin
+    image: quay.io/redhat-et/locallm-whisper-service:latest
+    name: whisper-model-service
+    ports:
+    - containerPort: 8001
+      hostPort: 8001
+    securityContext:
+      runAsNonRoot: true
+    volumeMounts:
+    - name: model-file
+      mountPath: /model
+  volumes:
+  - name: model-file
+    emptyDir: {}
diff --git a/whisper-playground/Containerfile b/model_servers/whispercpp/Containerfile
similarity index 100%
rename from whisper-playground/Containerfile
rename to model_servers/whispercpp/Containerfile
diff --git a/model_servers/whispercpp/README.md b/model_servers/whispercpp/README.md
new file mode 100644
index 000000000..f38c68395
--- /dev/null
+++ b/model_servers/whispercpp/README.md
@@ -0,0 +1,46 @@
+## Whisper
+
+Whisper models are useful for converting audio files to text. The sample application [audio-to-text](../audio-to-text/README.md)
+describes how to run an inference application. This document describes how to build a service for a Whisper model.
+
+### Build model service
+
+To build a Whisper model service container image from this directory,
+
+```bash
+podman build -t whisper:image .
+```
+
+### Download Whisper model
+
+You will need to download the model from HuggingFace. There are various Whisper models available which vary in size and can be found
+[here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model, which is about 466 MB.
+
+- **small**
+  - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)
+
+```bash
+cd ../models
+wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
+cd ../
+```
+
+### Deploy Model Service
+
+Deploy the model service and volume mount the model of choice.
+Here, we are mounting the `ggml-small.bin` model downloaded above.
+
+```bash
+# Note: the :Z may need to be omitted from the model volume mount if not running on Linux
+podman run --rm -it \
+    -p 8001:8001 \
+    -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
+    -e HOST=0.0.0.0 \
+    -e MODEL_PATH=/models/ggml-small.bin \
+    -e PORT=8001 \
+    whisper:image
+```
+
+By default, a sample `jfk.wav` file is included in the whisper image and can be used for testing.
+The environment variable `AUDIO_FILE` can be passed with your own audio file to override the default `/app/jfk.wav` file within the whisper image.
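+
+For example, to transcribe a file of your own, mount it into the container and point `AUDIO_FILE`
+at it (the host paths below are illustrative):
+
+```bash
+podman run --rm -it \
+    -p 8001:8001 \
+    -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
+    -v /local/path/to/audio/sample.wav:/audio/sample.wav:Z,ro \
+    -e HOST=0.0.0.0 \
+    -e MODEL_PATH=/models/ggml-small.bin \
+    -e PORT=8001 \
+    -e AUDIO_FILE=/audio/sample.wav \
+    whisper:image
+```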
diff --git a/model_servers/whispercpp/run.sh b/model_servers/whispercpp/run.sh
new file mode 100644
index 000000000..7e640b762
--- /dev/null
+++ b/model_servers/whispercpp/run.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+# Start the whisper.cpp server (-tr translates non-English audio to English)
+./server -tr --model ${MODEL_PATH} --host ${HOST:=0.0.0.0} --port ${PORT:=8001}
+
diff --git a/models/Containerfile b/models/Containerfile
index 981c85f44..e359bf7cb 100644
--- a/models/Containerfile
+++ b/models/Containerfile
@@ -1,6 +1,7 @@
 #https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
 #https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
 #https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf
+#https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
 # podman build --build-arg MODEL_URL=https://... -t quay.io/yourimage .
 FROM registry.access.redhat.com/ubi9/ubi-micro:9.3-13
 ARG MODEL_URL
diff --git a/playground/README.md b/playground/README.md
index 556c98a6e..cdc8caaed 100644
--- a/playground/README.md
+++ b/playground/README.md
@@ -69,4 +69,4 @@ podman run --rm -it -d \
   -v Local/path/to/locallm/models:/locallm/models:ro,Z \
   -e CONFIG_PATH=models/<config-filename> \
   playground:image
-```
\ No newline at end of file
+```
diff --git a/whisper-playground/README.md b/whisper-playground/README.md
deleted file mode 100644
index 519b6d6b1..000000000
--- a/whisper-playground/README.md
+++ /dev/null
@@ -1,77 +0,0 @@
-### Pre-Requisites
-
-If you are using an Apple MacBook M-series laptop, you will probably need to do the following configurations:
-
-* `brew tap cfergeau/crc`
-* `brew install vfkit`
-* `export CONTAINERS_MACHINE_PROVIDER=applehv`
-* Edit your `/Users/<your username>/.config/containers/containers.conf` file to include:
-```bash
-[machine]
-provider = "applehv"
-```
-* Ensure you have enough resources on your Podman machine. Recommended to have atleast `CPU: 8, Memory: 10 GB`
-
-### Build Model Service
-
-From this directory,
-
-```bash
-podman build -t whisper:image .
-```
-
-### Download Model
-
-We need to download the model from HuggingFace. There are various Whisper models available which vary in size and can be found [here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model which is about 466 MB.
-
-- **small**
-  - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)
-
-```bash
-cd ../models
-wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
-cd ../
-```
-
-### Download audio files
-
-Whisper.cpp requires as an input 16-bit WAV audio files.
-By default, a sample `jfk.wav` file is included in the whisper image. This can be used to test with.
-To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:
-
-```bash
-ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
-```
-
-The environment variable `AUDIO_FILE`, can be passed with your own audio file to override the default `/app/jfk.wav` file within the whisper image.
-
-### Deploy Model Service
-
-Deploy the LLM and volume mount the model of choice.
-Here, we are mounting the `ggml-small.bin` model as downloaded from above.
-
-```bash
-podman run --rm -it \
-    -p 8001:8001 \
-    -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
-    -e HOST=0.0.0.0 \
-    -e PORT=8001 \
-    whisper:image
-```
-
-### Build and run the client application
-
-We will use Streamlit to create a front end application with which you can interact with the Whisper model through a simple UI.
-
-```bash
-podman build -t whisper_client whisper-playground/client
-```
-
-```bash
-podman run -p 8501:8501 -e MODEL_ENDPOINT=http://0.0.0.0:8000/inference whisper_client
-```
-Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`. From here, you can upload audio files from your local machine and translate the audio files as shown below.
-
-<p align="center">
-<img src="../assets/whisper.png" width="70%">
-</p>
diff --git a/whisper-playground/run.sh b/whisper-playground/run.sh
deleted file mode 100644
index 7fcd0a913..000000000
--- a/whisper-playground/run.sh
+++ /dev/null
@@ -1,3 +0,0 @@
-#! bin/bash
-
-./server -tr -m /models/ggml-small.bin --host ${HOST:=0.0.0.0} --port ${PORT:=8001}
\ No newline at end of file