add whisper quadlet & update docs
sallyom committed Mar 22, 2024
1 parent 03536c2 commit 4619249
Showing 15 changed files with 259 additions and 81 deletions.
109 changes: 109 additions & 0 deletions audio-to-text/README.md
@@ -0,0 +1,109 @@
# Audio to Text Application

This sample application provides a simple recipe to transcribe an audio file,
intended to help developers start building out their own custom audio-to-text applications.
It consists of two main components: the Model Service and the AI Application.

There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)
and its OpenAI compatible Model Service. A Containerfile that can be used to build this Model Service is provided in the repo at
[`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).

Our AI Application will connect to our Model Service via its OpenAI compatible API.

![](/assets/whisper.png)

# Build the Application

In order to build this application we will need a model, a Model Service and an AI Application.

* [Download a model](#download-a-model)
* [Build the Model Service](#build-the-model-service)
* [Deploy the Model Service](#deploy-the-model-service)
* [Build the AI Application](#build-the-ai-application)
* [Deploy the AI Application](#deploy-the-ai-application)
* [Interact with the AI Application](#interact-with-the-ai-application)
* [Input audio files](#input-audio-files)

### Download a model

If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).
This is a well-performing, mid-sized model with an apache-2.0 license.
Pre-converted whisper models are simple to download from [huggingface.co](https://huggingface.co)
here: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend starting with `ggml-small.bin`.

The recommended model can be downloaded using the code snippet below:

```bash
cd models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
cd ../
```
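If `wget` is not available, `curl` can be used instead. This assumes you are in the repository root and the `models/` directory exists, matching the snippet above:

```bash
# Download ggml-small.bin into the models/ directory (equivalent to the wget snippet above)
curl -L -o models/ggml-small.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
```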

_A full list of supported open models is forthcoming._


### Build the Model Service

The Model Service can be built from the root directory with the following code snippet:

```bash
cd model_servers/whispercpp
podman build -t whispercppserver .
```

### Deploy the Model Service

The local Model Service relies on a volume mount to the localhost to access the model files. You can start your local Model Service using the following podman command:

```bash
podman run --rm -it \
        -p 8001:8001 \
        -v /local/path/to/locallm/models:/locallm/models \
        -e MODEL_PATH=models/<model-filename> \
        -e HOST=0.0.0.0 \
        -e PORT=8001 \
        whispercppserver
```
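Once the Model Service container is up, a quick way to sanity-check it is to POST an audio file to its inference endpoint. The endpoint path comes from this recipe's `MODEL_SERVICE_ENDPOINT`; the form fields below (`file`, `response_format`) follow the whisper.cpp example server's API, and `sample.wav` is a placeholder for any 16-bit WAV file you have on hand:

```bash
# Send a local WAV file to the model service and print the transcription as JSON
curl http://localhost:8001/inference \
  -F file=@sample.wav \
  -F response_format=json
```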

### Build the AI Application

Now that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application
image from the `audio-to-text/` directory.

```bash
cd audio-to-text
podman build -t audio-to-text . -f builds/Containerfile
```
### Deploy the AI Application

Make sure the Model Service is up and running before starting this container image.
When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.
This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API.
The following podman command can be used to run your AI Application:

```bash
podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text
```
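As a quick check that the container came up (assuming the default port mapping above), you can poll the Streamlit port before opening it in a browser:

```bash
# Wait until the Streamlit app responds on port 8501
until curl -s -o /dev/null http://localhost:8501; do sleep 1; done && echo "UI is up"
```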

### Interact with the AI Application

Once the Streamlit application is up and running, you should be able to access it at [`http://localhost:8501`](http://localhost:8501).
From here, you can upload audio files from your local machine and transcribe them as shown below.

By using this recipe and getting this starting point established,
users should now have an easier time customizing and building their own audio-to-text applications.

#### Input audio files

Whisper.cpp requires 16-bit WAV audio files as input.
To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:

```bash
ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
```
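To convert a whole directory of audio files at once, a small loop over `ffmpeg` works as well. This sketch assumes the inputs are `.mp3` files in the current directory and writes the converted `.wav` files alongside them:

```bash
# Convert every .mp3 in the current directory to 16 kHz mono 16-bit WAV
for f in *.mp3; do
  ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
done
```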

<p align="center">
<img src="../assets/whisper.png" width="70%">
</p>


File renamed without changes.
File renamed without changes.
File renamed without changes.
30 changes: 30 additions & 0 deletions audio-to-text/quadlet/README.md
@@ -0,0 +1,30 @@
### Run audio-text locally as a podman pod

There are pre-built images and a pod definition to run this audio-to-text example application.
This sample converts an audio waveform (.wav) file to text.

To run locally:

```bash
podman kube play ./quadlet/audio-text.yaml
```
To monitor locally:

```bash
podman pod list
podman ps
podman logs <name of container from the above>
```

The application should be accessible at `http://localhost:8501`. It will take a few minutes for the model to load.
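To stop and remove the pod when you are done (this assumes a podman version with `kube down` support; older versions use `podman play kube --down`):

```bash
podman kube down ./quadlet/audio-text.yaml
```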

### Run audio-text as a systemd service

```bash
cp audio-text.yaml /etc/containers/systemd/audio-text.yaml
cp audio-text.kube.example /etc/containers/systemd/audio-text.kube
cp audio-text.image /etc/containers/systemd/audio-text.image
# Optional: verify the generated units
/usr/libexec/podman/quadlet --dryrun
systemctl daemon-reload
systemctl start audio-text
```
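Once started, you can check on the service with systemd's usual tooling (the unit name `audio-text` is assumed to match the quadlet files above):

```bash
systemctl status audio-text
journalctl -u audio-text -f
```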
7 changes: 7 additions & 0 deletions audio-to-text/quadlet/audio-text.image
@@ -0,0 +1,7 @@
[Install]
WantedBy=audio-text.service

[Image]
Image=quay.io/redhat-et/locallm-whisper-ggml-small:latest
Image=quay.io/redhat-et/locallm-whisper-service:latest
Image=quay.io/redhat-et/locallm-audio-to-text:latest
16 changes: 16 additions & 0 deletions audio-to-text/quadlet/audio-text.kube.example
@@ -0,0 +1,16 @@
[Unit]
Description=Audio-to-text sample application (whisper.cpp model service and Streamlit front end)
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Kube]
# Point to the yaml file in the same directory
Yaml=audio-text.yaml

[Service]
Restart=always

[Install]
WantedBy=default.target
45 changes: 45 additions & 0 deletions audio-to-text/quadlet/audio-text.yaml
@@ -0,0 +1,45 @@
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: audio-to-text
  name: audio-to-text
spec:
  initContainers:
  - name: model-file
    image: quay.io/redhat-et/locallm-whisper-ggml-small:latest
    command: ['/usr/bin/install', "/model/ggml-small.bin", "/shared/"]
    volumeMounts:
    - name: model-file
      mountPath: /shared
  containers:
  - env:
    - name: MODEL_SERVICE_ENDPOINT
      value: http://0.0.0.0:8001/inference
    image: quay.io/redhat-et/locallm-audio-to-text:latest
    name: audio-to-text
    ports:
    - containerPort: 8501
      hostPort: 8501
    securityContext:
      runAsNonRoot: true
  - env:
    - name: HOST
      value: 0.0.0.0
    - name: PORT
      value: 8001
    - name: MODEL_PATH
      value: /model/ggml-small.bin
    image: quay.io/redhat-et/locallm-whisper-service:latest
    name: whisper-model-service
    ports:
    - containerPort: 8001
      hostPort: 8001
    securityContext:
      runAsNonRoot: true
    volumeMounts:
    - name: model-file
      mountPath: /model
  volumes:
  - name: model-file
    emptyDir: {}
File renamed without changes.
46 changes: 46 additions & 0 deletions model_servers/whispercpp/README.md
@@ -0,0 +1,46 @@
## Whisper

Whisper models are useful for converting audio files to text. The sample application [audio-to-text](../audio-to-text/README.md)
describes how to run an inference application. This document describes how to build a service for a Whisper model.

### Build model service

To build a Whisper model service container image from this directory,

```bash
podman build -t whisper:image .
```

### Download Whisper model

You can download the model from HuggingFace. Whisper models of various sizes are available and can be found
[here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model, which is about 466 MB.

- **small**
- Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)

```bash
cd ../models
wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
cd ../
```

### Deploy Model Service

Deploy the Model Service and volume mount the model of choice.
Here, we are mounting the `ggml-small.bin` model downloaded above.

```bash
# Note: the :Z may need to be omitted from the model volume mount if not running on Linux

podman run --rm -it \
-p 8001:8001 \
-v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
-e HOST=0.0.0.0 \
-e MODEL_PATH=/models/ggml-small.bin \
-e PORT=8001 \
whisper:image
```

By default, a sample `jfk.wav` file is included in the whisper image and can be used for testing.
The environment variable `AUDIO_FILE` can be set to point to your own audio file, overriding the default `/app/jfk.wav` within the whisper image.
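As a sketch of how that override might look (the host paths and the `/audio/...` mount target are placeholders; `AUDIO_FILE` and the `/app/jfk.wav` default come from the image as described above):

```bash
# Mount your own 16-bit WAV into the container and point AUDIO_FILE at it
podman run --rm -it \
        -p 8001:8001 \
        -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:ro \
        -v /local/path/to/my-audio.wav:/audio/my-audio.wav:ro \
        -e HOST=0.0.0.0 \
        -e PORT=8001 \
        -e MODEL_PATH=/models/ggml-small.bin \
        -e AUDIO_FILE=/audio/my-audio.wav \
        whisper:image
```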
4 changes: 4 additions & 0 deletions model_servers/whispercpp/run.sh
@@ -0,0 +1,4 @@
#!/bin/bash

# Start the whisper.cpp example server with translation enabled (-tr),
# using the model, host, and port supplied via environment variables
# (HOST and PORT fall back to 0.0.0.0 and 8001).
./server -tr --model ${MODEL_PATH} --host ${HOST:=0.0.0.0} --port ${PORT:=8001}

1 change: 1 addition & 0 deletions models/Containerfile
@@ -1,6 +1,7 @@
#https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
#https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
#https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf
#https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
# podman build --build-arg MODEL_URL=https://... -t quay.io/yourimage .
FROM registry.access.redhat.com/ubi9/ubi-micro:9.3-13
ARG MODEL_URL
2 changes: 1 addition & 1 deletion playground/README.md
@@ -69,4 +69,4 @@ podman run --rm -it -d \
-v Local/path/to/locallm/models:/locallm/models:ro,Z \
-e CONFIG_PATH=models/<config-filename> \
playground:image
```
```
77 changes: 0 additions & 77 deletions whisper-playground/README.md

This file was deleted.

3 changes: 0 additions & 3 deletions whisper-playground/run.sh

This file was deleted.
