From 46192490c9ab03e0b56afeafff76fecc0a262dc4 Mon Sep 17 00:00:00 2001
From: sallyom <somalley@redhat.com>
Date: Mon, 18 Mar 2024 01:48:18 -0400
Subject: [PATCH] add whisper quadlet & update docs

---
 audio-to-text/README.md                       | 109 ++++++++++++++++++
 .../client/Containerfile                      |   0
 .../client/requirements.txt                   |   0
 .../client/whisper_client.py                  |   0
 audio-to-text/quadlet/README.md               |  30 +++++
 audio-to-text/quadlet/audio-text.image        |   7 ++
 audio-to-text/quadlet/audio-text.kube.example |  16 +++
 audio-to-text/quadlet/audio-text.yaml         |  45 ++++++++
 .../whispercpp}/Containerfile                 |   0
 model_servers/whispercpp/README.md            |  46 ++++++++
 model_servers/whispercpp/run.sh               |   4 +
 models/Containerfile                          |   1 +
 playground/README.md                          |   2 +-
 whisper-playground/README.md                  |  77 -------------
 whisper-playground/run.sh                     |   3 -
 15 files changed, 259 insertions(+), 81 deletions(-)
 create mode 100644 audio-to-text/README.md
 rename {whisper-playground => audio-to-text}/client/Containerfile (100%)
 rename {whisper-playground => audio-to-text}/client/requirements.txt (100%)
 rename {whisper-playground => audio-to-text}/client/whisper_client.py (100%)
 create mode 100644 audio-to-text/quadlet/README.md
 create mode 100644 audio-to-text/quadlet/audio-text.image
 create mode 100644 audio-to-text/quadlet/audio-text.kube.example
 create mode 100644 audio-to-text/quadlet/audio-text.yaml
 rename {whisper-playground => model_servers/whispercpp}/Containerfile (100%)
 create mode 100644 model_servers/whispercpp/README.md
 create mode 100644 model_servers/whispercpp/run.sh
 delete mode 100644 whisper-playground/README.md
 delete mode 100644 whisper-playground/run.sh

diff --git a/audio-to-text/README.md b/audio-to-text/README.md
new file mode 100644
index 000000000..c1647f750
--- /dev/null
+++ b/audio-to-text/README.md
@@ -0,0 +1,109 @@
+# Audio to Text Application
+
+  This sample application transcribes an audio file. It provides a simple recipe to help developers
+  start building out their own custom audio-to-text applications. It consists of two main components:
+  the Model Service and the AI Application.
+
+  There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)
+  and its OpenAI compatible Model Service. A Containerfile for building this Model Service is provided in the repo at
+  [`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).
+
+  Our AI Application will connect to our Model Service via its OpenAI compatible API.
+
+![](/assets/whisper.png) 
+
+# Build the Application
+
+To build this application, we will need a model, a Model Service, and an AI Application.
+
+* [Download a model](#download-a-model)
+* [Build the Model Service](#build-the-model-service)
+* [Deploy the Model Service](#deploy-the-model-service)
+* [Build the AI Application](#build-the-ai-application)
+* [Deploy the AI Application](#deploy-the-ai-application)
+* [Interact with the AI Application](#interact-with-the-ai-application)
+    * [Input audio files](#input-audio-files)
+
+### Download a model
+
+If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).
+This is a performant mid-sized model with an Apache-2.0 license.
+It is simple to download a pre-converted whisper model from [huggingface.co](https://huggingface.co)
+here: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend starting with `ggml-small.bin`.
+
+The recommended model can be downloaded using the code snippet below:
+
+```bash
+cd models
+wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin 
+cd ../
+```
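+
+Alternatively, if you prefer the Hugging Face CLI over `wget`, the same file can be fetched as sketched below (assuming the `huggingface_hub` package is installed):
+
+```bash
+pip install huggingface_hub
+huggingface-cli download ggerganov/whisper.cpp ggml-small.bin --local-dir models
+```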
+
+_A full list of supported open models is forthcoming._  
+
+
+### Build the Model Service
+
+The Model Service can be built from the root directory with the following code snippet:
+
+```bash
+cd model_servers/whispercpp
+podman build -t whispercppserver .
+```
+
+### Deploy the Model Service
+
+The local Model Service relies on a volume mount from the host to access the model files. You can start your local Model Service using the following podman command:
+```bash
+podman run --rm -it \
+        -p 8001:8001 \
+        -v /local/path/to/locallm/models:/locallm/models \
+        -e MODEL_PATH=models/<model-filename> \
+        -e HOST=0.0.0.0 \
+        -e PORT=8001 \
+        whispercppserver
+```
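+
+Once the service is up, you can send it a quick test request with `curl`. This is a sketch based on the upstream whisper.cpp server example; the multipart fields (`file`, `response_format`) are assumed from that example:
+
+```bash
+curl http://localhost:8001/inference \
+        -H "Content-Type: multipart/form-data" \
+        -F file="@/path/to/sample.wav" \
+        -F response_format="json"
+```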
+
+### Build the AI Application
+
+Now that the Model Service is running, we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application
+image from the `audio-to-text/` directory.
+
+```bash
+cd audio-to-text
+podman build -t audio-to-text ./client
+```
+### Deploy the AI Application
+
+Make sure the Model Service is up and running before starting this container image.
+When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.
+This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API.
+The following podman command can be used to run your AI Application:  
+
+```bash
+podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text 
+```
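+
+Because the endpoint is plain configuration, the same image can point at a remotely hosted Model Service by changing only the environment variable. For example (the URL below is a hypothetical placeholder):
+
+```bash
+podman run --rm -it -p 8501:8501 \
+        -e MODEL_SERVICE_ENDPOINT=https://whisper.example.com/inference \
+        audio-to-text
+```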
+
+### Interact with the AI Application
+
+Once the streamlit application is up and running, you should be able to access it at [`http://localhost:8501`](http://localhost:8501).
+From here, you can upload audio files from your local machine and translate the audio files as shown below.
+
+By using this recipe and getting this starting point established,
+users should now have an easier time customizing and building their own audio-to-text applications.
+
+#### Input audio files
+
+Whisper.cpp requires 16-bit WAV audio files as input.
+To convert your input audio files to 16-bit WAV format, you can use `ffmpeg` like this:
+
+```bash
+ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
+```
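+
+If you have several files to convert, the same command can be wrapped in a small bash loop; a minimal sketch:
+
+```bash
+# Convert every .mp3 in the current directory to 16-bit mono WAV
+for f in *.mp3; do
+    ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
+done
+```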
+
+<p align="center">
+<img src="../assets/whisper.png" width="70%">
+</p>
+
+
diff --git a/whisper-playground/client/Containerfile b/audio-to-text/client/Containerfile
similarity index 100%
rename from whisper-playground/client/Containerfile
rename to audio-to-text/client/Containerfile
diff --git a/whisper-playground/client/requirements.txt b/audio-to-text/client/requirements.txt
similarity index 100%
rename from whisper-playground/client/requirements.txt
rename to audio-to-text/client/requirements.txt
diff --git a/whisper-playground/client/whisper_client.py b/audio-to-text/client/whisper_client.py
similarity index 100%
rename from whisper-playground/client/whisper_client.py
rename to audio-to-text/client/whisper_client.py
diff --git a/audio-to-text/quadlet/README.md b/audio-to-text/quadlet/README.md
new file mode 100644
index 000000000..2bebaef42
--- /dev/null
+++ b/audio-to-text/quadlet/README.md
@@ -0,0 +1,30 @@
+### Run audio-text locally as a podman pod
+
+There are pre-built images and a pod definition to run this audio-to-text example application.
+This sample converts an audio waveform (.wav) file to text.
+
+To run locally, 
+
+```bash
+podman kube play ./quadlet/audio-text.yaml
+```
+To monitor locally,
+
+```bash
+podman pod list
+podman ps 
+podman logs <name of container from the above>
+```
+
+The application should be accessible at `http://localhost:8501`. It will take a few minutes for the model to load.
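+
+To wait for the UI to come up before opening it, a small sketch using `curl` against the Streamlit port:
+
+```bash
+until curl -s -o /dev/null http://localhost:8501; do
+    echo "waiting for audio-to-text to start..."
+    sleep 5
+done
+```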
+
+### Run audio-text as a systemd service
+
+```bash
+cp audio-text.yaml /etc/containers/systemd/audio-text.yaml
+cp audio-text.kube.example /etc/containers/systemd/audio-text.kube
+cp audio-text.image /etc/containers/systemd/audio-text.image
+/usr/libexec/podman/quadlet --dryrun   # optional: verify the generated service
+systemctl daemon-reload
+systemctl start audio-text
+```
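+
+Once started, you can check the service and follow its logs with the standard systemd tooling:
+
+```bash
+systemctl status audio-text
+journalctl -u audio-text -f
+```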
diff --git a/audio-to-text/quadlet/audio-text.image b/audio-to-text/quadlet/audio-text.image
new file mode 100644
index 000000000..19d5fcc30
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.image
@@ -0,0 +1,7 @@
+[Install]
+WantedBy=audio-text.service
+
+[Image]
+Image=quay.io/redhat-et/locallm-whisper-ggml-small:latest
+Image=quay.io/redhat-et/locallm-whisper-service:latest
+Image=quay.io/redhat-et/locallm-audio-to-text:latest
diff --git a/audio-to-text/quadlet/audio-text.kube.example b/audio-to-text/quadlet/audio-text.kube.example
new file mode 100644
index 000000000..391408f31
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.kube.example
@@ -0,0 +1,16 @@
+[Unit]
+Description=Audio-to-text sample application pod
+Documentation=man:podman-generate-systemd(1)
+Wants=network-online.target
+After=network-online.target
+RequiresMountsFor=%t/containers
+
+[Kube]
+# Point to the yaml file in the same directory
+Yaml=audio-text.yaml
+
+[Service]
+Restart=always
+
+[Install]
+WantedBy=default.target
diff --git a/audio-to-text/quadlet/audio-text.yaml b/audio-to-text/quadlet/audio-text.yaml
new file mode 100644
index 000000000..2307c4788
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.yaml
@@ -0,0 +1,45 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  labels:
+    app: audio-to-text
+  name: audio-to-text
+spec:
+  initContainers:
+  - name: model-file
+    image: quay.io/redhat-et/locallm-whisper-ggml-small:latest
+    command: ["/usr/bin/install", "/model/ggml-small.bin", "/shared/"]
+    volumeMounts:
+    - name: model-file
+      mountPath: /shared
+  containers:
+  - env:
+    - name: MODEL_SERVICE_ENDPOINT
+      value: http://0.0.0.0:8001/inference
+    image: quay.io/redhat-et/locallm-audio-to-text:latest
+    name: audio-to-text
+    ports:
+    - containerPort: 8501
+      hostPort: 8501
+    securityContext:
+      runAsNonRoot: true
+  - env:
+    - name: HOST
+      value: 0.0.0.0
+    - name: PORT
+      value: "8001"
+    - name: MODEL_PATH
+      value: /model/ggml-small.bin
+    image: quay.io/redhat-et/locallm-whisper-service:latest
+    name: whisper-model-service
+    ports:
+    - containerPort: 8001
+      hostPort: 8001
+    securityContext:
+      runAsNonRoot: true
+    volumeMounts:
+    - name: model-file
+      mountPath: /model
+  volumes:
+  - name: model-file
+    emptyDir: {}
diff --git a/whisper-playground/Containerfile b/model_servers/whispercpp/Containerfile
similarity index 100%
rename from whisper-playground/Containerfile
rename to model_servers/whispercpp/Containerfile
diff --git a/model_servers/whispercpp/README.md b/model_servers/whispercpp/README.md
new file mode 100644
index 000000000..f38c68395
--- /dev/null
+++ b/model_servers/whispercpp/README.md
@@ -0,0 +1,46 @@
+## Whisper
+
+Whisper models are useful for converting audio files to text. The sample application [audio-to-text](../audio-to-text/README.md)
+describes how to run an inference application. This document describes how to build a service for a Whisper model.
+
+### Build model service
+
+To build a Whisper model service container image from this directory,
+
+```bash
+podman build -t whisper:image .
+```
+
+### Download Whisper model
+
+You can download the model from HuggingFace. There are various Whisper models available, varying in size, which can be found
+[here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model, which is about 466 MB.
+
+- **small**
+    - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)
+
+```bash
+cd ../models
+wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
+cd ../
+```
+
+### Deploy Model Service
+
+Deploy the model service and volume mount the model of choice.
+Here, we are mounting the `ggml-small.bin` model as downloaded from above.
+
+```bash
+# Note: the :Z may need to be omitted from the model volume mount if not running on Linux
+
+podman run --rm -it \
+        -p 8001:8001 \
+        -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
+        -e HOST=0.0.0.0 \
+        -e MODEL_PATH=/models/ggml-small.bin \
+        -e PORT=8001 \
+        whisper:image
+```
+
+By default, a sample `jfk.wav` file is included in the whisper image and can be used for testing.
+The environment variable `AUDIO_FILE` can be set to your own audio file to override the default `/app/jfk.wav` within the whisper image.
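+
+For example, to run against your own audio file instead of the bundled sample, mount it into the container and point `AUDIO_FILE` at it (a sketch; the `/audio` mount path is an arbitrary choice):
+
+```bash
+podman run --rm -it \
+        -p 8001:8001 \
+        -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:ro \
+        -v /local/path/to/my-sample.wav:/audio/my-sample.wav:ro \
+        -e HOST=0.0.0.0 \
+        -e MODEL_PATH=/models/ggml-small.bin \
+        -e PORT=8001 \
+        -e AUDIO_FILE=/audio/my-sample.wav \
+        whisper:image
+```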
diff --git a/model_servers/whispercpp/run.sh b/model_servers/whispercpp/run.sh
new file mode 100644
index 000000000..7e640b762
--- /dev/null
+++ b/model_servers/whispercpp/run.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+# Start the whisper.cpp server; -tr translates the transcription to English
+./server -tr --model "${MODEL_PATH}" --host "${HOST:=0.0.0.0}" --port "${PORT:=8001}"
+
diff --git a/models/Containerfile b/models/Containerfile
index 981c85f44..e359bf7cb 100644
--- a/models/Containerfile
+++ b/models/Containerfile
@@ -1,6 +1,7 @@
 #https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
 #https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
 #https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf
+#https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
 # podman build --build-arg MODEL_URL=https://... -t quay.io/yourimage .
 FROM registry.access.redhat.com/ubi9/ubi-micro:9.3-13
 ARG MODEL_URL
diff --git a/playground/README.md b/playground/README.md
index 556c98a6e..cdc8caaed 100644
--- a/playground/README.md
+++ b/playground/README.md
@@ -69,4 +69,4 @@ podman run --rm -it -d \
         -v Local/path/to/locallm/models:/locallm/models:ro,Z \
         -e CONFIG_PATH=models/<config-filename> \
         playground:image
-```
\ No newline at end of file
+```
diff --git a/whisper-playground/README.md b/whisper-playground/README.md
deleted file mode 100644
index 519b6d6b1..000000000
--- a/whisper-playground/README.md
+++ /dev/null
@@ -1,77 +0,0 @@
-### Pre-Requisites
-
-If you are using an Apple MacBook M-series laptop, you will probably need to do the following configurations:
-
-* `brew tap cfergeau/crc`
-* `brew install vfkit`
-* `export CONTAINERS_MACHINE_PROVIDER=applehv`
-* Edit your `/Users/<your username>/.config/containers/containers.conf` file to include:
-```bash
-[machine]
-provider = "applehv"
-```
-* Ensure you have enough resources on your Podman machine. Recommended to have atleast `CPU: 8, Memory: 10 GB`
-
-### Build Model Service
-
-From this directory,
-
-```bash
-podman build -t whisper:image .
-```
-
-### Download Model
-
-We need to download the model from HuggingFace. There are various Whisper models available which vary in size and can be found [here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model which is about 466 MB.
-
-- **small**
-    - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)
-
-```bash
-cd ../models
-wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
-cd ../
-```
-
-### Download audio files
-
-Whisper.cpp requires as an input 16-bit WAV audio files.
-By default, a sample `jfk.wav` file is included in the whisper image. This can be used to test with.
-To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:
-
-```bash
-ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
-```
-
-The environment variable `AUDIO_FILE`, can be passed with your own audio file to override the default `/app/jfk.wav` file within the whisper image.
-
-### Deploy Model Service
-
-Deploy the LLM and volume mount the model of choice.
-Here, we are mounting the `ggml-small.bin` model as downloaded from above.
-
-```bash
-podman run --rm -it \
-        -p 8001:8001 \
-        -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
-        -e HOST=0.0.0.0 \
-        -e PORT=8001 \
-        whisper:image
-```
-
-### Build and run the client application
-
-We will use Streamlit to create a front end application with which you can interact with the Whisper model through a simple UI.
-
-```bash
-podman build -t whisper_client whisper-playground/client
-```
-
-```bash
-podman run -p 8501:8501 -e MODEL_ENDPOINT=http://0.0.0.0:8000/inference whisper_client
-```
-Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`. From here, you can upload audio files from your local machine and translate the audio files as shown below.
-
-<p align="center">
-<img src="../assets/whisper.png" width="70%">
-</p>
diff --git a/whisper-playground/run.sh b/whisper-playground/run.sh
deleted file mode 100644
index 7fcd0a913..000000000
--- a/whisper-playground/run.sh
+++ /dev/null
@@ -1,3 +0,0 @@
-#! bin/bash
-
-./server -tr -m /models/ggml-small.bin --host ${HOST:=0.0.0.0} --port ${PORT:=8001}
\ No newline at end of file