
Add image-analysis UI and refactor into multiple Helm charts #51

Merged
merged 34 commits on Nov 1, 2024
Commits
82d045c
Refactor basic chat web app and add image analysis app
sd109 Oct 30, 2024
7ecfb92
Update image build matrix
sd109 Oct 30, 2024
1596c08
Disable change check for testing
sd109 Oct 30, 2024
6ff5623
Rename web apps and dockerfile targets
sd109 Oct 30, 2024
8707561
Revert "Disable change check for testing"
sd109 Oct 30, 2024
96ac909
Update image build workflow paths
sd109 Oct 30, 2024
afc6469
Update appSettings and related comments
sd109 Oct 30, 2024
58c4dcb
Rename published docker images
sd109 Oct 30, 2024
a8c5a77
Add chart test for image-analysis UI
sd109 Oct 30, 2024
b462043
Fixup failing chart tests
sd109 Oct 30, 2024
9cd5694
Update docs
sd109 Oct 31, 2024
c85bffd
Refactor Helm charts to allow a different chart schema per web app
sd109 Oct 31, 2024
40f9b33
Mount local overrides as volume for testing
sd109 Oct 31, 2024
0f6882c
Clean up and formatting
sd109 Oct 31, 2024
b8737bc
Rename CI test values files
sd109 Oct 31, 2024
9dfcf3a
Update Azimuth UI config
sd109 Oct 31, 2024
00972e2
Fix chart dependency versions
sd109 Oct 31, 2024
6020434
Fix scheme for passing custom LLM params
sd109 Oct 31, 2024
456b221
Fix scheme for passing custom LLM params
sd109 Oct 31, 2024
f5c8654
Add model context length option to Azimuth UI
sd109 Oct 31, 2024
8e58066
Fix defaults for LLMParams data model
sd109 Oct 31, 2024
ef83288
Bump UI image tag
sd109 Oct 31, 2024
35a1438
Bump UI image tag
sd109 Oct 31, 2024
f3d5544
Remove top_k from vision model UI options
sd109 Oct 31, 2024
0dfd58a
Update workflow permissions to avoid device-flow auth
sd109 Oct 31, 2024
bd6accb
Re-enable change detection on image builds
sd109 Oct 31, 2024
d6202c0
Remove redundant helm template check
sd109 Oct 31, 2024
96ca80a
Skip change detection on tags
sd109 Oct 31, 2024
24b4f2c
Always run artifact publishing on tags
sd109 Oct 31, 2024
df7b8eb
Revert "Skip change detection on tags"
sd109 Oct 31, 2024
f9eb0aa
Remove unused reloader annotation
sd109 Oct 31, 2024
d0d1fd0
Test alternative docker cache settings
sd109 Oct 31, 2024
7f91553
Dummy change for cache testing
sd109 Oct 31, 2024
717e582
Revert to master branch of docker build action
sd109 Nov 1, 2024
22 changes: 15 additions & 7 deletions .github/workflows/build-push-artifacts.yml
@@ -28,18 +28,24 @@ jobs:
images:
- 'web-apps/**'
chart:
- 'chart/**'
- 'charts/**'

# Job to build container images
build_push_images:
name: Build and push images
runs-on: ubuntu-latest
permissions:
contents: read
id-token: write # needed for signing the images with GitHub OIDC Token
packages: write # required for pushing container images
security-events: write # required for pushing SARIF files
needs: changes
if: ${{ needs.changes.outputs.images == 'true' || github.ref_type == 'tag' }}
if: ${{ github.ref_type == 'tag' || needs.changes.outputs.images == 'true' }}
strategy:
matrix:
include:
- component: chat-interface
- component: chat
- component: image-analysis
steps:
- name: Check out the repository
uses: actions/checkout@v4
@@ -55,18 +61,19 @@ jobs:
id: image-meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/stackhpc/azimuth-llm-${{ matrix.component }}
images: ghcr.io/stackhpc/azimuth-llm-${{ matrix.component }}-ui
# Produce the branch name or tag and the SHA as tags
tags: |
type=ref,event=branch
type=ref,event=tag
type=sha,prefix=

- name: Build and push image
uses: azimuth-cloud/github-actions/docker-multiarch-build-push@update-trivy-action
uses: azimuth-cloud/github-actions/docker-multiarch-build-push@master
with:
cache-key: ${{ matrix.component }}
context: ./web-apps/${{ matrix.component }}
context: ./web-apps/
file: ./web-apps/${{ matrix.component }}/Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.image-meta.outputs.tags }}
@@ -78,7 +85,7 @@
runs-on: ubuntu-latest
# Only build and push the chart if chart files have changed
needs: [changes]
if: ${{ needs.changes.outputs.chart == 'true' || github.ref_type == 'tag' }}
if: ${{ github.ref_type == 'tag' || needs.changes.outputs.chart == 'true' }}
steps:
- name: Check out the repository
uses: actions/checkout@v4
@@ -94,6 +101,7 @@
- name: Publish Helm charts
uses: azimuth-cloud/github-actions/helm-publish@master
with:
directory: charts
token: ${{ secrets.GITHUB_TOKEN }}
version: ${{ steps.semver.outputs.version }}
app-version: ${{ steps.semver.outputs.short-sha }}
4 changes: 0 additions & 4 deletions .github/workflows/test-pr.yml
@@ -28,10 +28,6 @@ jobs:
- name: Run chart linting
run: ct lint --config ct.yaml

- name: Run helm template with default values
run: helm template ci-test .
working-directory: chart

- name: Create Kind Cluster
uses: helm/kind-action@v1
with:
10 changes: 8 additions & 2 deletions .gitignore
@@ -11,5 +11,11 @@ test-values.y[a]ml
**venv*/

# Helm chart stuff
chart/Chart.lock
chart/charts
charts/*/Chart.lock
charts/*/charts

# Python stuff
**/build/
**/*.egg-info/
**/flagged/
web-apps/**/overrides.yml
34 changes: 18 additions & 16 deletions README.md
@@ -34,38 +34,36 @@ ui:
enabled: false
```

***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service in your own way. In contrast, when deploying via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
> [!WARNING]
> Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service as appropriate for your use case. In contrast, when deployed via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).

The UI can also optionally be exposed using a Kubernetes Ingress resource. See the `ui.ingress` section in `values.yml` for available config options.
Both the web-based interface and the backend OpenAI-compatible vLLM API server can also optionally be exposed using [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). See the `ingress` section in `values.yml` for the available config options.
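
As an illustrative sketch only (the field names below are assumptions; consult the `ingress` section of the chart's `values.yml` for the actual schema), an ingress override might look like:

```yaml
# Hypothetical values override -- verify key names against the
# chart's values.yml before using.
ingress:
  enabled: true
  host: llm.example.com
  annotations:
    kubernetes.io/ingress.class: nginx
```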

## Tested Models

The following is a non-exhaustive list of models which have been tested with this app:
- [Llama 2 7B chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- [AWQ Quantized Llama 2 70B](https://huggingface.co/TheBloke/Llama-2-70B-Chat-AWQ)
- [Magicoder 6.7B](https://huggingface.co/ise-uiuc/Magicoder-S-DS-6.7B)
- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [WizardCoder Python 34B](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)
- [AWQ Quantized Mixtral 8x7B Instruct v0.1](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ)
The application uses [vLLM](https://docs.vllm.ai/en/latest/index.html) for model serving, so any of vLLM's [supported models](https://docs.vllm.ai/en/latest/models/supported_models.html) should work. Since vLLM pulls model files directly from [HuggingFace](https://huggingface.co/models), other models may also be compatible, though mileage may vary between models and model architectures. If a model is incompatible with vLLM, the API pod will likely enter a `CrashLoopBackOff` state and any relevant error information will appear in the API pod logs. These logs can be viewed with

Due to the combination of [components](##Components) used in this app, some HuggingFace models may not work as expected (usually due to the way in which LangChain formats the prompt messages). Any errors when using a new model will appear in the logs for either the web-app pod or the backend API pod. Please open an issue if you would like explicit support for a specific model that is not in the above list.
```sh
kubectl (-n <helm-release-namespace>) logs deploy/<helm-release-name>-api
```

If you suspect that a given error is caused by a problem with this Helm chart rather than by upstream vLLM model support, please [open an issue](https://github.com/stackhpc/azimuth-llm/issues).

## Monitoring

The LLM chart integrates with [kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack) by creating a `ServiceMonitor` resource and installing a custom Grafana dashboard as a Kubernetes `ConfigMap`. If the target cluster has an existing `kube-prometheus-stack` deployment which is appropriately configured to watch all namespaces for new Grafana dashboards, the custom LLM dashboard provided here will automatically picked up by Grafana. It will appear in the Grafana dashboard list with the name 'LLM dashboard'.
The LLM chart integrates with [kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack) by creating a `ServiceMonitor` resource and installing two custom Grafana dashboards as Kubernetes `ConfigMap`s. If the target cluster has an existing `kube-prometheus-stack` deployment which is appropriately configured to watch all namespaces for new Grafana dashboards, the LLM dashboards will automatically appear in Grafana's dashboard list.

To disable the monitoring integrations, set the `api.monitoring.enabled` value to `false`.
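
For example, the following values override disables the monitoring integrations via the `api.monitoring.enabled` setting described above:

```yaml
api:
  monitoring:
    enabled: false
```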

## Components

The Helm chart consists of the following components:
- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server).
- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/stable/getting_started/quickstart.html#openai-compatible-server).

- A frontend web-app built using [Gradio](https://www.gradio.app) and [LangChain](https://www.langchain.com). The web app source code can be found in `chart/web-app` and gets written to a ConfigMap during the chart build and is then mounted into the UI pod and executed as the entry point for the UI docker image (built from `images/ui-base/Dockerfile`).
- A choice of frontend web-apps built using [Gradio](https://www.gradio.app) (see [web-apps](./web-apps/)). Each web interface is available as a pre-built container image [hosted on ghcr.io](https://github.com/orgs/stackhpc/packages?repo_name=azimuth-llm) and can be configured for each Helm release by changing the `ui.image` section of the chart values.
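
As a sketch, switching a release to a different pre-built interface might look like the following values override (the exact `ui.image` sub-fields and image name here are assumptions; check the chart's `values.yml` and the ghcr.io package list for the real values):

```yaml
# Hypothetical override -- image name and field names are illustrative.
ui:
  image:
    repository: ghcr.io/stackhpc/azimuth-llm-image-analysis-ui
    tag: latest  # pin to a specific tag in practice
```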

- A [stakater/Reloader](https://github.com/stakater/Reloader) instance which monitors the web-app ConfigMap for changes and restarts the frontend when the app code changes (i.e. whenever the Helm values are updated).
<!-- ## Development

## Development
TODO: Update this

The GitHub repository includes a [tilt](https://tilt.dev) file for easier development. After installing tilt locally, simply run `tilt up` from the repo root to get started with development. This will trigger the following:

@@ -77,4 +75,8 @@

- Launch the frontend web app locally on `127.0.0.1:7860`, configured to use `localhost:8080` as the backend API

- Watch all components and only reload the minimal set of components needed when a file in the repo changes (e.g. modifying `chart/web-app/app.py` will restart the local web app instance only)
- Watch all components and only reload the minimal set of components needed when a file in the repo changes (e.g. modifying `chart/web-app/app.py` will restart the local web app instance only) -->

<!-- ## Adding a new web interface

TODO: Write these docs... -->
34 changes: 0 additions & 34 deletions chart/azimuth-ui.schema.yaml

This file was deleted.

124 changes: 0 additions & 124 deletions chart/values.schema.json

This file was deleted.

22 changes: 22 additions & 0 deletions charts/azimuth-chat/Chart.yaml
@@ -0,0 +1,22 @@
apiVersion: v2
name: azimuth-llm-chat
description: HuggingFace vision model serving along with a simple web interface.
maintainers:
- name: "Scott Davidson"
email: [email protected]

type: application

version: 0.1.0

appVersion: "0.1.0"

icon: https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.svg

annotations:
azimuth.stackhpc.com/label: HuggingFace Image Analysis

dependencies:
- name: azimuth-llm
version: ">=0-0"
repository: "file://../azimuth-llm/"
33 changes: 33 additions & 0 deletions charts/azimuth-chat/azimuth-ui.schema.yaml
@@ -0,0 +1,33 @@
controls:
/azimuth-llm/huggingface/model:
type: TextControl
required: true
/azimuth-llm/huggingface/token:
type: TextControl
secret: true
# Use mirror to mimic yaml anchor in base Helm chart
/azimuth-llm/ui/appSettings/model_name:
type: MirrorControl
path: /azimuth-llm/huggingface/model
visuallyHidden: true
# Azimuth UI doesn't handle json type ["integer","null"]
# properly so we allow any type in JSON schema then
# constrain to (optional) integer here.
/azimuth-llm/api/modelMaxContextLength:
type: IntegerControl
minimum: 100
required: false

sortOrder:
- /azimuth-llm/huggingface/model
- /azimuth-llm/huggingface/token
- /azimuth-llm/ui/appSettings/model_instruction
- /azimuth-llm/ui/appSettings/page_title
- /azimuth-llm/api/image/version
- /azimuth-llm/ui/appSettings/llm_params/temperature
- /azimuth-llm/ui/appSettings/llm_params/max_tokens
- /azimuth-llm/ui/appSettings/llm_params/frequency_penalty
- /azimuth-llm/ui/appSettings/llm_params/presence_penalty
- /azimuth-llm/ui/appSettings/llm_params/top_p
- /azimuth-llm/ui/appSettings/llm_params/top_k
- /azimuth-llm/api/modelMaxContextLength
16 changes: 16 additions & 0 deletions charts/azimuth-chat/ci/ui-only-values.yaml
@@ -0,0 +1,16 @@
azimuth-llm:
api:
enabled: false
ui:
service:
zenith:
enabled: false
appSettings:
# Verify that we can set non-standard LLM params
llm_params:
max_tokens: 101
temperature: 0.1
top_p: 0.15
top_k: 1
presence_penalty: 0.9
frequency_penalty: 1