From 8977f88388909a384104766764529c79fc8da959 Mon Sep 17 00:00:00 2001
From: Evzen Gasta <evzen.ml@seznam.cz>
Date: Fri, 1 Nov 2024 07:24:22 +0100
Subject: [PATCH] feat: actualized recommendations

Signed-off-by: Evzen Gasta <evzen.ml@seznam.cz>
---
 packages/backend/src/assets/ai.json | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/packages/backend/src/assets/ai.json b/packages/backend/src/assets/ai.json
index 2ed0ebe49..e8bac8800 100644
--- a/packages/backend/src/assets/ai.json
+++ b/packages/backend/src/assets/ai.json
@@ -11,7 +11,11 @@
       "categories": ["natural-language-processing"],
       "basedir": "recipes/natural_language_processing/chatbot",
       "readme": "# Chat Application\n\n  This recipe helps developers start building their own custom LLM enabled chat applications. It consists of two main components: the Model Service and the AI Application.\n\n  There are a few options today for local Model Serving, but this recipe will use [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, [`model_servers/llamacpp_python/base/Containerfile`](/model_servers/llamacpp_python/base/Containerfile).\n\n  The AI Application will connect to the Model Service via its OpenAI compatible API. The recipe relies on [Langchain's](https://python.langchain.com/docs/get_started/introduction) python package to simplify communication with the Model Service and uses [Streamlit](https://streamlit.io/) for the UI layer. You can find an example of the chat application below.\n\n![](/assets/chatbot_ui.png) \n\n\n## Try the Chat Application\n\nThe [Podman Desktop](https://podman-desktop.io) [AI Lab Extension](https://github.com/containers/podman-desktop-extension-ai-lab) includes this recipe among others. To try it out, open `Recipes Catalog` -> `Chatbot` and follow the instructions to start the application.\n\n# Build the Application\n\nThe rest of this document will explain how to build and run the application from the terminal, and will\ngo into greater detail on how each container in the Pod above is built, run, and \nwhat purpose it serves in the overall application. All the recipes use a central [Makefile](../../common/Makefile.common) that includes variables populated with default values to simplify getting started. Please review the [Makefile docs](../../common/README.md), to learn about further customizing your application.\n\n\nThis application requires a model, a model service and an AI inferencing application.\n\n* [Quickstart](#quickstart)\n* [Download a model](#download-a-model)\n* [Build the Model Service](#build-the-model-service)\n* [Deploy the Model Service](#deploy-the-model-service)\n* [Build the AI Application](#build-the-ai-application)\n* [Deploy the AI Application](#deploy-the-ai-application)\n* [Interact with the AI Application](#interact-with-the-ai-application)\n* [Embed the AI Application in a Bootable Container Image](#embed-the-ai-application-in-a-bootable-container-image)\n\n\n## Quickstart\nTo run the application with pre-built images from `quay.io/ai-lab`, use `make quadlet`. This command\nbuilds the application's metadata and generates Kubernetes YAML at `./build/chatbot.yaml` to spin up a Pod that can then be launched locally.\nTry it with:\n\n```\nmake quadlet\npodman kube play build/chatbot.yaml\n```\n\nThis will take a few minutes if the model and model-server container images need to be downloaded. \nThe Pod is named `chatbot`, so you may use [Podman](https://podman.io) to manage the Pod and its containers:\n\n```\npodman pod list\npodman ps\n```\n\nOnce the Pod and its containers are running, the application can be accessed at `http://localhost:8501`. \nPlease refer to the section below for more details about [interacting with the chatbot application](#interact-with-the-ai-application).\n\nTo stop and remove the Pod, run:\n\n```\npodman pod stop chatbot\npodman pod rm chatbot\n```\n\n## Download a model\n\nIf you are just getting started, we recommend using [granite-7b-lab](https://huggingface.co/instructlab/granite-7b-lab). This is a well\nperformant mid-sized model with an apache-2.0 license. In order to use it with our Model Service we need it converted\nand quantized into the [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md). There are a number of\nways to get a GGUF version of granite-7b-lab, but the simplest is to download a pre-converted one from\n[huggingface.co](https://huggingface.co) here: https://huggingface.co/instructlab/granite-7b-lab-GGUF.\n\nThe recommended model can be downloaded using the code snippet below:\n\n```bash\ncd ../../../models\ncurl -sLO https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf\ncd ../recipes/natural_language_processing/chatbot\n```\n\n_A full list of supported open models is forthcoming._  \n\n\n## Build the Model Service\n\nThe complete instructions for building and deploying the Model Service can be found in the\n[llamacpp_python model-service document](../../../model_servers/llamacpp_python/README.md).\n\nThe Model Service can be built from make commands from the [llamacpp_python directory](../../../model_servers/llamacpp_python/).\n\n```bash\n# from path model_servers/llamacpp_python from repo containers/ai-lab-recipes\nmake build\n```\nCheckout the [Makefile](../../../model_servers/llamacpp_python/Makefile) to get more details on different options for how to build.\n\n## Deploy the Model Service\n\nThe local Model Service relies on a volume mount to the localhost to access the model files. It also employs environment variables to dictate the model used and where its served. You can start your local Model Service using the following `make` command from `model_servers/llamacpp_python` set with reasonable defaults:\n\n```bash\n# from path model_servers/llamacpp_python from repo containers/ai-lab-recipes\nmake run\n```\n\n## Build the AI Application\n\nThe AI Application can be built from the make command:\n\n```bash\n# Run this from the current directory (path recipes/natural_language_processing/chatbot from repo containers/ai-lab-recipes)\nmake build\n```\n\n## Deploy the AI Application\n\nMake sure the Model Service is up and running before starting this container image. When starting the AI Application container image we need to direct it to the correct `MODEL_ENDPOINT`. This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API. In our case the Model Service is running inside the Podman machine so we need to provide it with the appropriate address `10.88.0.1`. To deploy the AI application use the following:\n\n```bash\n# Run this from the current directory (path recipes/natural_language_processing/chatbot from repo containers/ai-lab-recipes)\nmake run \n```\n\n## Interact with the AI Application\n\nEverything should now be up an running with the chat application available at [`http://localhost:8501`](http://localhost:8501). By using this recipe and getting this starting point established, users should now have an easier time customizing and building their own LLM enabled chatbot applications.   \n\n## Embed the AI Application in a Bootable Container Image\n\nTo build a bootable container image that includes this sample chatbot workload as a service that starts when a system is booted, run: `make -f Makefile bootc`. You can optionally override the default image / tag you want to give the make command by specifying it as follows: `make -f Makefile BOOTC_IMAGE=<your_bootc_image> bootc`.\n\nSubstituting the bootc/Containerfile FROM command is simple using the Makefile FROM option.\n\n```bash\nmake FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 bootc\n```\n\nSelecting the ARCH for the bootc/Containerfile is simple using the Makefile ARCH= variable.\n\n```\nmake ARCH=x86_64 bootc\n```\n\nThe magic happens when you have a bootc enabled system running. If you do, and you'd like to update the operating system to the OS you just built\nwith the chatbot application, it's as simple as ssh-ing into the bootc system and running:\n\n```bash\nbootc switch quay.io/ai-lab/chatbot-bootc:latest\n```\n\nUpon a reboot, you'll see that the chatbot service is running on the system. Check on the service with:\n\n```bash\nssh user@bootc-system-ip\nsudo systemctl status chatbot\n```\n\n### What are bootable containers?\n\nWhat's a [bootable OCI container](https://containers.github.io/bootc/) and what's it got to do with AI?\n\nThat's a good question! We think it's a good idea to embed AI workloads (or any workload!) into bootable images at _build time_ rather than\nat _runtime_. This extends the benefits, such as portability and predictability, that containerizing applications provides to the operating system.\nBootable OCI images bake exactly what you need to run your workloads into the operating system at build time by using your favorite containerization\ntools. Might I suggest [podman](https://podman.io/)?\n\nOnce installed, a bootc enabled system can be updated by providing an updated bootable OCI image from any OCI\nimage registry with a single `bootc` command. This works especially well for fleets of devices that have fixed workloads - think\nfactories or appliances. Who doesn't want to add a little AI to their appliance, am I right?\n\nBootable images lend toward immutable operating systems, and the more immutable an operating system is, the less that can go wrong at runtime!\n\n#### Creating bootable disk images\n\nYou can convert a bootc image to a bootable disk image using the\n[quay.io/centos-bootc/bootc-image-builder](https://github.com/osbuild/bootc-image-builder) container image.\n\nThis container image allows you to build and deploy [multiple disk image types](../../common/README_bootc_image_builder.md) from bootc container images.\n\nDefault image types can be set via the DISK_TYPE Makefile variable.\n\n`make bootc-image-builder DISK_TYPE=ami`\n",
-      "recommended": ["hf.instructlab.granite-7b-lab-GGUF", "hf.instructlab.merlinite-7b-lab-GGUF"],
+      "recommended": [
+        "hf.instructlab.granite-7b-lab-GGUF",
+        "hf.instructlab.merlinite-7b-lab-GGUF",
+        "hf.lmstudio-community.granite-3.0-8b-instruct-GGUF"
+      ],
       "backend": "llama-cpp"
     },
     {
@@ -24,7 +28,11 @@
       "categories": ["natural-language-processing"],
       "basedir": "recipes/natural_language_processing/summarizer",
       "readme": "# Text Summarizer Application\n\n  This recipe helps developers start building their own custom LLM enabled summarizer applications. It consists of two main components: the Model Service and the AI Application.\n\n  There are a few options today for local Model Serving, but this recipe will use [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, [`model_servers/llamacpp_python/base/Containerfile`](/model_servers/llamacpp_python/base/Containerfile).\n\n  The AI Application will connect to the Model Service via its OpenAI compatible API. The recipe relies on [Langchain's](https://python.langchain.com/docs/get_started/introduction) python package to simplify communication with the Model Service and uses [Streamlit](https://streamlit.io/) for the UI layer. You can find an example of the summarizer application below.\n\n![](/assets/summarizer_ui.png) \n\n\n## Try the Summarizer Application\n\nThe [Podman Desktop](https://podman-desktop.io) [AI Lab Extension](https://github.com/containers/podman-desktop-extension-ai-lab) includes this recipe among others. To try it out, open `Recipes Catalog` -> `Summarizer` and follow the instructions to start the application.\n\n# Build the Application\n\nThe rest of this document will explain how to build and run the application from the terminal, and will\ngo into greater detail on how each container in the Pod above is built, run, and \nwhat purpose it serves in the overall application. All the recipes use a central [Makefile](../../common/Makefile.common) that includes variables populated with default values to simplify getting started. Please review the [Makefile docs](../../common/README.md), to learn about further customizing your application.\n\n\nThis application requires a model, a model service and an AI inferencing application.\n\n* [Quickstart](#quickstart)\n* [Download a model](#download-a-model)\n* [Build the Model Service](#build-the-model-service)\n* [Deploy the Model Service](#deploy-the-model-service)\n* [Build the AI Application](#build-the-ai-application)\n* [Deploy the AI Application](#deploy-the-ai-application)\n* [Interact with the AI Application](#interact-with-the-ai-application)\n* [Embed the AI Application in a Bootable Container Image](#embed-the-ai-application-in-a-bootable-container-image)\n\n\n## Quickstart\nTo run the application with pre-built images from `quay.io/ai-lab`, use `make quadlet`. This command\nbuilds the application's metadata and generates Kubernetes YAML at `./build/summarizer.yaml` to spin up a Pod that can then be launched locally.\nTry it with:\n\n```\nmake quadlet\npodman kube play build/summarizer.yaml\n```\n\nThis will take a few minutes if the model and model-server container images need to be downloaded. \nThe Pod is named `summarizer`, so you may use [Podman](https://podman.io) to manage the Pod and its containers:\n\n```\npodman pod list\npodman ps\n```\n\nOnce the Pod and its containers are running, the application can be accessed at `http://localhost:8501`. \nPlease refer to the section below for more details about [interacting with the summarizer application](#interact-with-the-ai-application).\n\nTo stop and remove the Pod, run:\n\n```\npodman pod stop summarizer\npodman pod rm summarizer\n```\n\n## Download a model\n\nIf you are just getting started, we recommend using [granite-7b-lab](https://huggingface.co/instructlab/granite-7b-lab). This is a well\nperformant mid-sized model with an apache-2.0 license. In order to use it with our Model Service we need it converted\nand quantized into the [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md). There are a number of\nways to get a GGUF version of granite-7b-lab, but the simplest is to download a pre-converted one from\n[huggingface.co](https://huggingface.co) here: https://huggingface.co/instructlab/granite-7b-lab-GGUF/blob/main/granite-7b-lab-Q4_K_M.gguf.\n\nThe recommended model can be downloaded using the code snippet below:\n\n```bash\ncd ../../../models\ncurl -sLO https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf\ncd ../recipes/natural_language_processing/summarizer\n```\n\n_A full list of supported open models is forthcoming._  \n\n\n## Build the Model Service\n\nThe complete instructions for building and deploying the Model Service can be found in the\n[llamacpp_python model-service document](../../../model_servers/llamacpp_python/README.md).\n\nThe Model Service can be built from make commands from the [llamacpp_python directory](../../../model_servers/llamacpp_python/).\n\n```bash\n# from path model_servers/llamacpp_python from repo containers/ai-lab-recipes\nmake build\n```\nCheckout the [Makefile](../../../model_servers/llamacpp_python/Makefile) to get more details on different options for how to build.\n\n## Deploy the Model Service\n\nThe local Model Service relies on a volume mount to the localhost to access the model files. It also employs environment variables to dictate the model used and where its served. You can start your local Model Service using the following `make` command from `model_servers/llamacpp_python` set with reasonable defaults:\n\n```bash\n# from path model_servers/llamacpp_python from repo containers/ai-lab-recipes\nmake run\n```\n\n## Build the AI Application\n\nThe AI Application can be built from the make command:\n\n```bash\n# Run this from the current directory (path recipes/natural_language_processing/summarizer from repo containers/ai-lab-recipes)\nmake build\n```\n\n## Deploy the AI Application\n\nMake sure the Model Service is up and running before starting this container image. When starting the AI Application container image we need to direct it to the correct `MODEL_ENDPOINT`. This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API. In our case the Model Service is running inside the Podman machine so we need to provide it with the appropriate address `10.88.0.1`. To deploy the AI application use the following:\n\n```bash\n# Run this from the current directory (path recipes/natural_language_processing/summarizer from repo containers/ai-lab-recipes)\nmake run \n```\n\n## Interact with the AI Application\n\nEverything should now be up an running with the summarizer application available at [`http://localhost:8501`](http://localhost:8501). By using this recipe and getting this starting point established, users should now have an easier time customizing and building their own LLM enabled summarizer applications.   \n\n## Embed the AI Application in a Bootable Container Image\n\nTo build a bootable container image that includes this sample summarizer workload as a service that starts when a system is booted, run: `make -f Makefile bootc`. You can optionally override the default image / tag you want to give the make command by specifying it as follows: `make -f Makefile BOOTC_IMAGE=<your_bootc_image> bootc`.\n\nSubstituting the bootc/Containerfile FROM command is simple using the Makefile FROM option.\n\n```bash\nmake FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 bootc\n```\n\nSelecting the ARCH for the bootc/Containerfile is simple using the Makefile ARCH= variable.\n\n```\nmake ARCH=x86_64 bootc\n```\n\nThe magic happens when you have a bootc enabled system running. If you do, and you'd like to update the operating system to the OS you just built\nwith the summarizer application, it's as simple as ssh-ing into the bootc system and running:\n\n```bash\nbootc switch quay.io/ai-lab/summarizer-bootc:latest\n```\n\nUpon a reboot, you'll see that the summarizer service is running on the system. Check on the service with:\n\n```bash\nssh user@bootc-system-ip\nsudo systemctl status summarizer\n```\n\n### What are bootable containers?\n\nWhat's a [bootable OCI container](https://containers.github.io/bootc/) and what's it got to do with AI?\n\nThat's a good question! We think it's a good idea to embed AI workloads (or any workload!) into bootable images at _build time_ rather than\nat _runtime_. This extends the benefits, such as portability and predictability, that containerizing applications provides to the operating system.\nBootable OCI images bake exactly what you need to run your workloads into the operating system at build time by using your favorite containerization\ntools. Might I suggest [podman](https://podman.io/)?\n\nOnce installed, a bootc enabled system can be updated by providing an updated bootable OCI image from any OCI\nimage registry with a single `bootc` command. This works especially well for fleets of devices that have fixed workloads - think\nfactories or appliances. Who doesn't want to add a little AI to their appliance, am I right?\n\nBootable images lend toward immutable operating systems, and the more immutable an operating system is, the less that can go wrong at runtime!\n\n#### Creating bootable disk images\n\nYou can convert a bootc image to a bootable disk image using the\n[quay.io/centos-bootc/bootc-image-builder](https://github.com/osbuild/bootc-image-builder) container image.\n\nThis container image allows you to build and deploy [multiple disk image types](../../common/README_bootc_image_builder.md) from bootc container images.\n\nDefault image types can be set via the DISK_TYPE Makefile variable.\n\n`make bootc-image-builder DISK_TYPE=ami`\n",
-      "recommended": ["hf.instructlab.granite-7b-lab-GGUF", "hf.instructlab.merlinite-7b-lab-GGUF"],
+      "recommended": [
+        "hf.instructlab.granite-7b-lab-GGUF",
+        "hf.instructlab.merlinite-7b-lab-GGUF",
+        "hf.lmstudio-community.granite-3.0-8b-instruct-GGUF"
+      ],
       "backend": "llama-cpp"
     },
     {
@@ -54,7 +62,11 @@
       "categories": ["natural-language-processing"],
       "basedir": "recipes/natural_language_processing/rag",
       "readme": "# RAG (Retrieval Augmented Generation) Chat Application\n\nThis demo provides a simple recipe to help developers start to build out their own custom RAG (Retrieval Augmented Generation) applications. It consists of three main components; the Model Service, the Vector Database and the AI Application.\n\nThere are a few options today for local Model Serving, but this recipe will use [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, [`model_servers/llamacpp_python/base/Containerfile`](/model_servers/llamacpp_python/base/Containerfile).\n\nIn order for the LLM to interact with our documents, we need them stored and available in such a manner that we can retrieve a small subset of them that are relevant to our query. To do this we employ a Vector Database alongside an embedding model. The embedding model converts our documents into numerical representations, vectors, such that similarity searches can be easily performed. The Vector Database stores these vectors for us and makes them available to the LLM. In this recipe we can use [chromaDB](https://docs.trychroma.com/) or [Milvus](https://milvus.io/) as our Vector Database.\n\nOur AI Application will connect to our Model Service via it's OpenAI compatible API. In this example we rely on [Langchain's](https://python.langchain.com/docs/get_started/introduction) python package to simplify communication with our Model Service and we use [Streamlit](https://streamlit.io/) for our UI layer. Below please see an example of the RAG application.     \n\n![](/assets/rag_ui.png)\n\n\n## Try the RAG chat application\n\n_COMING SOON to AI LAB_\nThe [Podman Desktop](https://podman-desktop.io) [AI Lab Extension](https://github.com/containers/podman-desktop-extension-ai-lab) includes this recipe among others. To try it out, open `Recipes Catalog` -> `RAG Chatbot` and follow the instructions to start the application.\n\nIf you prefer building and running the application from terminal, please run the following commands from this directory.\n\nFirst, build application's meta data and run the generated Kubernetes YAML which will spin up a Pod along with a number of containers:\n```\nmake quadlet\npodman kube play build/rag.yaml\n```\n\nThe Pod is named `rag`, so you may use [Podman](https://podman.io) to manage the Pod and its containers:\n```\npodman pod list\npodman ps\n```\n\nTo stop and remove the Pod, run:\n```\npodman pod stop rag\npodman pod rm rag\n```\n\nOnce the Pod is running, please refer to the section below to [interact with the RAG chatbot application](#interact-with-the-ai-application).\n\n# Build the Application\n\nIn order to build this application we will need two models, a Vector Database, a Model Service and an AI Application.  \n\n* [Download models](#download-models)\n* [Deploy the Vector Database](#deploy-the-vector-database)\n* [Build the Model Service](#build-the-model-service)\n* [Deploy the Model Service](#deploy-the-model-service)\n* [Build the AI Application](#build-the-ai-application)\n* [Deploy the AI Application](#deploy-the-ai-application)\n* [Interact with the AI Application](#interact-with-the-ai-application)\n\n### Download models\n\nIf you are just getting started, we recommend using [Granite-7B-Lab](https://huggingface.co/instructlab/granite-7b-lab-GGUF). This is a well\nperformant mid-sized model with an apache-2.0 license that has been quanitzed and served into the [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md).\n\nThe recommended model can be downloaded using the code snippet below:\n\n```bash\ncd ../../../models\ncurl -sLO https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf\ncd ../recipes/natural_language_processing/rag\n```\n\n_A full list of supported open models is forthcoming._  \n\nIn addition to the LLM, RAG applications also require an embedding model to convert documents between natural language and vector representations. For this demo we will use [`BAAI/bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) it is a fairly standard model for this use case and has an MIT license.    \n\nThe code snippet below can be used to pull a copy of the `BAAI/bge-base-en-v1.5` embedding model and store it in your `models/` directory. \n\n```python \nfrom huggingface_hub import snapshot_download\nsnapshot_download(repo_id=\"BAAI/bge-base-en-v1.5\",\n                cache_dir=\"models/\",\n                local_files_only=False)\n```\n\n### Deploy the Vector Database \n\nTo deploy the Vector Database service locally, simply use the existing ChromaDB or Milvus image. The Vector Database is ephemeral and will need to be re-populated each time the container restarts. When implementing RAG in production, you will want a long running and backed up Vector Database.\n\n\n#### ChromaDB\n```bash\npodman pull chromadb/chroma\n```\n```bash\npodman run --rm -it -p 8000:8000 chroma\n```\n#### Milvus\n```bash\npodman pull milvusdb/milvus:master-20240426-bed6363f\n```\n```bash\npodman run -it \\\n        --name milvus-standalone \\\n        --security-opt seccomp:unconfined \\\n        -e ETCD_USE_EMBED=true \\\n        -e ETCD_CONFIG_PATH=/milvus/configs/embedEtcd.yaml \\\n        -e COMMON_STORAGETYPE=local \\\n        -v $(pwd)/volumes/milvus:/var/lib/milvus \\\n        -v $(pwd)/embedEtcd.yaml:/milvus/configs/embedEtcd.yaml \\\n        -p 19530:19530 \\\n        -p 9091:9091 \\\n        -p 2379:2379 \\\n        --health-cmd=\"curl -f http://localhost:9091/healthz\" \\\n        --health-interval=30s \\\n        --health-start-period=90s \\\n        --health-timeout=20s \\\n        --health-retries=3 \\\n        milvusdb/milvus:master-20240426-bed6363f \\\n        milvus run standalone  1> /dev/null\n```\nNote: For running the Milvus instance, make sure you have the `$(pwd)/volumes/milvus` directory and `$(pwd)/embedEtcd.yaml` file as shown in this repository. These are required by the database for its operations.\n\n\n### Build the Model Service\n\nThe complete instructions for building and deploying the Model Service can be found in the [the llamacpp_python model-service document](../model_servers/llamacpp_python/README.md).\n\nThe Model Service can be built with the following code snippet:\n\n```bash\ncd model_servers/llamacpp_python\npodman build -t llamacppserver -f ./base/Containerfile .\n```\n\n\n### Deploy the Model Service\n\nThe complete instructions for building and deploying the Model Service can be found in the [the llamacpp_python model-service document](../model_servers/llamacpp_python/README.md).\n\nThe local Model Service relies on a volume mount to the localhost to access the model files. You can start your local Model Service using the following Podman command:\n```\npodman run --rm -it \\\n        -p 8001:8001 \\\n        -v Local/path/to/locallm/models:/locallm/models \\\n        -e MODEL_PATH=models/<model-filename> \\\n        -e HOST=0.0.0.0 \\\n        -e PORT=8001 \\\n        llamacppserver\n```\n\n### Build the AI Application\n\nNow that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application image in the `rag-langchain/` directory.\n\n```bash\ncd rag\nmake APP_IMAGE=rag build\n```\n\n### Deploy the AI Application\n\nMake sure the Model Service and the Vector Database are up and running before starting this container image. When starting the AI Application container image we need to direct it to the correct `MODEL_ENDPOINT`. This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API. In our case the Model Service is running inside the Podman machine so we need to provide it with the appropriate address `10.88.0.1`. The same goes for the Vector Database. Make sure the `VECTORDB_HOST` is correctly set to `10.88.0.1` for communication within the Podman virtual machine.\n\nThere also needs to be a volume mount into the `models/` directory so that the application can access the embedding model as well as a volume mount into the `data/` directory where it can pull documents from to populate the Vector Database.  \n\nThe following Podman command can be used to run your AI Application:\n\n```bash\npodman run --rm -it -p 8501:8501 \\\n-e MODEL_ENDPOINT=http://10.88.0.1:8001 \\\n-e VECTORDB_HOST=10.88.0.1 \\\n-v Local/path/to/locallm/models/:/rag/models \\\nrag   \n```\n\n### Interact with the AI Application\n\nEverything should now be up an running with the rag application available at [`http://localhost:8501`](http://localhost:8501). By using this recipe and getting this starting point established, users should now have an easier time customizing and building their own LLM enabled RAG applications.   \n\n### Embed the AI Application in a Bootable Container Image\n\nTo build a bootable container image that includes this sample RAG chatbot workload as a service that starts when a system is booted, cd into this folder\nand run:\n\n\n```\nmake BOOTC_IMAGE=quay.io/your/rag-bootc:latest bootc\n```\n\nSubstituting the bootc/Containerfile FROM command is simple using the Makefile FROM option.\n\n```\nmake FROM=registry.redhat.io/rhel9/rhel-bootc:9.4 BOOTC_IMAGE=quay.io/your/rag-bootc:latest bootc\n```\n\nThe magic happens when you have a bootc enabled system running. If you do, and you'd like to update the operating system to the OS you just built\nwith the RAG chatbot application, it's as simple as ssh-ing into the bootc system and running:\n\n```\nbootc switch quay.io/your/rag-bootc:latest\n```\n\nUpon a reboot, you'll see that the RAG chatbot service is running on the system.\n\nCheck on the service with\n\n```\nssh user@bootc-system-ip\nsudo systemctl status rag\n```\n\n#### What are bootable containers?\n\nWhat's a [bootable OCI container](https://containers.github.io/bootc/) and what's it got to do with AI?\n\nThat's a good question! We think it's a good idea to embed AI workloads (or any workload!) into bootable images at _build time_ rather than\nat _runtime_. This extends the benefits, such as portability and predictability, that containerizing applications provides to the operating system.\nBootable OCI images bake exactly what you need to run your workloads into the operating system at build time by using your favorite containerization\ntools. Might I suggest [podman](https://podman.io/)?\n\nOnce installed, a bootc enabled system can be updated by providing an updated bootable OCI image from any OCI\nimage registry with a single `bootc` command. This works especially well for fleets of devices that have fixed workloads - think\nfactories or appliances. Who doesn't want to add a little AI to their appliance, am I right?\n\nBootable images lend toward immutable operating systems, and the more immutable an operating system is, the less that can go wrong at runtime!\n\n##### Creating bootable disk images\n\nYou can convert a bootc image to a bootable disk image using the\n[quay.io/centos-bootc/bootc-image-builder](https://github.com/osbuild/bootc-image-builder) container image.\n\nThis container image allows you to build and deploy [multiple disk image types](../../common/README_bootc_image_builder.md) from bootc container images.\n\nDefault image types can be set via the DISK_TYPE Makefile variable.\n\n`make bootc-image-builder DISK_TYPE=ami`\n\n### Makefile variables\n\nThere are several [Makefile variables](../../common/README.md) defined within each `recipe` Makefile which can be\nused to override defaults for a variety of make targets.\n",
-      "recommended": ["hf.instructlab.granite-7b-lab-GGUF", "hf.instructlab.merlinite-7b-lab-GGUF"],
+      "recommended": [
+        "hf.instructlab.granite-7b-lab-GGUF",
+        "hf.instructlab.merlinite-7b-lab-GGUF",
+        "hf.lmstudio-community.granite-3.0-8b-instruct-GGUF"
+      ],
       "backend": "llama-cpp"
     },
     {