# Code Generation Application

This demo provides a simple recipe to help developers start building out their own custom LLM-enabled code generation applications. It consists of two main components: the Model Service and the AI Application.

There are a few options today for local Model Serving, but this recipe will use [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, [`playground/Containerfile`](/playground/Containerfile).

Our AI Application will connect to our Model Service via its OpenAI compatible API. In this example we rely on [Langchain's](https://python.langchain.com/docs/get_started/introduction) python package to simplify communication with our Model Service and we use [Streamlit](https://streamlit.io/) for our UI layer. An example of the code generation application is shown below.

![](/assets/codegen_ui.png)

# Build the Application

In order to build this application we will need a model, a Model Service and an AI Application.

* [Download a model](#download-a-model)
* [Build the Model Service](#build-the-model-service)
* [Deploy the Model Service](#deploy-the-model-service)
* [Build the AI Application](#build-the-ai-application)
* [Deploy the AI Application](#deploy-the-ai-application)
* [Interact with the AI Application](#interact-with-the-ai-application)

### Download a model

If you are just getting started, we recommend using a [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) based model that's been fine-tuned for code generation. Mistral-7B is a performant mid-sized model with an Apache-2.0 license. In order to use it with our Model Service we need it converted and quantized into the [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md). There are a number of ways to get a GGUF version of Mistral-7B based models, but the simplest is to download a pre-converted one from [huggingface.co](https://huggingface.co). For this demo we recommend using https://huggingface.co/TheBloke/Mistral-7B-Code-16K-qlora-GGUF.

There are a number of options for quantization level, but we recommend `Q4_K_M`.

The recommended model can be downloaded using the code snippet below:

```bash
cd models
wget https://huggingface.co/TheBloke/Mistral-7B-Code-16K-qlora-GGUF/resolve/main/mistral-7b-code-16k-qlora.Q4_K_M.gguf
cd ../
```
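
If you prefer, the same file can also be pulled with the Hugging Face CLI. This is an optional sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed on the host:

```bash
# Optional alternative to wget: fetch the same GGUF file with huggingface-cli
pip install huggingface_hub
huggingface-cli download TheBloke/Mistral-7B-Code-16K-qlora-GGUF \
  mistral-7b-code-16k-qlora.Q4_K_M.gguf --local-dir models
```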

_A full list of supported open models is forthcoming._

### Build the Model Service

The complete instructions for building and deploying the Model Service can be found in the [playground model-service document](../playground/README.md).

The Model Service can be built from the root directory with the following code snippet:

```bash
podman build -t llamacppserver playground/
```
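
Once the build completes, you can optionally confirm that the image exists in your local store before moving on (a quick sanity check, not a required step):

```bash
# List local images and check that llamacppserver was created
podman images | grep llamacppserver
```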

### Deploy the Model Service

The complete instructions for building and deploying the Model Service can be found in the [playground model-service document](../playground/README.md).

The local Model Service relies on a volume mount from the host to access the model files. You can start your local Model Service using the following podman command:
```bash
podman run --rm -it \
        -p 8001:8001 \
        -v Local/path/to/locallm/models:/locallm/models \
        -e MODEL_PATH=models/<model-filename> \
        -e HOST=0.0.0.0 \
        -e PORT=8001 \
        llamacppserver
```
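
Before building the AI Application, it can be useful to confirm the Model Service is actually serving requests. The sketch below assumes the service is listening on port 8001 as configured above and relies on the OpenAI compatible endpoints exposed by `llama-cpp-python`:

```bash
# Sanity check: list the model(s) the service has loaded
curl http://localhost:8001/v1/models

# Send a small chat completion request, similar to what the Langchain app does internally
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a python function that adds two numbers."}]}'
```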

### Build the AI Application

Now that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application image from the `code-generation/` directory.
```bash
cd code-generation
podman build -t codegen . -f builds/Containerfile
```
### Deploy the AI Application

Make sure the Model Service is up and running before starting this container image. When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`. This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API. In our case the Model Service is running inside the podman machine, so we need to provide it with the appropriate address, `10.88.0.1`. The following podman command can be used to run your AI Application:

```bash
podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://10.88.0.1:8001/v1 codegen
```
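
If the application cannot reach the Model Service at `10.88.0.1`, you can double-check the gateway address of the default podman bridge network. The snippet below is a rough sketch and assumes the default network named `podman` is in use:

```bash
# Inspect the default podman network and look for its gateway address
podman network inspect podman | grep -i gateway
```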

### Interact with the AI Application

Everything should now be up and running with the chat application available at [`http://localhost:8501`](http://localhost:8501). By using this recipe and getting this starting point established, users should now have an easier time customizing and building their own LLM-enabled code generation applications.

_Note: Future recipes will demonstrate integration between locally hosted LLMs and developer productivity tools like VSCode._