Merge pull request containers#423 from MichaelClifford/v1.1.0_convert
Small text updates to converter readme
rhatdan authored May 1, 2024
2 parents e1ba968 + 1452a5d commit 3019ebb
Showing 1 changed file, `convert_models/README.md`, with 9 additions and 9 deletions.
# Convert and Quantize Models

AI Lab Recipes' default model server is [llamacpp_python](https://github.com/abetlen/llama-cpp-python), which needs models to be in a `*.GGUF` format.

However, most models available on [huggingface](https://huggingface.co/models) are not provided directly as `*.GGUF` files. More often they are provided as a set of `*.bin` or `*.safetensors` files with some additional metadata produced when the model is trained.

There are of course a number of users on huggingface who provide `*.GGUF` versions of popular models. But this introduces an unnecessary intermediate dependency, as well as possible security or licensing concerns.

To avoid these concerns and provide users with the maximum freedom of choice for their models, we provide a tool to quickly and easily convert and quantize a model from huggingface into a `*.GGUF` format for use with our `*.GGUF` compatible model servers.

![](/assets/model_converter.png)

Build the converter image:

```bash
podman build -t converter .
```

## Quantize and Convert

You can run the conversion image directly with podman in the terminal. You just need to provide it with the name of the huggingface model you want to download, the quantization level you want to use, and whether or not you want to keep the raw files after conversion.

```bash
podman run -it --rm -v models:/converter/converted_models -e HF_MODEL_URL=<ORG/MODEL_NAME> -e QUANTIZATION=Q4_K_M -e KEEP_ORIGINAL_MODEL="False" converter
```
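As a concrete example (the model name here is purely illustrative; substitute any huggingface model you want), the following would convert Mistral-7B-Instruct, quantize it to 4-bit, and discard the raw files afterwards:

```bash
# Illustrative invocation; the model name is an assumption, not a requirement.
# Q4_K_M is a common 4-bit quantization with a good size/quality trade-off.
podman run -it --rm -v models:/converter/converted_models \
    -e HF_MODEL_URL=mistralai/Mistral-7B-Instruct-v0.2 \
    -e QUANTIZATION=Q4_K_M \
    -e KEEP_ORIGINAL_MODEL="False" \
    converter
```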
You can also do the conversion through a simple Streamlit UI:

```bash
streamlit run convert_models/ui.py
```

## Model Storage and Use

This process writes the models into a podman volume under a `gguf/` directory and not directly back to the user's host machine (this could be changed in an upcoming update if required).
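To sanity-check what the conversion produced, one option (a sketch, assuming the volume is named `models` as in the commands above) is to list the volume's contents from a throwaway container:

```bash
# Sketch: list converted models inside the named volume. Assumes the volume
# is called "models" and uses busybox purely as a convenient minimal image.
podman volume inspect models
podman run --rm -v models:/models:Z busybox ls -lh /models/gguf
```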

If a user wants to access these models to use with the llamacpp_python model server, they would simply point their model service to the correct podman volume at run time. For example:

```bash
podman run -it -p 8001:8001 -v models:/opt/app-root/src/converter/converted_models/gguf:Z -e MODEL_PATH=/gguf/<MODEL_NAME> -e HOST=0.0.0.0 -e PORT=8001 llamacpp_python
```
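Once the server is running you can smoke-test it over HTTP. The snippet below assumes this image exposes llama-cpp-python's OpenAI-compatible API on the published port; adjust the path if your build differs:

```bash
# Sketch: query the OpenAI-compatible chat endpoint that llama-cpp-python's
# server normally provides (the route is an assumption about this image).
curl -s http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```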

