Merge pull request #229 from MichaelClifford/docs
added nvidia gpu instructions
rhatdan authored Apr 10, 2024
2 parents 5517605 + 3f23966 commit a6b837c
Showing 1 changed file with 24 additions and 0 deletions: model_servers/llamacpp_python/README.md
@@ -43,6 +43,18 @@ To pull the base model service image:
podman pull quay.io/ai-lab/llamacpp-python-cuda
```

**IMPORTANT!**

To run the Cuda image with GPU acceleration, you need to install the correct [Cuda drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation) for your system along with the [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#). Please use the links provided to find installation instructions for your system.
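
For reference, on an RPM-based host (Fedora, RHEL) the Container Toolkit is typically installed from Nvidia's package repository. The commands below are only a sketch of that flow; the repository URL and package manager may differ for your distribution, so defer to the linked install guide.

```bash
# Add Nvidia's package repository (see the install guide for the current URL).
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the toolkit, which provides the nvidia-ctk CLI used below.
sudo dnf install -y nvidia-container-toolkit
```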

Once those are installed, you can use the Container Toolkit CLI to generate a CDI (Container Device Interface) specification for your Nvidia device(s).
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
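
If you want to confirm the specification was generated, the same CLI can list the device names it found (an optional sanity check):

```bash
nvidia-ctk cdi list
```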

Finally, you will also need to add `--device nvidia.com/gpu=all` to your `podman run` command so your container can access the GPU.
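
As a quick way to confirm that CDI passthrough works before starting the model service, you can run `nvidia-smi` inside a CUDA-capable container. The image below is only an example; substitute any CUDA base image available on your system.

```bash
# Example image only; any container that bundles nvidia-smi will do.
podman run --rm --device nvidia.com/gpu=all docker.io/nvidia/cuda:12.3.2-base-ubi9 nvidia-smi
```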


### Vulkan (experimental)

The [Vulkan image](../llamacpp_python/vulkan/Containerfile) is experimental, but it can be used to gain partial GPU access on an M-series Mac, significantly speeding up model response time over a CPU-only deployment. This image requires that your podman machine provider is "applehv" and that you use krunkit instead of vfkit. Since these tools are not currently supported by Podman Desktop, this image will remain "experimental".
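
If you need to switch providers, a minimal sketch follows (assuming podman honors the `CONTAINERS_MACHINE_PROVIDER` environment variable, or the equivalent `provider` setting under `[machine]` in `containers.conf`):

```bash
# Select the applehv provider for newly created podman machines.
export CONTAINERS_MACHINE_PROVIDER=applehv

# Create and start a machine with that provider.
podman machine init
podman machine start
```
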
@@ -100,6 +112,18 @@ podman run --rm -it \
llamacpp_python
```

or with the Cuda image:

```bash
podman run --rm -it \
  --device nvidia.com/gpu=all \
  -p 8001:8001 \
  -v Local/path/to/locallm/models:/locallm/models:ro \
  -e MODEL_PATH=models/mistral-7b-instruct-v0.1.Q4_K_M.gguf \
  -e HOST=0.0.0.0 \
  -e PORT=8001 \
  llamacpp_python
```
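
Once the service is up, you can verify it is responding. The model server exposes an OpenAI-compatible API (endpoint paths assumed from the upstream `llama-cpp-python` server), so a simple request should list the loaded model:

```bash
curl http://localhost:8001/v1/models
```
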
### Multiple Model Service:

To enable dynamic loading and unloading of different models present on your machine, you can start the model service with a `CONFIG_PATH` instead of a `MODEL_PATH`.
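
As an illustration of what such a config might contain, here is a minimal sketch. The field names follow the upstream `llama-cpp-python` multi-model config format, and the model path and alias are placeholders to adjust for your machine.

```bash
# Write a minimal multi-model config; the model path and alias are examples only.
cat > models/config.json <<'EOF'
{
  "host": "0.0.0.0",
  "port": 8001,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
      "model_alias": "mistral",
      "chat_format": "mistral-instruct"
    }
  ]
}
EOF
```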
