tidy up #197

Merged
merged 1 commit into from Apr 15, 2024

10 changes: 5 additions & 5 deletions docs/source/applications/llama_cpp.rst
@@ -3,9 +3,9 @@ Llama.cpp

The main goal of `llama.cpp <https://github.com/ggerganov/llama.cpp>`_ is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

- One of the extras included in ``llama.cpp`` is a fast, lightweight, pure C/C++ HTTP server front end. With the help of some extra ``ssh`` tunnels, this can allow you to easily interact with a LLM running on Viking through the web browser on your PC. Below is an example of this workflow.
+ One of the extras included in ``llama.cpp`` is a fast, lightweight, pure C/C++ HTTP server to act as a front end. With the help of some extra ``ssh`` tunnels, this can allow you to easily interact with an LLM running on Viking through the web browser on your PC. Below is an example of this workflow.

- To begin with you'll need to download an LLM to Viking, for example `mistral-7b-instruct-v0.2.Q4_K_M.gguf <https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf>`_. You can download this on Viking with the ``wget`` command for example:
+ To begin with you'll need to download an LLM to Viking, for example `mistral-7b-instruct-v0.2.Q4_K_M.gguf <https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF>`_. You can download this on Viking with the ``curl`` command, for example:

.. code-block:: console
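
# A sketch only: the download URL below is an assumption based on the model page linked above
$ curl -L -o mistral-7b-instruct-v0.2.Q4_K_M.gguf https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf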

@@ -35,7 +35,7 @@ To run the server, you need to pass it the path to the LLM you downloaded, along

$ server -m /path/to/mistral-7b-instruct-v0.2.Q4_K_M.gguf --n-gpu-layers 256 --ctx-size 2048

- Here I've also set the ``n-gpu-layers`` option which allows offloading some layers to the GPU for computation. Generally results in increased performance, and the ``ctx-size`` is the size of the prompt context (the default is 512).
+ Here I've also set the ``n-gpu-layers`` option, which offloads some layers to the GPU for computation and generally results in increased performance, and the ``ctx-size`` option, which sets the size of the prompt context (the default is 512). More options are covered in the `README.md <https://github.com/ggerganov/llama.cpp/tree/master/examples/server>`_.
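
Once the server is running, a quick way to check it is responding is to send a request to its HTTP API from a terminal on the same node. This is a sketch only: it assumes the server is listening on the default address of ``127.0.0.1:8080`` and uses the ``/completion`` endpoint described in the server's README.

.. code-block:: console

$ curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Hello, how are you?", "n_predict": 32}'

If this works you should get back a JSON response containing the generated text.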

.. note::

@@ -49,7 +49,7 @@ Set up the ssh tunnels

The ``llama.cpp`` server is running on a GPU compute node on Viking, behind the login node. This means you can't immediately load up a web browser and connect to it. But with *two* ``ssh`` tunnels, you can.

- 1. To forward you local connection to a login node
+ 1. To forward from your local PC to the login node
2. To forward from the login node to the GPU compute node, where the server is running (a rough sketch of both commands follows this list)
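
A rough sketch of what these two tunnels might look like is below. The username ``abc123``, the login node address ``viking.york.ac.uk``, the compute node name ``gpunode01`` and the server port ``8080`` are all assumptions here; the exact commands for your own session are covered step by step in the following sections.

.. code-block:: console

# On your local PC: forward local port 8080 to port 8080 on the login node (details assumed)
$ ssh -L 8080:localhost:8080 abc123@viking.york.ac.uk

# Then, in that session on the login node: forward its port 8080 on to the GPU compute node
$ ssh -L 8080:localhost:8080 gpunode01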

On your local PC
@@ -94,7 +94,7 @@ If everything is working, you should now be able to connect to the server from y

.. Note::

- The above two ssh tunnel commands can be done in one single command however, it will have the effect to leaving one of the ssh tunnels running on the login node after you have logged out which you should really kill when you're finished. If you're familiar with killing processes on Linux, an example command which you would only run in a terminal on your PC (not on Viking) would be:
+ The above two ``ssh`` tunnel commands can be combined into one single command; however, this has the effect of leaving one of the ``ssh`` tunnels running on the login node after you have logged out, which you should kill when you're finished. If you're familiar with killing processes on Linux, an example command which you would run only in a terminal on your local PC (not on Viking) would be:

.. code-block:: console
