This folder contains a demonstration of a chat web-app that uses Ollama as an LLM chat service. The following instructions describe how to set up a local Ollama service with which the web-app demo can interact, send queries, and show the chat service's responses.
First, set up the Ollama service by following the directions in the Ollama GitHub project. Then follow the instructions in Ollama's Model Library to install and run a language model.
Once the Ollama service is installed and running, proceed to the next steps.
There are two ways to make requests of the Ollama service. The demo uses the browser's built-in `fetch()` function; the other method is to use the Ollama client library. These are discussed in the next two sections.
The `whatExpress.html` web-app uses `fetch()` to send queries to the Ollama service. The advantage is that no other support library or code needs to be installed. The disadvantages are that setting up the parameters for `fetch()` and managing the response are a little more involved. In terms of the request, it is necessary to:

- provide the request headers,
- specify that it is a `POST` request,
- address the request to the correct Ollama service endpoint,
- set the `Content-Type`, and
- properly set up the body payload.
For the chat endpoint, the response is a stream of newline-delimited JSON (ndjson) that must be parsed and processed to retrieve the AI's actual text to show on the web page. Using a client library, briefly discussed in the next section, hides many of these details when interacting with Ollama. The sketch below illustrates both the request setup and the response handling.
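A minimal sketch of such a `fetch()` request, including parsing of the streamed response, follows. It assumes Ollama is listening on its default local port (11434) and that a model named "llama3" has been pulled; both are assumptions about the local setup, not the demo's actual code.

```javascript
// Minimal sketch: send a chat query to a local Ollama service with fetch()
// and accumulate the reply from the streamed, newline-delimited JSON response.
// Assumes Ollama's default port (11434) and an installed model named "llama3".
async function askOllama(prompt) {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",                                  // the chat endpoint expects POST
    headers: { "Content-Type": "application/json" }, // request headers / Content-Type
    body: JSON.stringify({                           // body payload
      model: "llama3",
      messages: [{ role: "user", content: prompt }],
    }),
  });

  // Each line of the streamed body is a JSON object; the reply text arrives
  // piecewise in message.content.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop(); // keep a partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      if (chunk.message?.content) text += chunk.message.content;
    }
  }
  return text;
}
```

For example, `askOllama('What does this express: "Horse brown eat quickly oats dried"?')` resolves to the model's full reply once the stream ends.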
The demo web-app is launched using the following command: `npm run serveAppsDemos`. Once the development server is running, open this localhost URL in a browser:

http://localhost:5173/demos/Ollama%20Chat%20Service/whatExpress.html
There is an Ollama browser API for communicating with the Ollama service, using the library's `chat()` function instead of making raw requests with `fetch()`. The advantage of this approach is that sending a query, or different types of queries, is more straightforward: it involves passing an object that is essentially the body of the request. The response from `chat()` in this case is an array of strings that can be concatenated to show the AI's text on the web page. Alternatively, the library provides a streaming response, so that as the LLM produces further output it can be added incrementally to the display instead of waiting for the LLM to finish and showing the response all at once. The disadvantages are that Ollama's browser library must be installed, and the necessary objects and functions must be included using `import` statements in the app's JavaScript code.
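For comparison, here is a sketch of the same kind of query made through the client library, in both its non-streaming and streaming forms. The `ollama/browser` import path is the package's browser entry point, and the model name "llama3" is an assumption about the local setup; the actual app's code may differ.

```javascript
// Sketch: query Ollama through its JavaScript client library rather than
// raw fetch(). Assumes the "ollama" npm package is installed and a model
// named "llama3" is available locally.
import ollama from "ollama/browser";

// Non-streaming: wait for the complete reply, then read message.content.
async function askOnce(prompt) {
  const reply = await ollama.chat({
    model: "llama3",
    messages: [{ role: "user", content: prompt }],
  });
  return reply.message.content;
}

// Streaming: append each partial reply to the page as it arrives, instead of
// waiting for the LLM to finish.
async function askStreaming(prompt, outputElement) {
  const stream = await ollama.chat({
    model: "llama3",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  for await (const part of stream) {
    outputElement.textContent += part.message.content;
  }
}
```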
Using the library is exemplified by the "Using Ollama with LLMs Running Locally" web app in the `apps/ollama` folder. See its README for further instructions.
Near the top of the demo is a pull-down menu for selecting the language model to use for the chat. If no action is taken, the language model shown will be the one that is used. If no language models are shown in the pull-down, follow Ollama's Model Library instructions to install language models.
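As an illustration, the sketch below shows one way such a pull-down could be filled from the models installed locally, using Ollama's `/api/tags` listing endpoint. The `<select>` element id used here is hypothetical and is not taken from the demo's code.

```javascript
// Sketch: populate a model pull-down from the local Ollama service.
// The /api/tags endpoint lists installed models; "model-select" is a
// hypothetical element id, not the demo's actual markup.
async function populateModelMenu() {
  const select = document.getElementById("model-select");
  const response = await fetch("http://localhost:11434/api/tags");
  const { models } = await response.json();
  for (const model of models) {
    const option = document.createElement("option");
    option.value = model.name;
    option.textContent = model.name;
    select.appendChild(option);
  }
}
```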
There are two ways to chat with Ollama. In both cases, the text entered in the chat web-app's text field is prefixed with the question "What does this express:". That is, if the text box contains:
Horse brown eat quickly oats dried
then the full prompt sent to the LLM is:
What does this express: "Horse brown eat quickly oats dried"?
Pressing the "Ask" button will send the query exactly as shown above. The reply will likely be somewhat wordy. Ollama will offer an analysis of the chat query, how the query might be improved, and it will end with a response.
If the "Answer with a single grammatically correct sentence" button is pressed, then the query sent to the LLM service is modified in an attempt to have the LLM return a single sentence, like so:
What does this express: "Horse brown eat quickly oats dried"? Answer
with a single grammatically correct sentence.
Ollama will likely respond with a single sentence for this query.
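For reference, here is a sketch of how the two buttons might assemble the full prompt described above. The element id and function name are purely illustrative; only the prefix and the single-sentence suffix come from this README's description.

```javascript
// Sketch: build the full prompt from the text field's contents.
// "chat-input" and buildPrompt are illustrative names, not the demo's code.
function buildPrompt(singleSentence) {
  const userText = document.getElementById("chat-input").value;
  let prompt = `What does this express: "${userText}"?`;
  if (singleSentence) {
    // The second button asks the LLM to keep its reply to one sentence.
    prompt += " Answer with a single grammatically correct sentence.";
  }
  return prompt;
}
```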