The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
- Ollama should be installed and running
- Pull a model to use with the library:
ollama pull <model>
e.g.ollama pull llama3.2
- See Ollama.com for more information on the models available.
pip install ollama
from ollama import chat
from ollama import ChatResponse
response: ChatResponse = chat(model='llama3.2', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',
},
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
See _types.py for more information on the response types.
Response streaming can be enabled by setting stream=True
.
Note
Streaming Tool/Function calling is not yet supported.
from ollama import chat
stream = chat(
model='llama3.2',
messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
stream=True,
)
for chunk in stream:
print(chunk['message']['content'], end='', flush=True)
A custom client can be created by instantiating Client
or AsyncClient
from ollama
.
All extra keyword arguments are passed into the httpx.Client
.
from ollama import Client
client = Client(
host='http://localhost:11434',
headers={'x-some-header': 'some-value'}
)
response = client.chat(model='llama3.2', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',
},
])
The AsyncClient
class is used to make asynchronous requests. It can be configured with the same fields as the Client
class.
import asyncio
from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
response = await AsyncClient().chat(model='llama3.2', messages=[message])
asyncio.run(chat())
Setting stream=True
modifies functions to return a Python asynchronous generator:
import asyncio
from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
async for part in await AsyncClient().chat(model='llama3.2', messages=[message], stream=True):
print(part['message']['content'], end='', flush=True)
asyncio.run(chat())
The Ollama Python library's API is designed around the Ollama REST API
ollama.chat(model='llama3.2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
ollama.generate(model='llama3.2', prompt='Why is the sky blue?')
ollama.list()
ollama.show('llama3.2')
modelfile='''
FROM llama3.2
SYSTEM You are mario from super mario bros.
'''
ollama.create(model='example', modelfile=modelfile)
ollama.copy('llama3.2', 'user/llama3.2')
ollama.delete('llama3.2')
ollama.pull('llama3.2')
ollama.push('user/llama3.2')
ollama.embed(model='llama3.2', input='The sky is blue because of rayleigh scattering')
ollama.embed(model='llama3.2', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
ollama.ps()
Errors are raised if requests return an error status or if an error is detected while streaming.
model = 'does-not-yet-exist'
try:
ollama.chat(model)
except ollama.ResponseError as e:
print('Error:', e.error)
if e.status_code == 404:
ollama.pull(model)