I don't know why, but I'm encountering this problem with the library. Here is my simple script:
import ollama

client = ollama.Client(host=llm_config["base_url"], timeout=600)
client.chat(model=config["ollama"]["model"], messages=[{
    "role": "user",
    "content": "Why is the sky blue?",
}])
Where llm_config["base_url"] is the URL of the Ollama server (it runs on a serverless GPU), which I can reach successfully from open-webui and even use to query the model without issues. The model I'm using is qwen2.5:32b-instruct-q4_K_M and the GPU is an RTX A6000.
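For reference, the same request can be reproduced without the library by posting to the Ollama REST API directly. This is only a minimal sketch (it assumes the server exposes the standard /api/chat endpoint and uses httpx, which the ollama package already depends on, as a plain HTTP client); it can help tell a library problem apart from a gateway timeout:

import httpx

# Same payload the ollama client would send, but without the library in between.
response = httpx.post(
    f'{llm_config["base_url"]}/api/chat',
    json={
        "model": config["ollama"]["model"],
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
    timeout=600,
)
print(response.status_code)
print(response.text)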
The traceback (client-side) is the following:
Traceback (most recent call last):
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 236, in chat
return self._request_stream(
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 99, in _request_stream
return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 75, in _request
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
and this is what I see on the server side:

It happens every time after 50 seconds, even though the timeout is set to 600 seconds. Am I missing something?

Hey @devilteo911 - have you tried not setting a timeout and seeing if there's an issue on the server side regardless? I'm trying to narrow down whether some information isn't passing all the way through to the server, or whether there's an error on the server side.

The issue seems to occur only on the first call, which consistently results in a 504 error. Subsequent calls with the same input complete the generation without any problems.

I believe the problem is related to the time it takes to generate the first token, particularly during a cold start of my service. During a cold start, the model needs to be downloaded from Hugging Face, as my serverless GPU provider lacks permanent storage to keep the model locally.
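If the cold start is indeed the culprit, one possible workaround is to warm the model up with a cheap request and retry while the gateway is still returning 504s. The sketch below is only illustrative: it assumes the gateway's 504 surfaces as ollama.ResponseError with status_code == 504 (as in the traceback above) and reuses the llm_config and config objects from the original script.

import time

import ollama

client = ollama.Client(host=llm_config["base_url"], timeout=600)

def warm_up(retries: int = 10, delay: float = 30.0) -> None:
    """Probe the model until the backend has it loaded, swallowing gateway 504s."""
    for _ in range(retries):
        try:
            # A tiny request; it will keep hitting the gateway's 504 until the model is ready.
            client.generate(model=config["ollama"]["model"], prompt="ping")
            return
        except ollama.ResponseError as err:
            if err.status_code != 504:
                raise
            time.sleep(delay)
    raise RuntimeError("model never became ready")

warm_up()
response = client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)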