504 Gateway Timeout - The server didn't respond in time #314

Open
devilteo911 opened this issue Nov 7, 2024 · 3 comments

devilteo911 commented Nov 7, 2024

I don't know why, but I'm encountering this problem with the library. Here is my simple script:

import ollama

# llm_config and config are dictionaries loaded elsewhere in my code
# (see the explanation below for what base_url points at)
client = ollama.Client(host=llm_config["base_url"], timeout=600)
client.chat(model=config["ollama"]["model"], messages=[{
    "role": "user",
    "content": "Why is the sky blue?"
}])

Here llm_config["base_url"] is the URL of the Ollama server (it runs on a serverless GPU), which I can reach successfully from open-webui and can even use to query the model without issues. The model I'm using is qwen2.5:32b-instruct-q4_K_M and the GPU is an RTX A6000.

The traceback (client-side) is the following:

Traceback (most recent call last):
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 236, in chat
    return self._request_stream(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 99, in _request_stream
    return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 75, in _request
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

and this is what I see on the server side:

[GIN] 2024/11/07 - 22:04:21 | 500 | 50.001124922s |    xx.xx.xx.xx | POST     "/api/chat"

It happens every time after 50 seconds, even though the timeout is set to 600 seconds. Am I missing something?
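Since the request dies at roughly 50 seconds regardless of the timeout=600 passed to the client, the 50-second limit is presumably enforced by the gateway in front of the server rather than by the client itself. If that limit is an idle-connection timeout, streaming the response may work around it, because bytes start flowing as soon as generation begins. A minimal sketch, with hypothetical values standing in for the original config dictionaries:

import ollama

# Hypothetical stand-ins for the llm_config / config dictionaries above.
llm_config = {"base_url": "http://my-serverless-gpu.example.com:11434"}
config = {"ollama": {"model": "qwen2.5:32b-instruct-q4_K_M"}}

client = ollama.Client(host=llm_config["base_url"], timeout=600)

# With stream=True the server sends chunks as they are generated, so the
# connection is never idle for long stretches while a full reply is built.
for chunk in client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)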

@MatteoSid

I have the same issue

@ParthSareen
Contributor

Hey @devilteo911 - have you tried not setting a timeout and seeing whether the issue still shows up on the server side? I'm trying to narrow down whether some information isn't passing all the way through to the server or whether the error originates server-side.

Thanks!
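A minimal sketch of that suggestion, reusing the hypothetical config values from above: drop the timeout argument entirely, so any remaining limit must be coming from the server or from a proxy in between.

import ollama

# Same hypothetical stand-ins as in the earlier sketch.
llm_config = {"base_url": "http://my-serverless-gpu.example.com:11434"}
config = {"ollama": {"model": "qwen2.5:32b-instruct-q4_K_M"}}

# No explicit client timeout: if the request still fails after ~50 s,
# the limit is being imposed somewhere other than this client.
client = ollama.Client(host=llm_config["base_url"])
response = client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])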

ParthSareen self-assigned this Nov 15, 2024
@devilteo911
Author

Hey @ParthSareen,

The issue seems to occur only on the first call, which consistently results in a 504 error. Subsequent calls with the same input perform the generation without any problems.

I believe the problem is related to the time it takes to generate the first token, particularly during a cold start of my service. During a cold start, the model needs to be downloaded from Hugging Face, as my serverless GPU provider lacks permanent storage to keep the model locally.

I hope this clarifies the issue.
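Given that only the first call fails while the service cold-starts, one possible workaround (a sketch, not something confirmed in this thread) is a warm-up loop that retries a cheap request and tolerates 504s until the model is loaded. ollama.ResponseError and its status_code attribute come straight from the traceback above; the config values are hypothetical:

import time

import ollama

# Hypothetical stand-ins for the original config dictionaries.
llm_config = {"base_url": "http://my-serverless-gpu.example.com:11434"}
config = {"ollama": {"model": "qwen2.5:32b-instruct-q4_K_M"}}

client = ollama.Client(host=llm_config["base_url"], timeout=600)

# Retry a tiny request until the cold-started service answers, swallowing
# only the gateway's 504s; any other error is re-raised immediately.
for attempt in range(10):
    try:
        client.chat(
            model=config["ollama"]["model"],
            messages=[{"role": "user", "content": "ping"}],
        )
        break  # model is loaded and serving
    except ollama.ResponseError as e:
        if e.status_code != 504:
            raise
        time.sleep(30)  # give the cold start more time, then try again

# The real request should now hit a warm model.
response = client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])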
