[BFCL] "Internal Server Error" when running Qwen2.5-7B-Instruct using vLLM #839

QingChengLineOne opened this issue Dec 17, 2024 · 1 comment

@QingChengLineOne

Describe the issue
I downloaded Qwen/Qwen2.5-7B-Instruct locally and placed it in the directory gorilla/berkeley-function-call-leaderboard/Qwen/Qwen2.5-7B-Instruct.
I then ran: CUDA_VISIBLE_DEVICES=0,1 bfcl generate --model Qwen/Qwen2.5-7B-Instruct --backend vllm --num-gpus 2 --gpu-memory-utilization 0.9
The model deployed successfully and I can reach it with curl:
INFO 12-17 14:33:26 model_runner.py:1530] Graph capturing finished in 17 secs.
INFO 12-17 14:33:27 api_server.py:232] vLLM to use /tmp/tmp5f9y67qb as PROMETHEUS_MULTIPROC_DIR
WARNING 12-17 14:33:27 serving_embedding.py:199] embedding_mode is False. Embedding API will not work.
INFO 12-17 14:33:27 launcher.py:19] Available routes are:
INFO 12-17 14:33:27 launcher.py:27] Route: /openapi.json, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /docs, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /redoc, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /health, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /tokenize, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /detokenize, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/models, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /version, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO: Started server process [54494]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on socket ('0.0.0.0', 1053) (Press CTRL+C to quit)

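(For reference, a minimal Python check like the one below also confirms the server is reachable. This is only a sketch: it assumes the server listens on localhost:1053 as in the Uvicorn line above, and the api_key is a placeholder since a local vLLM server does not validate it by default.)

```python
# Sanity check against the vLLM OpenAI-compatible server from the log above.
# Port 1053 is taken from the Uvicorn line; the api_key is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

# /v1/models should list the served model if the server is healthy.
print([m.id for m in client.models.list().data])
```
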
But inference fails with the following error:
Max context length: 32768
❗️❗️ Error occurred during inference for test case exec_parallel_37
Error type: InternalServerError
Error message: Internal Server Error
Traceback:
Traceback (most recent call last):
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 239, in _multi_threaded_inference
model_responses, metadata = self.inference_single_turn_prompting(test_case, include_input_log)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/base_handler.py", line 579, in inference_single_turn_prompting
api_response, query_latency = self._query_prompting(inference_data)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 308, in _query_prompting
api_response = self.client.completions.create(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
return func(*args, **kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/resources/completions.py", line 539, in create
return self._post(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error

Traceback (most recent call last):
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
yield
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 236, in handle_request
resp = self._pool.handle_request(req)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 231, in _receive_event
raise RemoteProtocolError(msg)
httpcore.RemoteProtocolError: Server disconnected without sending a response.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
response = self._client.send(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 926, in send
response = self._send_handling_auth(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 954, in _send_handling_auth
response = self._send_handling_redirects(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 991, in _send_handling_redirects
response = self._send_single_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 1027, in _send_single_request
response = transport.handle_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 235, in handle_request
with map_httpcore_exceptions():
File "/root/anaconda/envs/BFCL/lib/python3.10/contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.RemoteProtocolError: Server disconnected without sending a response.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 239, in _multi_threaded_inference
model_responses, metadata = self.inference_single_turn_prompting(test_case, include_input_log)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/base_handler.py", line 579, in inference_single_turn_prompting
api_response, query_latency = self._query_prompting(inference_data)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 308, in _query_prompting
api_response = self.client.completions.create(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
return func(*args, **kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/resources/completions.py", line 539, in create
return self._post(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1007, in _request
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

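A direct replay of the raw /v1/completions call that the BFCL handler makes (see _query_prompting in base_oss_handler.py in the traceback above) might help narrow down whether the server itself drops the connection. The sketch below takes the port and model name from the commands above; the prompt is just an illustrative long input, not the actual exec_parallel_37 test case:

```python
# Replay the completions call outside the BFCL harness to see whether the
# server itself drops the connection. Port and model name are taken from
# the commands above; the prompt is an illustrative long input only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

long_prompt = "test " * 4000  # roughly a few thousand tokens

try:
    resp = client.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        prompt=long_prompt,
        max_tokens=512,
        temperature=0.0,
    )
    print(resp.choices[0].text[:200])
except Exception as e:
    # "Server disconnected without sending a response" here as well would
    # point at the vLLM server; its own log should then show why it died.
    print(type(e).__name__, e)
```
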
@HuanzhiMao (Collaborator)

Hey @QingChengLineOne,
Thanks for the issue.
Could you try spinning up the vLLM server directly in your terminal and see if that works? (i.e., in the terminal: vllm serve Qwen/Qwen2.5-7B-Instruct --port 1053 --dtype bfloat16 --tensor-parallel-size 2 --gpu-memory-utilization 0.9)
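Once that standalone server is up, a quick request like the sketch below (port and model name match the command above; the prompt is arbitrary) should confirm whether it handles completions without disconnecting:

```python
# Quick end-to-end check against the manually launched vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

resp = client.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt="What is the capital of France?",
    max_tokens=32,
)
print(resp.choices[0].text)
```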
