[BFCL] "Internal Server Error" when running Qwen2.5-7B-Instruct using vLLM #839

QingChengLineOne opened this issue Dec 17, 2024 · 1 comment

@QingChengLineOne

Describe the issue
I downloaded Qwen/Qwen2.5-7B-Instruct locally and placed it in the directory gorilla/berkeley-function-call-leaderboard/Qwen/Qwen2.5-7B-Instruct.
I then ran: CUDA_VISIBLE_DEVICES=0,1 bfcl generate --model Qwen/Qwen2.5-7B-Instruct --backend vllm --num-gpus 2 --gpu-memory-utilization 0.9
The model deployed successfully and I can reach it with curl:
INFO 12-17 14:33:26 model_runner.py:1530] Graph capturing finished in 17 secs.
INFO 12-17 14:33:27 api_server.py:232] vLLM to use /tmp/tmp5f9y67qb as PROMETHEUS_MULTIPROC_DIR
WARNING 12-17 14:33:27 serving_embedding.py:199] embedding_mode is False. Embedding API will not work.
INFO 12-17 14:33:27 launcher.py:19] Available routes are:
INFO 12-17 14:33:27 launcher.py:27] Route: /openapi.json, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /docs, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /redoc, Methods: HEAD, GET
INFO 12-17 14:33:27 launcher.py:27] Route: /health, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /tokenize, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /detokenize, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/models, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /version, Methods: GET
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 12-17 14:33:27 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO: Started server process [54494]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on socket ('0.0.0.0', 1053) (Press CTRL+C to quit)

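(For reference, a minimal Python check like the one below also confirms the server is reachable. This is only a sketch: it assumes the server listens on localhost:1053 as in the Uvicorn line above, and the api_key is a placeholder since a local vLLM server does not validate it by default.)

```python
# Sanity check against the vLLM OpenAI-compatible server from the log above.
# Port 1053 is taken from the Uvicorn line; the api_key is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

# /v1/models should list the served model if the server is healthy.
print([m.id for m in client.models.list().data])
```
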
But inference fails with the following error:
Max context length: 32768
❗️❗️ Error occurred during inference for test case exec_parallel_37
Error type: InternalServerError
Error message: Internal Server Error
Traceback:
Traceback (most recent call last):
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 239, in _multi_threaded_inference
model_responses, metadata = self.inference_single_turn_prompting(test_case, include_input_log)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/base_handler.py", line 579, in inference_single_turn_prompting
api_response, query_latency = self._query_prompting(inference_data)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 308, in _query_prompting
api_response = self.client.completions.create(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
return func(*args, **kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/resources/completions.py", line 539, in create
return self._post(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error

Traceback (most recent call last):
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
yield
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 236, in handle_request
resp = self._pool.handle_request(req)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request
raise exc from None
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request
response = connection.handle_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 103, in handle_request
return self._connection.handle_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 136, in handle_request
raise exc
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 106, in handle_request
) = self._receive_response_headers(**kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 177, in _receive_response_headers
event = self._receive_event(timeout=timeout)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 231, in _receive_event
raise RemoteProtocolError(msg)
httpcore.RemoteProtocolError: Server disconnected without sending a response.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
response = self._client.send(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 926, in send
response = self._send_handling_auth(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 954, in _send_handling_auth
response = self._send_handling_redirects(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 991, in _send_handling_redirects
response = self._send_single_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_client.py", line 1027, in _send_single_request
response = transport.handle_request(request)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 235, in handle_request
with map_httpcore_exceptions():
File "/root/anaconda/envs/BFCL/lib/python3.10/contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.RemoteProtocolError: Server disconnected without sending a response.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 239, in _multi_threaded_inference
model_responses, metadata = self.inference_single_turn_prompting(test_case, include_input_log)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/base_handler.py", line 579, in inference_single_turn_prompting
api_response, query_latency = self._query_prompting(inference_data)
File "/public/zzy/tool_project/gorilla/berkeley-function-call-leaderboard/bfcl/model_handler/oss_model/base_oss_handler.py", line 308, in _query_prompting
api_response = self.client.completions.create(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
return func(*args, **kwargs)
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/resources/completions.py", line 539, in create
return self._post(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1075, in _retry_request
return self._request(
File "/root/anaconda/envs/BFCL/lib/python3.10/site-packages/openai/_base_client.py", line 1007, in _request
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

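A direct replay of the raw /v1/completions call that the BFCL handler makes (see _query_prompting in base_oss_handler.py in the traceback above) might help narrow down whether the server itself drops the connection. The sketch below takes the port and model name from the commands above; the prompt is just an illustrative long input, not the actual exec_parallel_37 test case:

```python
# Replay the completions call outside the BFCL harness to see whether the
# server itself drops the connection. Port and model name are taken from
# the commands above; the prompt is an illustrative long input only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

long_prompt = "test " * 4000  # roughly a few thousand tokens

try:
    resp = client.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        prompt=long_prompt,
        max_tokens=512,
        temperature=0.0,
    )
    print(resp.choices[0].text[:200])
except Exception as e:
    # "Server disconnected without sending a response" here as well would
    # point at the vLLM server; its own log should then show why it died.
    print(type(e).__name__, e)
```
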
@HuanzhiMao (Collaborator)

Hey @QingChengLineOne,
Thanks for the issue.
Could you try spinning up the vLLM server directly in your terminal and see if that works? (i.e., in the terminal: vllm serve Qwen/Qwen2.5-7B-Instruct --port 1053 --dtype bfloat16 --tensor-parallel-size 2 --gpu-memory-utilization 0.9)
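Once that standalone server is up, a quick request like the sketch below (port and model name match the command above; the prompt is arbitrary) should confirm whether it handles completions without disconnecting:

```python
# Quick end-to-end check against the manually launched vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1053/v1", api_key="EMPTY")

resp = client.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt="What is the capital of France?",
    max_tokens=32,
)
print(resp.choices[0].text)
```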
