CUDA error: device-side assert triggered when running the model #393
Replies: 37 comments 35 replies
-
I'm running into the same problem.
-
GLM team, can you fix this quickly???
-
Yeah, I'm hitting this problem too.
-
I'm hitting this too. How do I fix it?
-
Same here, hoping for a solution.
-
Marking this; we're working to fix it as soon as possible.
-
So far I haven't been able to reproduce this. The CUDA error here is reported asynchronously; could someone run with CUDA_LAUNCH_BLOCKING=1 set and post the resulting stack trace?
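A minimal sketch of doing that, assuming the standard loading path from the README; the environment variable has to be set before torch initializes CUDA:

```python
import os

# Force synchronous kernel launches so the stack trace points at the
# actual failing call. Must be set before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

# Re-run the failing conversation here; the assert should now surface
# at the exact call that triggers it.
```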
-
I reproduced this in one scenario. A possible cause is that the input sequence length exceeds the model's maximum position embedding length, so the position indexing goes out of range.
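A sketch of guarding against that, reusing the model and tokenizer from the snippet above; the name of the max-length config field varies by checkpoint, so treat the field names here as assumptions to verify:

```python
# Truncate inputs so position indices never exceed the model's maximum.
# The field name differs by checkpoint (`seq_length` on ChatGLM2/3,
# `max_sequence_length` on ChatGLM-6B); verify for your model.
max_len = getattr(model.config, "seq_length",
                  getattr(model.config, "max_sequence_length", 2048))

inputs = tokenizer(prompt, return_tensors="pt")
if inputs["input_ids"].shape[1] > max_len:
    # Keep the most recent tokens and drop the oldest.
    inputs = {k: v[:, -max_len:] for k, v in inputs.items()}
```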
-
I've hit this too, many times. It shows up after several turns of conversation, and GPU memory isn't exhausted.
-
Same problem here. Streaming the output with model.stream_chat works fine, but producing the whole string with model.chat raises the error. Stepping into the source with a breakpoint, it looks like an array index going out of bounds.
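For reference, the two call paths being compared, as a sketch against the stream_chat/chat signatures in modeling_chatglm.py (query and history assumed to be defined):

```python
# Streaming generation: reported to work.
for response, history in model.stream_chat(tokenizer, query, history=history):
    pass  # `response` holds the cumulative text generated so far

# Whole-string generation: reported to raise the device-side assert.
response, history = model.chat(tokenizer, query, history=history)
```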
-
See my error report in #243.
-
We updated the model configuration files today; please check whether the problem is still there.
-
This happens to me as well. GPU memory usage climbs during every turn of a conversation; when the conversation ends it falls back to the level right after the model was loaded, but the next conversation's peak usage is higher than the previous one's.
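A quick way to confirm that pattern between turns, as a sketch using standard torch APIs:

```python
import torch

# Print after each turn: `allocated` falling back after a conversation
# suggests activations are freed, while a rising floor suggests growth.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB, "
      f"reserved: {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```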
-
Is anyone seeing the same problem with ChatGLM2-6B?
-
Hi, I hit the same problem when calling stream_chat in a loop. After roughly 114 inputs it raised an error message, and I tried making the change the message suggested.
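If the cause is the accumulated history overflowing the position embeddings, as suggested above, capping the history between calls may help. A sketch, where MAX_TURNS is an assumed value to tune and queries is a placeholder for your inputs:

```python
MAX_TURNS = 10  # assumed cap; tune to your model's context window

history = None
for query in queries:
    # Consume the stream; `response` holds the full text so far.
    for response, history in model.stream_chat(tokenizer, query, history=history):
        pass
    # Drop the oldest turns so the serialized prompt stays bounded.
    if history and len(history) > MAX_TURNS:
        history = history[-MAX_TURNS:]
```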
-
Same error. :( I hit it when I first started trying the model out. Environment: Ubuntu 22.04 inside Docker (CUDA 11.8, torch 2.1.0), a single 4090, model unmodified. Also, it doesn't take many rounds of Q&A to trigger: I see it within three or four turns. I'd paste a chat log here: (I've hit it in even shorter conversations, starting from the third turn, but didn't save those.)
-
I ran into this problem too; hoping the team can fix it soon.
-
I ran this notebook from the repo directly: https://github.com/THUDM/ChatGLM3/blob/main/finetune_demo/lora_finetune.ipynb and hit the same error. Part of the output: /root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/nn/modules/module.py:1511 in │
-
Running python inference_hf.py your_finetune_path --prompt your prompt also raises this error.
-
I fine-tuned with the official chatglm3 LoRA code and the fine-tuning itself succeeds, but running inference the way the official lora_finetune.ipynb describes, python3 inference_hf.py <finetuned model output dir> --prompt <prompt>, raises the error above.
-
I've heard from others that pinning peft to version 0.9 avoids this problem, but you have to redo the fine-tuning!
-
It's the history parameter: history may be initialized to None, but never to [].
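In other words, a minimal sketch (whether [] alone triggers the assert may depend on the checkpoint):

```python
# Safe: the model builds the history itself on the first turn.
response, history = model.chat(tokenizer, "Hello", history=None)

# Reported above to lead to the device-side assert:
# response, history = model.chat(tokenizer, "Hello", history=[])
```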
-
Same with GLM4. I fine-tune with llama factory, but llama factory only supports peft > 0.11.
-
The deployed model handled several hundred calls without issue, then the next call raised this error:
```
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "get_api_cuda1.py", line 66, in create_item
    response, history = model.chat(tokenizer,
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1032, in chat
    inputs = inputs.to(self.device)
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
  File "/usr/local/anaconda3/envs/chatglm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 758, in <dictcomp>
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```