-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fdy/support hf lora amp #308
Conversation
fandaoyi
commented
Sep 22, 2023
•
edited
Loading
edited
- 重写了部分 torch/csrc/device 的逻辑,dipu 可以在python层显示为 ‘cuda’ (包括 device, print tensor, generator, storage 都会显示 ‘device=cuda’)。 并fix了部分受影响的ci 和其他部分(主要是storage) (camb 的test 受影响比较多,主要是目前test 的设备还是 ’dipu‘导致的, 另外 改为cuda 会多跑一些测试。 后续计划先使用 DIPU_PYTHON_DEVICE_AS_CUDA= false 暂时绕过。 等凌劼的 ci梳理后一起处理。)
- 新增了一个弱符号接口 getDeviceStatus() 用于获取 mem_info 等设备实时信息, 另外对原来的 getDevicePropertiesFromCache 和新接口做了一定程度的统一。
- 删除了部分多余的 generator 逻辑(已经和蔡坤同步过)。
Conflicts: dipu/torch_dipu/__init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个这两天会合吗,我最近会把 test_op 整个目录改掉
return THPUtils_packString(oss.str().c_str()); | ||
} | ||
|
||
static struct PyGetSetDef DIPU_THPDevice_properties[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
应该不能用 const, 后面要赋值给 tp_getset, 那个不是 const的。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这居然是个数组- -看起来更可怕了...
好像也没啥好改法,要不然加个注释说这玩意儿传给 pytorch 随他用去了
using dipu::devapis::DIPUDeviceProperties; | ||
using c10::DeviceIndex; | ||
|
||
DeviceIndex num_gpus = -1; | ||
c10::once_flag init_flag; | ||
std::deque<c10::once_flag> device_flags; | ||
std::vector<DIPUDeviceProperties> device_properties; | ||
std::vector<shared_ptr<DIPUDeviceProperties>> device_properties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里相关的修改的目的是什么
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
原来的逻辑会把裸指针传到 python层。用 shared_ptr pybind11 可以和cpp层共同管理 生命周期。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
所以这个对象在 python 层可能会活得比 cpp 层久嘛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
应该不会。但是之前传一个裸指针过去, 还要改变 pybind 默认的析构行为 py::return_value_policy::reference,还是有些奇怪的。也不太符合 pybind 推荐的模式
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
seems remaining comments are enhanced suggestions, not mandatory。 they shouldn’t be ‘request’ but ‘comments’