
Fdy/enhance copy #430

Merged
merged 37 commits into from
Nov 27, 2023

Conversation

fandaoyi
Collaborator

@fandaoyi fandaoyi commented Nov 15, 2023

Problem:
The diopiCopy API demands a range of capabilities, and a vendor may not be able to implement all of them (for example, Enflame implements only a small subset). The current logic either raises an error or falls back entirely to the slow CPU-based copy (except for direct copy). We want to use diopiCopy to improve performance, while handling unsupported cases more flexibly.

  1. The autogen-generated dipu_copy function is now used only as a bridge to the diopi interface; its behavior is extracted into the new DIPUCopy class. autogen cannot yet register a "complex behavior" class associated with multiple diopi interfaces, so the current implementation is somewhat tricky.
  2. The new DIPUCopy class provides a set of basic building blocks (the doXXX family) plus some customizability. A vendor can override the default implementation to use diopiCopy, fully or partially, for on-device copies, and fall back to the auxiliary CPU copy for cases it cannot handle. Other copies (d2h, etc.) are handled automatically by default, requiring no vendor intervention (though this behavior can also be modified).
  3. d2h, h2d, and device-to-device copies are now preferentially dispatched to the device (when diopiCopy is available) to improve performance.
  4. The checks and decision logic previously handled by TensorIterator are extracted into our custom CopyParamInfo class and helper functions, so the heavyweight TensorIterator is no longer needed.
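The override-and-fallback design in points 2-4 can be sketched as follows. This is a minimal illustrative sketch, not the actual DIPU API: the class, field, and method names (DIPUCopyBase, the CopyParamInfo fields, doDirectCopy, doDeviceCopy, doCpuFallback, VendorCopy) are hypothetical stand-ins for the doXXX building blocks described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the building-block copy design described above.
// All names here are illustrative, not the actual DIPU/DIOPI API.

// Lightweight replacement for TensorIterator's checks (cf. CopyParamInfo).
struct CopyParamInfo {
  bool sameDtype;       // src and dst share a dtype
  bool directCopyable;  // contiguous, same sizes/strides: memcpy-like copy
  bool crossDevice;     // src and dst live on different devices
};

class DIPUCopyBase {
 public:
  virtual ~DIPUCopyBase() = default;
  std::string last;  // records which building block ran (demo only)

  // Entry point: dispatch to a building block using precomputed info,
  // instead of re-deriving everything through TensorIterator.
  void run(const CopyParamInfo& info) {
    if (info.directCopyable) {
      doDirectCopy(info);
    } else {
      doDeviceCopy(info);  // vendor hook; may fall back internally
    }
  }

 protected:
  virtual void doDirectCopy(const CopyParamInfo&) { last = "direct"; }

  // Default: assume no usable diopiCopy; use the slow CPU-based copy.
  virtual void doDeviceCopy(const CopyParamInfo& info) { doCpuFallback(info); }

  void doCpuFallback(const CopyParamInfo&) { last = "cpu_fallback"; }
};

// A vendor overrides only the cases its diopiCopy actually supports,
// and reuses the CPU fallback for everything else.
class VendorCopy : public DIPUCopyBase {
 protected:
  void doDeviceCopy(const CopyParamInfo& info) override {
    if (info.sameDtype && !info.crossDevice) {
      last = "diopi_device";  // supported case: run diopiCopy on device
    } else {
      doCpuFallback(info);    // unsupported case: auxiliary CPU copy
    }
  }
};
```

With this split, supporting a new case on a vendor means overriding one doXXX hook rather than reimplementing the whole copy path.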

Conflicts:
	dipu/torch_dipu/csrc_dipu/aten/DIPUATenFunctions.h
	dipu/torch_dipu/csrc_dipu/aten/RegisterDIPU.cpp
	dipu/torch_dipu/csrc_dipu/aten/ops/CopyKernel.cpp
	dipu/torch_dipu/csrc_dipu/aten/ops/CustomFallbackFunctions.hpp
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.cpp
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.h
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.h
	dipu/torch_dipu/csrc_dipu/vendor/cuda/CUDACopyInplace.cpp
	dipu/torch_dipu/csrc_dipu/vendor/supa/copyinplace.cpp
Collaborator

@lljbash lljbash left a comment


lgtm

@fandaoyi fandaoyi merged commit 8db5b50 into main Nov 27, 2023
19 checks passed
ustclight-sls pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Dec 8, 2023
* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy
@fandaoyi fandaoyi deleted the fdy/enhance_copy branch December 11, 2023 03:50
mrdanielw pushed a commit that referenced this pull request Dec 13, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* Fdy/fix copy tidy (#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (#482)

* mock torch.cuda.XXXTensor (#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (DeepLink-org#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (DeepLink-org#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (DeepLink-org#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* Fdy/fix copy tidy (DeepLink-org#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (DeepLink-org#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (DeepLink-org#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (DeepLink-org#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (DeepLink-org#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482)

* mock torch.cuda.XXXTensor (DeepLink-org#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (DeepLink-org#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (DeepLink-org#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (DeepLink-org#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (DeepLink-org#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (DeepLink-org#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (DeepLink-org#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (DeepLink-org#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (DeepLink-org#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (DeepLink-org#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (DeepLink-org#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (DeepLink-org#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (DeepLink-org#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (DeepLink-org#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (DeepLink-org#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (DeepLink-org#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* Fdy/fix copy tidy (DeepLink-org#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (DeepLink-org#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (DeepLink-org#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (DeepLink-org#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (DeepLink-org#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482)

* mock torch.cuda.XXXTensor (DeepLink-org#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (DeepLink-org#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (DeepLink-org#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (DeepLink-org#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (DeepLink-org#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (DeepLink-org#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (DeepLink-org#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (DeepLink-org#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (DeepLink-org#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (DeepLink-org#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (DeepLink-org#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (DeepLink-org#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (DeepLink-org#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
caikun-pjlab added a commit that referenced this pull request Dec 22, 2023
* add kunlunxin backend

* add kunlunxin device

* update copy_ for kunlunxin

* lcy/clang-tidy (#483)

* fix namespace declaration format

* update diopi_functions.yaml

* update clang-tidy

* update clang-tidy

* change tab into spaces

* allow const_cast

* fix bug

* fix comment

* fix comments

* fix comments

* [FIX] fix virtual memory error of using SUPA (#468)

* [FIX] fix virtual memory of SUPA

* [FIX] fix incorrect copy

* [FIX] remove useless copy and add missing 'supa' in CMakeLists.txt

* make conv2d out at right memory-format (#502)

* [dicp][ascend] add fusion switch file for ascend (#512)

* [dipu] Speedup profiler ctor when not enabled (#526)

* speedup profiler ctor

* clean & format include

* [DIPU]clang-tidy_shanhang (#516)

* Speedup dumpOnArgLevel by using lazy initialization (#524)

* [dicp][ascend] fuse transpose/mm in ascendgraph (#523)

* [dicp][ascend] remove unnecessary broadcast (#527)

* update kineto (#530)

* [dicp][ascend] opt inplace copy (#533)

* opt copy inplace

* optimize load_and_run

* remove check return value if (#534)

* [dipu] Optimize `getAllocator` by adopting lookup table (#532)

* [dipu] Optimize `getAllocator` by adopting lookup table

* fix typos & clean includes

* resolve comments

* shrink lookup table & speedup devproxy::getDeviceCount
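
The lookup-table idea behind that `getAllocator` change can be sketched roughly as follows; this is a minimal illustration of trading per-call branching for a precomputed flat table, with invented names (`LookupAllocatorRegistry` and its methods are not the actual DIPU API):

```python
# Hypothetical sketch: resolve an allocator with one array index instead of
# re-deciding (branching / map lookups) on every call. One slot is
# precomputed per (device type, device index) pair.

class LookupAllocatorRegistry:
    def __init__(self, device_types, max_devices):
        # Flat table, sized once up front; lookups become a single index.
        self._type_offset = {t: i for i, t in enumerate(device_types)}
        self._max_devices = max_devices
        self._table = [None] * (len(device_types) * max_devices)

    def _slot(self, device_type, index):
        return self._type_offset[device_type] * self._max_devices + index

    def register(self, device_type, index, allocator):
        self._table[self._slot(device_type, index)] = allocator

    def get(self, device_type, index):
        # Hot path: no branching on device type, just arithmetic + indexing.
        return self._table[self._slot(device_type, index)]

registry = LookupAllocatorRegistry(["cpu", "dipu"], max_devices=16)
registry.register("dipu", 0, "caching-allocator-0")
```

The point of the design is that the hot path (`get`) does no string comparison or policy decision; all of that happens once at registration time.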

* Op preference mem format (#525)

* add memory preference in op for camb.
This change adds a TAG in diopi_functions.yaml, and the autogen replaces it with the preferred memory format depending on the convert_config.yaml of the device

* fix bug found in ci running

* improve the code according to the comment.

* improve code format.

* improve CMakeLists.txt code.
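
The TAG substitution described in that commit can be sketched as below. This is an assumption-laden illustration, not the real autogen: the placeholder name `${MEMORY_FORMAT}`, the config dict standing in for convert_config.yaml, and the op/format names are all hypothetical.

```python
# Hypothetical sketch of the autogen TAG replacement: a placeholder in a
# diopi_functions.yaml-style template is rewritten with the memory format
# a vendor prefers, looked up from a convert_config-style mapping.

# Stand-in for the vendor's convert_config.yaml (assumed content).
PREFERRED_MEMORY_FORMAT = {
    "conv2d": "ChannelsLast",
}
DEFAULT_FORMAT = "Contiguous"

def expand_memory_format_tag(op_name: str, template: str) -> str:
    """Replace the ${MEMORY_FORMAT} placeholder with the vendor's preference,
    falling back to a default when the op has no entry in the config."""
    fmt = PREFERRED_MEMORY_FORMAT.get(op_name, DEFAULT_FORMAT)
    return template.replace("${MEMORY_FORMAT}", fmt)

line = "auto out = empty_like(input, ${MEMORY_FORMAT});"
print(expand_memory_format_tag("conv2d", line))
# -> auto out = empty_like(input, ChannelsLast);
```

Ops absent from the config keep the default format, so vendors only have to list the ops where layout actually matters.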

* lyp_clang_tidy: warning uint64_t->int (#518)

* clang_tidy:torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp
                                         CorrelationIDManager.h

* clang_tidy dipu/torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp .h

* clang_tidy:torch_dipu/csrc_dipu/profiler/profiler.cpp

* clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp

* clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp --v2

* clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp

* clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp -v2

* clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEvent.h

* clang_tidy: torch_dipu/csrc_dipu/profiler/profiler.h --v2

* clang_tidy: torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp --v2

* clang_tidy: torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp .h --v2

* clang_tidy: magic number; const_cast

* clang_tidy: fix some review issues

* clang_tidy: modify format by using run_format.sh

* [dipu] fix: `torch.prod` int type promotion (#541)

`prod` (and other reduction ops) should promote int type (including `bool`) to `int64` when `dtype` is not explicitly provided.

Only `prod` (without `dim`) should be taken care of, because the other cases are already correctly handled in PyTorch.
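
The promotion rule stated above can be written out as a small sketch; the dtype names are illustrative strings and this is not the actual PyTorch type-promotion code:

```python
# Sketch of the reduction-dtype rule described above: when no dtype is
# explicitly provided, integral inputs (including bool) accumulate in
# int64 to avoid overflow; floating-point inputs keep their own dtype.

from typing import Optional

INTEGRAL_DTYPES = {"bool", "uint8", "int8", "int16", "int32", "int64"}

def promoted_reduction_dtype(input_dtype: str,
                             explicit_dtype: Optional[str] = None) -> str:
    if explicit_dtype is not None:
        return explicit_dtype   # an explicit dtype= always wins
    if input_dtype in INTEGRAL_DTYPES:
        return "int64"          # bool/int inputs promote to int64
    return input_dtype          # float/complex dtypes are unchanged
```

So `prod` over a `bool` or `int32` tensor yields `int64` output unless the caller asked for something else, matching the behavior the fix aligns with.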

* [dipu] fix typo PREFERED -> PREFERRED (#545)

* [dicp][ascend] add dicp ci for ascend (#540)

* disable autocompare for _amp_foreach_non_finite_check_and_unscale_ (#543)

* Update QuickStart.md

* revert unnecessary changes

* fix linter errors and implement getRuntimeVersion&getDriverVersion for kunlunxin

* change device from XPU to KLX

* fix build

* remove unused code

* use DIPU_LOG instead of printf

* change kunlunxin device key from xpu to klx

---------

Co-authored-by: Chengyuan Li <[email protected]>
Co-authored-by: Aaron <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: ustclight-sls <[email protected]>
Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
Co-authored-by: lyp-liuyipeng <[email protected]>
Co-authored-by: zhaochaoxing <[email protected]>