Fdy/enhance copy #430
Merged
fandaoyi requested review from mrdanielw, zhaoguochun1995, caikun-pjlab, lljbash and wyz5864 on November 20, 2023 03:22
Conflicts:
* dipu/torch_dipu/csrc_dipu/aten/DIPUATenFunctions.h
* dipu/torch_dipu/csrc_dipu/aten/RegisterDIPU.cpp
* dipu/torch_dipu/csrc_dipu/aten/ops/CopyKernel.cpp
* dipu/torch_dipu/csrc_dipu/aten/ops/CustomFallbackFunctions.hpp
* dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.cpp
* dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.h
* dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.h
* dipu/torch_dipu/csrc_dipu/vendor/cuda/CUDACopyInplace.cpp
* dipu/torch_dipu/csrc_dipu/vendor/supa/copyinplace.cpp
mrdanielw reviewed Nov 20, 2023
mrdanielw reviewed Nov 20, 2023
mrdanielw reviewed Nov 20, 2023
mrdanielw approved these changes Nov 21, 2023
lljbash requested changes Nov 22, 2023
lljbash reviewed Nov 24, 2023
dipu/torch_dipu/csrc_dipu/aten/ops/CustomFallbackFunctionsForCopy.cpp
caikun-pjlab approved these changes Nov 27, 2023
lljbash approved these changes Nov 27, 2023
lgtm
ustclight-sls pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Dec 8, 2023
* mv vopy file path
* add new copy
* fix static param err
* fix copy err
* fix direct copy bug
* rm unused bcast template name
* change clang format
* change name hpp
* rm unused header file
* remove unused header 2
* change override behavior
* change comment
* change cudacopy
* fix d2d copy err
* change register to use autogen
* revert incorrect format
* config fallback
* fix link err
* fix comment wanglei
* add newline
* fix cpu copy err
* add camb vendor copy
* fix copy err
* fix copy err 2
* fix compile err
* fix lingjie comment1
* fix caikun comment
* fix camb ci
* fix camb ci
* fix device switch err
* fix ling jie caikun comment 2
* fix comment incorrect local ref
* change init copy
mrdanielw pushed a commit that referenced this pull request Dec 13, 2023
* Create main readme * Update readme.md * Update readme.md * Update readme.md * add clone kineto for dicp (#457) add clone kineto for dicp * [dicp][ascend] infer op result_info (#448) * finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test * repeal modification to diopi * modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result' * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * fix gettupleelem in topsgraph --------- Co-authored-by: jinminxi104 <[email protected]> * Fdy/enhance copy (#430) * mv vopy file path * add new copy * fix static param err * fix copy err * fix direct copy bug * rm unused bcast template name * change clang format * change name hpp * rm unused header file * remove unused header 2 * change override behavior * change comment * change cudacopy * fix d2d copy err * change register to use autogen * revert incorrect format * config fallback * fix link err * fix comment wanglei * add newline * fix cpu copy err * add camb vendor copy * fix copy err * fix copy err 2 * fix compile err * fix lingjie comment1 * fix caikun comment * fix camb ci * fix camb ci * fix device switch err * fix ling jie caikun comment 2 * fix comment incorrect local ref * change init copy * update DIOPI submodule (#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (#428) * [dipu] Fix copy_ fallback of topsrider. (#477) * [dicp][tops] Add dicp ci of tops. (#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (#474) * Fdy/fix copy tidy (#471) * fix tidy 0 * fix clang tidy copy * fix lingjie comment * add tidy msg * fix lint comment * fix format * add copy right * fuj/ add ceil.out (#480) * add ceil.out * add floor_ and cases for floor_, ceil and ceil_ * [dipu] tidy some source files and update nv build script (#453) * fix: tidy some source files - and also update build nv script * fix: make clang-format v16 happy * fix: make clang-format v16 happy * fix: remove usings and simplify some code * fix: remove index * fix: remove initialized_ * fix: add keyword VERSION * fix: remove VERSION 3.25 as CI is using CMake 3.22 * add 910B CI && remove 910 CI && update DIOPI (#481) * add 910b * add 910b * add 910b * add 910b * add resnet50 * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * rm nouse code * update DIOPI submodule (#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (#428) * [dipu] Fix copy_ fallback of topsrider. (#477) * [dicp][tops] Add dicp ci of tops. (#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (#474) * rm 910 ci * update diopi * rm 910 --------- Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: wugeshui <[email protected]> * [dipu]add ascend profiler (#476) * add ascend profiler * support with_stack * code format * fix clang tidy * optimize naming * optimize naming * add dipu ci on dicp (#488) * [dicp][ascend] fix ascend mm/bmm on 910B (#482) * mock torch.cuda.XXXTensor (#462) * mock torch.cuda.XXXTensor * add newline at end of file * fix conflict * fix format * fix format * fix comment * Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486) * Fix `multiprocessing.Process` tests not collected by coverage and gcov * fix --concurrency=multiprocessing * [dipu] update tidy configuration and remove if-constexpr in C++14 (#470) * fix: update tidy config and remove if-constexpr * fix: it should be a list instead of bool value * feat: update clangd config * fix: move the comment out of yaml scalar * docs: add comments * fix: add DeviceIndex * fix: add some checks for headers * feat: update .clang-tidy * add profiler readme (#489) * add profiler readme * Update readme.md * update * Update readme.md * Update readme.md * Update readme.md --------- Co-authored-by: caikun-pjlab <[email protected]> * [dicp][tops] support outputs with inplace copy (#440) * add dipu stream synchronize. * adjust some ops. * fix some paras error and rename device name. * unset keep_inference_input_mutations. * fix paras error in conversion. * fix para dtype conversion. * fix empty output and inplace copy of input paras in optimizer case. * remove inplace output gen_empty_tensor. 
* Ywt/fix autocompare compile error (#492) * pass string to python * disable _amp_foreach_non_finite_check_and_unscale_ autocompare * [dipu] Wx/support the test for llm inference (#454) * add one iter for llm * add bert ci using the correct transformers repository * add test for the inference of llama 7b using the transformers repository * one iter test for traditional models by default * fix bug * add test for the inference of internlm 7b using the transformers repository * test for torch_dipu * set device check args other for maximum.out * fix the partition arg parsing bug on cuda * test the setting of CUDA_PARTITION * fix the bug of setting CUDA_PARTATION * add llm * add llm * optimize the selection of model list * set pythonpath for torch_dipu * test * fix bug in the command of setting pythonpath --------- Co-authored-by: wugeshui <[email protected]> * [DIPU]Wx/check the status of build dipu (#490) * check the status of build dipu on camb and nv * add check for ascend * fix the bug of pipe * [DIPU] Wx/add schema for logical or and logical not ops (#484) * add schema for logical or and logical not ops * fix bug and add test cases for these ops * add the test case: out is empty tensor * [dicp][ascend] infer op resinfo (part 2) (#491) * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * finish res_op_infer for more simple operators * Update operator.py delete some unnecessary print() * Update operator.py clean code * finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems * clean code format * Update warning message output in operator.py * extract common function for general binary and unary operator ,add op bmm's inference * Update ascend_op.py delete unuse param * update DIOPI submodule (#485) * update DIOPI submodule * update submodule * temporily forbid resnet50 * move the testing code to dir under torch_dipu (#465) * move the testing code 
to dir under torch_dipu * fix a little bug * create two soft link to avoid import torch_dipu too early. * add one more soft link file to solve bugs. * support dev fork ci (#496) * support dev fork ci * [dipu] add markdownlint and update most markdown files (#493) * doc: update docs and add markdownlint * doc: rename readme.md to README.md * fix: remove MD013 * doc: format * [dicp][tops] Support some ops for stable-diffusion. (#467) * Add sin, cos, erf, split. 1. Generalize MakeTuple in tops_op. 2. Generalize make_const in enflame codegen. 3. Add sin, cos, erf, split for tops. 4. Format Python code in dicp tops. * refine code * fix abs test path * clean up code of split. * adjust const op generation. * fix nullptr case in const generation. --------- Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: Reinerzhou <[email protected]> * [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494) * improve maximum schema due to the case in the inference of internlm * fix bug according to comments * fix bug * [both] fix, format and remove spaces in README.md (#497) * doc(readme): fix, format and remove spaces * fix: typo and try auto-correct * feat(ci): add autocorrect into ci * fix: remove autocorrect form ci as it's not ready * update env python 3.10 (#503) * fix clang tidy * [dicp][ascend] get soc_version from aclrt (#505) * fix clang tidy * fix format * fix format --------- Co-authored-by: MiaoYYu <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: Juntao Chen <[email protected]> Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: Fu Jingguo <[email protected]> Co-authored-by: hellozmz <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: 
caikun-pjlab <[email protected]> Co-authored-by: tangzhiyi11 <[email protected]> Co-authored-by: wyz5864 <[email protected]> Co-authored-by: Lingjie <[email protected]> Co-authored-by: Joyce YU <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: POI-WX <[email protected]> Co-authored-by: HuayiL <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: liwenjian-sensetime <[email protected]> Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
* Create main readme * Update readme.md * Update readme.md * Update readme.md * add clone kineto for dicp (DeepLink-org#457) add clone kineto for dicp * [dicp][ascend] infer op result_info (DeepLink-org#448) * finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test * repeal modification to diopi * modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result' * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * fix gettupleelem in topsgraph --------- Co-authored-by: jinminxi104 <[email protected]> * Fdy/enhance copy (DeepLink-org#430) * mv vopy file path * add new copy * fix static param err * fix copy err * fix direct copy bug * rm unused bcast template name * change clang format * change name hpp * rm unused header file * remove unused header 2 * change override behavior * change comment * change cudacopy * fix d2d copy err * change register to use autogen * revert incorrect format * config fallback * fix link err * fix comment wanglei * add newline * fix cpu copy err * add camb vendor copy * fix copy err * fix copy err 2 * fix compile err * fix lingjie comment1 * fix caikun comment * fix camb ci * fix camb ci * fix device switch err * fix ling jie caikun comment 2 * fix comment incorrect local ref * change init copy * update DIOPI submodule (DeepLink-org#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428) * [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477) * [dicp][tops] Add dicp ci of tops. (DeepLink-org#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (DeepLink-org#474) * Fdy/fix copy tidy (DeepLink-org#471) * fix tidy 0 * fix clang tidy copy * fix lingjie comment * add tidy msg * fix lint comment * fix format * add copy right * fuj/ add ceil.out (DeepLink-org#480) * add ceil.out * add floor_ and cases for floor_, ceil and ceil_ * [dipu] tidy some source files and update nv build script (DeepLink-org#453) * fix: tidy some source files - and also update build nv script * fix: make clang-format v16 happy * fix: make clang-format v16 happy * fix: remove usings and simplify some code * fix: remove index * fix: remove initialized_ * fix: add keyword VERSION * fix: remove VERSION 3.25 as CI is using CMake 3.22 * add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481) * add 910b * add 910b * add 910b * add 910b * add resnet50 * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * rm nouse code * update DIOPI submodule (DeepLink-org#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428) * [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477) * [dicp][tops] Add dicp ci of tops. (DeepLink-org#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (DeepLink-org#474) * rm 910 ci * update diopi * rm 910 --------- Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: wugeshui <[email protected]> * [dipu]add ascend profiler (DeepLink-org#476) * add ascend profiler * support with_stack * code format * fix clang tidy * optimize naming * optimize naming * add dipu ci on dicp (DeepLink-org#488) * [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482) * mock torch.cuda.XXXTensor (DeepLink-org#462) * mock torch.cuda.XXXTensor * add newline at end of file * fix conflict * fix format * fix format * fix comment * Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486) * Fix `multiprocessing.Process` tests not collected by coverage and gcov * fix --concurrency=multiprocessing * [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470) * fix: update tidy config and remove if-constexpr * fix: it should be a list instead of bool value * feat: update clangd config * fix: move the comment out of yaml scalar * docs: add comments * fix: add DeviceIndex * fix: add some checks for headers * feat: update .clang-tidy * add profiler readme (DeepLink-org#489) * add profiler readme * Update readme.md * update * Update readme.md * Update readme.md * Update readme.md --------- Co-authored-by: caikun-pjlab <[email protected]> * [dicp][tops] support outputs with inplace copy (DeepLink-org#440) * add dipu stream synchronize. * adjust some ops. * fix some paras error and rename device name. * unset keep_inference_input_mutations. * fix paras error in conversion. * fix para dtype conversion. * fix empty output and inplace copy of input paras in optimizer case. * remove inplace output gen_empty_tensor. 
* Ywt/fix autocompare compile error (DeepLink-org#492) * pass string to python * disable _amp_foreach_non_finite_check_and_unscale_ autocompare * [dipu] Wx/support the test for llm inference (DeepLink-org#454) * add one iter for llm * add bert ci using the correct transformers repository * add test for the inference of llama 7b using the transformers repository * one iter test for traditional models by default * fix bug * add test for the inference of internlm 7b using the transformers repository * test for torch_dipu * set device check args other for maximum.out * fix the partition arg parsing bug on cuda * test the setting of CUDA_PARTITION * fix the bug of setting CUDA_PARTATION * add llm * add llm * optimize the selection of model list * set pythonpath for torch_dipu * test * fix bug in the command of setting pythonpath --------- Co-authored-by: wugeshui <[email protected]> * [DIPU]Wx/check the status of build dipu (DeepLink-org#490) * check the status of build dipu on camb and nv * add check for ascend * fix the bug of pipe * [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484) * add schema for logical or and logical not ops * fix bug and add test cases for these ops * add the test case: out is empty tensor * [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491) * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * finish res_op_infer for more simple operators * Update operator.py delete some unnecessary print() * Update operator.py clean code * finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems * clean code format * Update warning message output in operator.py * extract common function for general binary and unary operator ,add op bmm's inference * Update ascend_op.py delete unuse param * update DIOPI submodule (DeepLink-org#485) * update DIOPI submodule * update submodule * temporily forbid resnet50 * move 
the testing code to dir under torch_dipu (DeepLink-org#465) * move the testing code to dir under torch_dipu * fix a little bug * create two soft link to avoid import torch_dipu too early. * add one more soft link file to solve bugs. * support dev fork ci (DeepLink-org#496) * support dev fork ci * [dipu] add markdownlint and update most markdown files (DeepLink-org#493) * doc: update docs and add markdownlint * doc: rename readme.md to README.md * fix: remove MD013 * doc: format * [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467) * Add sin, cos, erf, split. 1. Generalize MakeTuple in tops_op. 2. Generalize make_const in enflame codegen. 3. Add sin, cos, erf, split for tops. 4. Format Python code in dicp tops. * refine code * fix abs test path * clean up code of split. * adjust const op generation. * fix nullptr case in const generation. --------- Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: Reinerzhou <[email protected]> * [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494) * improve maximum schema due to the case in the inference of internlm * fix bug according to comments * fix bug * [both] fix, format and remove spaces in README.md (DeepLink-org#497) * doc(readme): fix, format and remove spaces * fix: typo and try auto-correct * feat(ci): add autocorrect into ci * fix: remove autocorrect form ci as it's not ready * update env python 3.10 (DeepLink-org#503) * fix clang tidy * [dicp][ascend] get soc_version from aclrt (DeepLink-org#505) * fix clang tidy * fix format * fix format --------- Co-authored-by: MiaoYYu <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: Juntao Chen <[email protected]> Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: Fu Jingguo <[email 
protected]> Co-authored-by: hellozmz <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: caikun-pjlab <[email protected]> Co-authored-by: tangzhiyi11 <[email protected]> Co-authored-by: wyz5864 <[email protected]> Co-authored-by: Lingjie <[email protected]> Co-authored-by: Joyce YU <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: POI-WX <[email protected]> Co-authored-by: HuayiL <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: liwenjian-sensetime <[email protected]> Co-authored-by: shanhang <[email protected]>
brianlcy123
pushed a commit
to brianlcy123/deeplink.framework
that referenced
this pull request
Dec 21, 2023
* Create main readme * Update readme.md * Update readme.md * Update readme.md * add clone kineto for dicp (DeepLink-org#457) add clone kineto for dicp * [dicp][ascend] infer op result_info (DeepLink-org#448) * finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test * repeal modification to diopi * modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result' * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * fix gettupleelem in topsgraph --------- Co-authored-by: jinminxi104 <[email protected]> * Fdy/enhance copy (DeepLink-org#430) * mv vopy file path * add new copy * fix static param err * fix copy err * fix direct copy bug * rm unused bcast template name * change clang format * change name hpp * rm unused header file * remove unused header 2 * change override behavior * change comment * change cudacopy * fix d2d copy err * change register to use autogen * revert incorrect format * config fallback * fix link err * fix comment wanglei * add newline * fix cpu copy err * add camb vendor copy * fix copy err * fix copy err 2 * fix compile err * fix lingjie comment1 * fix caikun comment * fix camb ci * fix camb ci * fix device switch err * fix ling jie caikun comment 2 * fix comment incorrect local ref * change init copy * update DIOPI submodule (DeepLink-org#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428) * [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477) * [dicp][tops] Add dicp ci of tops. (DeepLink-org#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (DeepLink-org#474) * Fdy/fix copy tidy (DeepLink-org#471) * fix tidy 0 * fix clang tidy copy * fix lingjie comment * add tidy msg * fix lint comment * fix format * add copy right * fuj/ add ceil.out (DeepLink-org#480) * add ceil.out * add floor_ and cases for floor_, ceil and ceil_ * [dipu] tidy some source files and update nv build script (DeepLink-org#453) * fix: tidy some source files - and also update build nv script * fix: make clang-format v16 happy * fix: make clang-format v16 happy * fix: remove usings and simplify some code * fix: remove index * fix: remove initialized_ * fix: add keyword VERSION * fix: remove VERSION 3.25 as CI is using CMake 3.22 * add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481) * add 910b * add 910b * add 910b * add 910b * add resnet50 * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * rm nouse code * update DIOPI submodule (DeepLink-org#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428) * [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477) * [dicp][tops] Add dicp ci of tops. (DeepLink-org#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (DeepLink-org#474) * rm 910 ci * update diopi * rm 910 --------- Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: wugeshui <[email protected]> * [dipu]add ascend profiler (DeepLink-org#476) * add ascend profiler * support with_stack * code format * fix clang tidy * optimize naming * optimize naming * add dipu ci on dicp (DeepLink-org#488) * [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482) * mock torch.cuda.XXXTensor (DeepLink-org#462) * mock torch.cuda.XXXTensor * add newline at end of file * fix conflict * fix format * fix format * fix comment * Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486) * Fix `multiprocessing.Process` tests not collected by coverage and gcov * fix --concurrency=multiprocessing * [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470) * fix: update tidy config and remove if-constexpr * fix: it should be a list instead of bool value * feat: update clangd config * fix: move the comment out of yaml scalar * docs: add comments * fix: add DeviceIndex * fix: add some checks for headers * feat: update .clang-tidy * add profiler readme (DeepLink-org#489) * add profiler readme * Update readme.md * update * Update readme.md * Update readme.md * Update readme.md --------- Co-authored-by: caikun-pjlab <[email protected]> * [dicp][tops] support outputs with inplace copy (DeepLink-org#440) * add dipu stream synchronize. * adjust some ops. * fix some paras error and rename device name. * unset keep_inference_input_mutations. * fix paras error in conversion. * fix para dtype conversion. * fix empty output and inplace copy of input paras in optimizer case. * remove inplace output gen_empty_tensor. 
* Ywt/fix autocompare compile error (DeepLink-org#492) * pass string to python * disable _amp_foreach_non_finite_check_and_unscale_ autocompare * [dipu] Wx/support the test for llm inference (DeepLink-org#454) * add one iter for llm * add bert ci using the correct transformers repository * add test for the inference of llama 7b using the transformers repository * one iter test for traditional models by default * fix bug * add test for the inference of internlm 7b using the transformers repository * test for torch_dipu * set device check args other for maximum.out * fix the partition arg parsing bug on cuda * test the setting of CUDA_PARTITION * fix the bug of setting CUDA_PARTATION * add llm * add llm * optimize the selection of model list * set pythonpath for torch_dipu * test * fix bug in the command of setting pythonpath --------- Co-authored-by: wugeshui <[email protected]> * [DIPU]Wx/check the status of build dipu (DeepLink-org#490) * check the status of build dipu on camb and nv * add check for ascend * fix the bug of pipe * [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484) * add schema for logical or and logical not ops * fix bug and add test cases for these ops * add the test case: out is empty tensor * [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491) * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * finish res_op_infer for more simple operators * Update operator.py delete some unnecessary print() * Update operator.py clean code * finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems * clean code format * Update warning message output in operator.py * extract common function for general binary and unary operator ,add op bmm's inference * Update ascend_op.py delete unuse param * update DIOPI submodule (DeepLink-org#485) * update DIOPI submodule * update submodule * temporily forbid resnet50 * move 
the testing code to dir under torch_dipu (DeepLink-org#465) * move the testing code to dir under torch_dipu * fix a little bug * create two soft link to avoid import torch_dipu too early. * add one more soft link file to solve bugs. * support dev fork ci (DeepLink-org#496) * support dev fork ci * [dipu] add markdownlint and update most markdown files (DeepLink-org#493) * doc: update docs and add markdownlint * doc: rename readme.md to README.md * fix: remove MD013 * doc: format * [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467) * Add sin, cos, erf, split. 1. Generalize MakeTuple in tops_op. 2. Generalize make_const in enflame codegen. 3. Add sin, cos, erf, split for tops. 4. Format Python code in dicp tops. * refine code * fix abs test path * clean up code of split. * adjust const op generation. * fix nullptr case in const generation. --------- Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: Reinerzhou <[email protected]> * [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494) * improve maximum schema due to the case in the inference of internlm * fix bug according to comments * fix bug * [both] fix, format and remove spaces in README.md (DeepLink-org#497) * doc(readme): fix, format and remove spaces * fix: typo and try auto-correct * feat(ci): add autocorrect into ci * fix: remove autocorrect form ci as it's not ready * update env python 3.10 (DeepLink-org#503) * fix clang tidy * [dicp][ascend] get soc_version from aclrt (DeepLink-org#505) * fix clang tidy * fix format * fix format --------- Co-authored-by: MiaoYYu <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: Juntao Chen <[email protected]> Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: Fu Jingguo <[email 
protected]> Co-authored-by: hellozmz <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: caikun-pjlab <[email protected]> Co-authored-by: tangzhiyi11 <[email protected]> Co-authored-by: wyz5864 <[email protected]> Co-authored-by: Lingjie <[email protected]> Co-authored-by: Joyce YU <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: POI-WX <[email protected]> Co-authored-by: HuayiL <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: liwenjian-sensetime <[email protected]> Co-authored-by: shanhang <[email protected]>
brianlcy123
pushed a commit
to brianlcy123/deeplink.framework
that referenced
this pull request
Dec 21, 2023
caikun-pjlab
added a commit
that referenced
this pull request
Dec 22, 2023
* add kunlunxin backend * add kunlunxin device * update copy_ for kunlunxin * lcy/clang-tidy (#483) * fix namespace declaration format * update diopi_functions.yaml * update clang-tidy * update clang-tidy * change tab into spaces * allow const_cast * fix bug * fix comment * fix comments * fix comments * [FIX] fix virtual memory error of using SUPA (#468) * [FIX] fix virtual memory of SUPA * [FIX] fix incorrect copy * [FIX] remove useless copy and add missing 'supa'in cmakelists.txt * make conv2d out at right memory-format (#502) * [dicp][ascend] add fusion switch file for ascend (#512) * [dipu] Speedup profiler ctor when not enabled (#526) * speedup profiler ctor * clean & format include * [DIPU]clang-tidy_shanhang (#516) * Create main readme * Update readme.md * Update readme.md * Update readme.md * add clone kineto for dicp (#457) add clone kineto for dicp * [dicp][ascend] infer op result_info (#448) * finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test * repeal modification to diopi * modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result' * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * fix gettupleelem in topsgraph --------- Co-authored-by: jinminxi104 <[email protected]> * Fdy/enhance copy (#430) * mv vopy file path * add new copy * fix static param err * fix copy err * fix direct copy bug * rm unused bcast template name * change clang format * change name hpp * rm unused header file * remove unused header 2 * change override behavior * change comment * change cudacopy * fix d2d copy err * change register to use autogen * revert incorrect format * config fallback * fix link err * fix comment wanglei * add newline * fix cpu copy err * add camb vendor copy * fix copy err * fix copy err 2 * fix compile err * fix lingjie comment1 * fix caikun comment * fix camb ci * fix camb ci * fix device switch err * fix ling jie caikun comment 2 * fix comment incorrect 
local ref * change init copy * update DIOPI submodule (#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (#428) * [dipu] Fix copy_ fallback of topsrider. (#477) * [dicp][tops] Add dicp ci of tops. (#469) * Add dicp ci of tops. * Fix dicp ci of tops. * fix recycle dep (#474) * Fdy/fix copy tidy (#471) * fix tidy 0 * fix clang tidy copy * fix lingjie comment * add tidy msg * fix lint comment * fix format * add copy right * fuj/ add ceil.out (#480) * add ceil.out * add floor_ and cases for floor_, ceil and ceil_ * [dipu] tidy some source files and update nv build script (#453) * fix: tidy some source files - and also update build nv script * fix: make clang-format v16 happy * fix: make clang-format v16 happy * fix: remove usings and simplify some code * fix: remove index * fix: remove initialized_ * fix: add keyword VERSION * fix: remove VERSION 3.25 as CI is using CMake 3.22 * add 910B CI && remove 910 CI && update DIOPI (#481) * add 910b * add 910b * add 910b * add 910b * add resnet50 * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * rm nouse code * update DIOPI submodule (#458) * update DIOPI submodule * diopi update to main * update mmcv version * update submodule * update mmcv commit id * feat: pass CMAKE_BUILD_TYPE into DIOPI (#428) * [dipu] Fix copy_ fallback of topsrider. (#477) * [dicp][tops] Add dicp ci of tops. (#469) * Add dicp ci of tops. * Fix dicp ci of tops. 
* fix recycle dep (#474) * rm 910 ci * update diopi * rm 910 --------- Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: wugeshui <[email protected]> * [dipu]add ascend profiler (#476) * add ascend profiler * support with_stack * code format * fix clang tidy * optimize naming * optimize naming * add dipu ci on dicp (#488) * [dicp][ascend] fix ascend mm/bmm on 910B (#482) * mock torch.cuda.XXXTensor (#462) * mock torch.cuda.XXXTensor * add newline at end of file * fix conflict * fix format * fix format * fix comment * Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486) * Fix `multiprocessing.Process` tests not collected by coverage and gcov * fix --concurrency=multiprocessing * [dipu] update tidy configuration and remove if-constexpr in C++14 (#470) * fix: update tidy config and remove if-constexpr * fix: it should be a list instead of bool value * feat: update clangd config * fix: move the comment out of yaml scalar * docs: add comments * fix: add DeviceIndex * fix: add some checks for headers * feat: update .clang-tidy * add profiler readme (#489) * add profiler readme * Update readme.md * update * Update readme.md * Update readme.md * Update readme.md --------- Co-authored-by: caikun-pjlab <[email protected]> * [dicp][tops] support outputs with inplace copy (#440) * add dipu stream synchronize. * adjust some ops. * fix some paras error and rename device name. * unset keep_inference_input_mutations. * fix paras error in conversion. * fix para dtype conversion. * fix empty output and inplace copy of input paras in optimizer case. * remove inplace output gen_empty_tensor. 
* Ywt/fix autocompare compile error (#492) * pass string to python * disable _amp_foreach_non_finite_check_and_unscale_ autocompare * [dipu] Wx/support the test for llm inference (#454) * add one iter for llm * add bert ci using the correct transformers repository * add test for the inference of llama 7b using the transformers repository * one iter test for traditional models by default * fix bug * add test for the inference of internlm 7b using the transformers repository * test for torch_dipu * set device check args other for maximum.out * fix the partition arg parsing bug on cuda * test the setting of CUDA_PARTITION * fix the bug of setting CUDA_PARTATION * add llm * add llm * optimize the selection of model list * set pythonpath for torch_dipu * test * fix bug in the command of setting pythonpath --------- Co-authored-by: wugeshui <[email protected]> * [DIPU]Wx/check the status of build dipu (#490) * check the status of build dipu on camb and nv * add check for ascend * fix the bug of pipe * [DIPU] Wx/add schema for logical or and logical not ops (#484) * add schema for logical or and logical not ops * fix bug and add test cases for these ops * add the test case: out is empty tensor * [dicp][ascend] infer op resinfo (part 2) (#491) * fix a bug in get_cast_dtype: type(int+bool) should be int * clean code format * finish res_op_infer for more simple operators * Update operator.py delete some unnecessary print() * Update operator.py clean code * finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems * clean code format * Update warning message output in operator.py * extract common function for general binary and unary operator ,add op bmm's inference * Update ascend_op.py delete unuse param * update DIOPI submodule (#485) * update DIOPI submodule * update submodule * temporily forbid resnet50 * move the testing code to dir under torch_dipu (#465) * move the testing code 
to dir under torch_dipu * fix a little bug * create two soft link to avoid import torch_dipu too early. * add one more soft link file to solve bugs. * support dev fork ci (#496) * support dev fork ci * [dipu] add markdownlint and update most markdown files (#493) * doc: update docs and add markdownlint * doc: rename readme.md to README.md * fix: remove MD013 * doc: format * [dicp][tops] Support some ops for stable-diffusion. (#467) * Add sin, cos, erf, split. 1. Generalize MakeTuple in tops_op. 2. Generalize make_const in enflame codegen. 3. Add sin, cos, erf, split for tops. 4. Format Python code in dicp tops. * refine code * fix abs test path * clean up code of split. * adjust const op generation. * fix nullptr case in const generation. --------- Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: Reinerzhou <[email protected]> * [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494) * improve maximum schema due to the case in the inference of internlm * fix bug according to comments * fix bug * [both] fix, format and remove spaces in README.md (#497) * doc(readme): fix, format and remove spaces * fix: typo and try auto-correct * feat(ci): add autocorrect into ci * fix: remove autocorrect form ci as it's not ready * update env python 3.10 (#503) * fix clang tidy * [dicp][ascend] get soc_version from aclrt (#505) * fix clang tidy * fix format * fix format --------- Co-authored-by: MiaoYYu <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: Juntao Chen <[email protected]> Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: Fu Jingguo <[email protected]> Co-authored-by: hellozmz <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: 
caikun-pjlab <[email protected]> Co-authored-by: tangzhiyi11 <[email protected]> Co-authored-by: wyz5864 <[email protected]> Co-authored-by: Lingjie <[email protected]> Co-authored-by: Joyce YU <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: POI-WX <[email protected]> Co-authored-by: HuayiL <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: liwenjian-sensetime <[email protected]> Co-authored-by: shanhang <[email protected]> * Speedup dumpOnArgLevel by using lazy initialization (#524) * [dicp][ascend] fuse transpose/mm in ascendgraph (#523) * [dicp][ascend] remove unnecessary broadcast (#527) * update kineto (#530) * [dicp][ascend] opt inplace copy (#533) * opt copy inplace * optimzer load_and_run * remove chech return value if (#534) * [dipu] Optimize `getAllocator` by adopting lookup table (#532) * [dipu] Optimize `getAllocator` by adopting lookup table * fix typos & clean includes * resolve comments * shrink lookup table & speedup devproxy::getDeviceCount * Op preference mem format (#525) * add memory perference in op for camb. This change will add a TAG in diopi_functions.yaml and the autogen will replace it with the prefered memory format depending on the convert_config.yaml of the device * fix bug found in ci running * improve the code according to the comment. * improve code format. * improve CMakeLists.txt code. 
* lyp_clang_tidy: warning uint64_t->int (#518) * clang_tidy:torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp CorrelationIDManager.h * clang_tidy dipu/torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp .h * clang_tidy:torch_dipu/csrc_dipu/profiler/profiler.cpp * clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp * clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp --v2 * clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp * clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp -v2 * clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEvent.h * clang_tidy: torch_dipu/csrc_dipu/profiler/profiler.h --v2 * clang_tidy: torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp --v2 * clang_tidy: torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp .h --v2 * clang_tidy: magic number; const_cast * clang_tidy: fix some review issus * clang_tidy: modify format by using run_format.sh * [dipu] fix: `torch.prod` int type promotion (#541) `prod` (and other reduction ops) should promote int type (including `bool`) to `int64` when `dtype` is not explicitly provided. Only `prod` (without `dim`) should be taken care of, because the other cases are already correctly handled in PyTorch. 
* [dipu] fix typo PREFERED -> PREFERRED (#545) * [dicp][ascend] add dicp ci for ascend (#540) * disable autocompare for _amp_foreach_non_finite_check_and_unscale_ (#543) * Update QuickStart.md * revert unnecessary changes * fix linter erros and implement getRuntimeVersion&getDriverVersion for kunlunxin * change device from XPU to KLX * fix build * remove uused code * use DIPU_LOG install of printf * change kunlunxin device key from xpu to klx --------- Co-authored-by: Chengyuan Li <[email protected]> Co-authored-by: Aaron <[email protected]> Co-authored-by: wyz5864 <[email protected]> Co-authored-by: tangzhiyi11 <[email protected]> Co-authored-by: Lingjie <[email protected]> Co-authored-by: ustclight-sls <[email protected]> Co-authored-by: MiaoYYu <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: Juntao Chen <[email protected]> Co-authored-by: jinminxi104 <[email protected]> Co-authored-by: fandaoyi <[email protected]> Co-authored-by: Peter Ye <[email protected]> Co-authored-by: wiryls <[email protected]> Co-authored-by: yaofengchen <[email protected]> Co-authored-by: Fu Jingguo <[email protected]> Co-authored-by: hellozmz <[email protected]> Co-authored-by: wugeshui <[email protected]> Co-authored-by: CyCle1024 <[email protected]> Co-authored-by: caikun-pjlab <[email protected]> Co-authored-by: Joyce YU <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: POI-WX <[email protected]> Co-authored-by: HuayiL <[email protected]> Co-authored-by: Reinerzhou <[email protected]> Co-authored-by: liwenjian-sensetime <[email protected]> Co-authored-by: shanhang <[email protected]> Co-authored-by: lyp-liuyipeng <[email protected]> Co-authored-by: zhaochaoxing <[email protected]>
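Problem:
The diopiCopy API is expected to provide many capabilities, and a vendor may not implement all of them (Enflame (燧原), for example, implements only a small subset). The current logic either raises an error or falls back entirely to the slow CPU-based copy (except for direct copy). We want to use diopiCopy for better performance while handling unsupported cases more flexibly.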
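To illustrate the kind of per-case decision described above, here is a minimal sketch, not the code in this PR. It assumes a vendor can declare which copy capabilities its diopiCopy supports. All names here (`CopyCap`, `CopyRequest`, `CopyPath`, `requiredCaps`, `choosePath`) are hypothetical, and the three capability bits are an assumed breakdown rather than the actual diopiCopy contract.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability bits a vendor might declare for its diopiCopy.
// The real capability set required by diopiCopy is an assumption here.
enum CopyCap : std::uint32_t {
  kCapNone = 0,
  kCapDtypeCast = 1u << 0,      // source and destination dtypes differ
  kCapNonContiguous = 1u << 1,  // at least one side is not contiguous
  kCapBroadcast = 1u << 2,      // source must be broadcast to destination
  kCapAll = kCapDtypeCast | kCapNonContiguous | kCapBroadcast,
};

// Hypothetical description of one copy call.
struct CopyRequest {
  bool sameDtype;
  bool bothContiguous;
  bool sameShape;
};

enum class CopyPath { DirectMemcpy, VendorDiopiCopy, CpuRelay };

// Collect the capabilities this particular copy needs.
inline std::uint32_t requiredCaps(const CopyRequest& req) {
  std::uint32_t caps = kCapNone;
  if (!req.sameDtype) caps |= kCapDtypeCast;
  if (!req.bothContiguous) caps |= kCapNonContiguous;
  if (!req.sameShape) caps |= kCapBroadcast;
  return caps;
}

// Decide per request instead of all-or-nothing: only the requests that
// need a capability the vendor lacks go through the slow CPU relay.
inline CopyPath choosePath(const CopyRequest& req, std::uint32_t vendorCaps) {
  assert((vendorCaps & ~static_cast<std::uint32_t>(kCapAll)) == 0);
  const std::uint32_t needed = requiredCaps(req);
  if (needed == kCapNone) return CopyPath::DirectMemcpy;
  const std::uint32_t missing = needed & ~vendorCaps;
  return missing == 0 ? CopyPath::VendorDiopiCopy : CopyPath::CpuRelay;
}
```

With this shape, a vendor that only declares `kCapDtypeCast` still gets the fast path for cast-only copies, and only strided or broadcast copies fall back to the CPU relay.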