
Fdy/enhance copy #430

Merged
merged 37 commits into from
Nov 27, 2023

Conversation

fandaoyi
Collaborator

@fandaoyi fandaoyi commented Nov 15, 2023

Problem:
The diopiCopy API demands a range of capabilities, and a vendor may not be able to implement all of them (for example, Enflame implements only a small subset). The current logic either raises an error or falls back entirely to the slow CPU-based copy (except for direct copy). We want to use diopiCopy to improve performance, while handling unsupported cases more flexibly.

  1. The autogen-generated dipu_copy function is now used only as a bridge to the diopi interface; its behavior is extracted into the new DIPUCopy class. autogen cannot yet register a "complex behavior" class associated with multiple diopi interfaces, so the current implementation is somewhat tricky.
  2. The new DIPUCopy class provides a set of basic building blocks (the doXXX family) plus some customizability. A vendor can override the default implementation to use diopiCopy, fully or partially, for on-device copies, and fall back to the auxiliary CPU copy for cases it cannot handle. Other copies (d2h, etc.) are handled automatically by default, requiring no vendor intervention (though this behavior can also be modified).
  3. d2h, h2d, and device-to-device copies are now preferentially dispatched to the device (when diopiCopy is available) to improve performance.
  4. The checks and decision logic previously handled by TensorIterator are extracted into our custom CopyParamInfo class and helper functions, so the heavyweight TensorIterator is no longer needed.
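The override-and-fallback design in points 2-4 can be sketched as follows. This is a minimal illustrative sketch, not the actual DIPU API: the class, field, and method names (DIPUCopyBase, the CopyParamInfo fields, doDirectCopy, doDeviceCopy, doCpuFallback, VendorCopy) are hypothetical stand-ins for the doXXX building blocks described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the building-block copy design described above.
// All names here are illustrative, not the actual DIPU/DIOPI API.

// Lightweight replacement for TensorIterator's checks (cf. CopyParamInfo).
struct CopyParamInfo {
  bool sameDtype;       // src and dst share a dtype
  bool directCopyable;  // contiguous, same sizes/strides: memcpy-like copy
  bool crossDevice;     // src and dst live on different devices
};

class DIPUCopyBase {
 public:
  virtual ~DIPUCopyBase() = default;
  std::string last;  // records which building block ran (demo only)

  // Entry point: dispatch to a building block using precomputed info,
  // instead of re-deriving everything through TensorIterator.
  void run(const CopyParamInfo& info) {
    if (info.directCopyable) {
      doDirectCopy(info);
    } else {
      doDeviceCopy(info);  // vendor hook; may fall back internally
    }
  }

 protected:
  virtual void doDirectCopy(const CopyParamInfo&) { last = "direct"; }

  // Default: assume no usable diopiCopy; use the slow CPU-based copy.
  virtual void doDeviceCopy(const CopyParamInfo& info) { doCpuFallback(info); }

  void doCpuFallback(const CopyParamInfo&) { last = "cpu_fallback"; }
};

// A vendor overrides only the cases its diopiCopy actually supports,
// and reuses the CPU fallback for everything else.
class VendorCopy : public DIPUCopyBase {
 protected:
  void doDeviceCopy(const CopyParamInfo& info) override {
    if (info.sameDtype && !info.crossDevice) {
      last = "diopi_device";  // supported case: run diopiCopy on device
    } else {
      doCpuFallback(info);    // unsupported case: auxiliary CPU copy
    }
  }
};
```

With this split, supporting a new case on a vendor means overriding one doXXX hook rather than reimplementing the whole copy path.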

Conflicts:
	dipu/torch_dipu/csrc_dipu/aten/DIPUATenFunctions.h
	dipu/torch_dipu/csrc_dipu/aten/RegisterDIPU.cpp
	dipu/torch_dipu/csrc_dipu/aten/ops/CopyKernel.cpp
	dipu/torch_dipu/csrc_dipu/aten/ops/CustomFallbackFunctions.hpp
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.cpp
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUCopyInplace.h
	dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.h
	dipu/torch_dipu/csrc_dipu/vendor/cuda/CUDACopyInplace.cpp
	dipu/torch_dipu/csrc_dipu/vendor/supa/copyinplace.cpp
Collaborator

@lljbash lljbash left a comment


lgtm

@fandaoyi fandaoyi merged commit 8db5b50 into main Nov 27, 2023
19 checks passed
ustclight-sls pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Dec 8, 2023
* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy
@fandaoyi fandaoyi deleted the fdy/enhance_copy branch December 11, 2023 03:50
mrdanielw pushed a commit that referenced this pull request Dec 13, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* Fdy/fix copy tidy (#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (#482)

* mock torch.cuda.XXXTensor (#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (DeepLink-org#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (DeepLink-org#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (DeepLink-org#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* Fdy/fix copy tidy (DeepLink-org#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (DeepLink-org#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (DeepLink-org#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (DeepLink-org#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (DeepLink-org#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482)

* mock torch.cuda.XXXTensor (DeepLink-org#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (DeepLink-org#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (DeepLink-org#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (DeepLink-org#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (DeepLink-org#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (DeepLink-org#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (DeepLink-org#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (DeepLink-org#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (DeepLink-org#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (DeepLink-org#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (DeepLink-org#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (DeepLink-org#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (DeepLink-org#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (DeepLink-org#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (DeepLink-org#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True) pass static test

* repeal modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (DeepLink-org#430)

* mv vopy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment incorrect local  ref

* change init copy

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* Fdy/fix copy tidy (DeepLink-org#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (DeepLink-org#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (DeepLink-org#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (DeepLink-org#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm nouse code

* update DIOPI submodule (DeepLink-org#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (DeepLink-org#428)

* [dipu] Fix copy_ fallback of topsrider. (DeepLink-org#477)

* [dicp][tops] Add dicp ci of tops. (DeepLink-org#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (DeepLink-org#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (DeepLink-org#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (DeepLink-org#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (DeepLink-org#482)

* mock torch.cuda.XXXTensor (DeepLink-org#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (DeepLink-org#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (DeepLink-org#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (DeepLink-org#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (DeepLink-org#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (DeepLink-org#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (DeepLink-org#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTATION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (DeepLink-org#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (DeepLink-org#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (DeepLink-org#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish operators' info inference except for those having trouble testing solely without inference and operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract common function for general binary and unary operator ,add op bmm's inference

* Update ascend_op.py

delete unuse param

* update DIOPI submodule (DeepLink-org#485)

* update DIOPI submodule

* update submodule

* temporily forbid resnet50

* move the testing code to dir under torch_dipu (DeepLink-org#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft link to avoid import torch_dipu  too early.

* add one more soft link file to solve bugs.

* support dev fork ci (DeepLink-org#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (DeepLink-org#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (DeepLink-org#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (DeepLink-org#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (DeepLink-org#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect form ci as it's not ready

* update env python 3.10 (DeepLink-org#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (DeepLink-org#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
brianlcy123 pushed a commit to brianlcy123/deeplink.framework that referenced this pull request Dec 21, 2023
caikun-pjlab added a commit that referenced this pull request Dec 22, 2023
* add kunlunxin backend

* add kunlunxin device

* update copy_ for kunlunxin

* lcy/clang-tidy (#483)

* fix namespace declaration format

* update diopi_functions.yaml

* update clang-tidy

* update clang-tidy

* change tab into spaces

* allow const_cast

* fix bug

* fix comment

* fix comments

* fix comments

* [FIX] fix virtual memory error of using SUPA (#468)

* [FIX] fix virtual memory of SUPA

* [FIX] fix incorrect copy

* [FIX] remove useless copy and add missing 'supa' in CMakeLists.txt

* make conv2d out at right memory-format (#502)

* [dicp][ascend] add fusion switch file for ascend (#512)

* [dipu] Speedup profiler ctor when not enabled (#526)

* speedup profiler ctor

* clean & format include

* [DIPU]clang-tidy_shanhang (#516)

* Speedup dumpOnArgLevel by using lazy initialization (#524)

* [dicp][ascend] fuse transpose/mm in ascendgraph (#523)

* [dicp][ascend] remove unnecessary broadcast (#527)

* update kineto (#530)

* [dicp][ascend] opt inplace copy (#533)

* opt copy inplace

* optimize load_and_run

* remove check return value if (#534)

* [dipu] Optimize `getAllocator` by adopting lookup table (#532)

* [dipu] Optimize `getAllocator` by adopting lookup table

* fix typos & clean includes

* resolve comments

* shrink lookup table & speedup devproxy::getDeviceCount
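
The lookup-table idea behind that `getAllocator` change can be sketched roughly as follows; this is a minimal illustration of trading per-call branching for a precomputed flat table, with invented names (`LookupAllocatorRegistry` and its methods are not the actual DIPU API):

```python
# Hypothetical sketch: resolve an allocator with one array index instead of
# re-deciding (branching / map lookups) on every call. One slot is
# precomputed per (device type, device index) pair.

class LookupAllocatorRegistry:
    def __init__(self, device_types, max_devices):
        # Flat table, sized once up front; lookups become a single index.
        self._type_offset = {t: i for i, t in enumerate(device_types)}
        self._max_devices = max_devices
        self._table = [None] * (len(device_types) * max_devices)

    def _slot(self, device_type, index):
        return self._type_offset[device_type] * self._max_devices + index

    def register(self, device_type, index, allocator):
        self._table[self._slot(device_type, index)] = allocator

    def get(self, device_type, index):
        # Hot path: no branching on device type, just arithmetic + indexing.
        return self._table[self._slot(device_type, index)]

registry = LookupAllocatorRegistry(["cpu", "dipu"], max_devices=16)
registry.register("dipu", 0, "caching-allocator-0")
```

The point of the design is that the hot path (`get`) does no string comparison or policy decision; all of that happens once at registration time.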

* Op preference mem format (#525)

* add memory preference in op for camb.
This change adds a TAG in diopi_functions.yaml, and the autogen replaces it with the preferred memory format depending on the convert_config.yaml of the device

* fix bug found in ci running

* improve the code according to the comment.

* improve code format.

* improve CMakeLists.txt code.
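
The TAG substitution described in that commit can be sketched as below. This is an assumption-laden illustration, not the real autogen: the placeholder name `${MEMORY_FORMAT}`, the config dict standing in for convert_config.yaml, and the op/format names are all hypothetical.

```python
# Hypothetical sketch of the autogen TAG replacement: a placeholder in a
# diopi_functions.yaml-style template is rewritten with the memory format
# a vendor prefers, looked up from a convert_config-style mapping.

# Stand-in for the vendor's convert_config.yaml (assumed content).
PREFERRED_MEMORY_FORMAT = {
    "conv2d": "ChannelsLast",
}
DEFAULT_FORMAT = "Contiguous"

def expand_memory_format_tag(op_name: str, template: str) -> str:
    """Replace the ${MEMORY_FORMAT} placeholder with the vendor's preference,
    falling back to a default when the op has no entry in the config."""
    fmt = PREFERRED_MEMORY_FORMAT.get(op_name, DEFAULT_FORMAT)
    return template.replace("${MEMORY_FORMAT}", fmt)

line = "auto out = empty_like(input, ${MEMORY_FORMAT});"
print(expand_memory_format_tag("conv2d", line))
# -> auto out = empty_like(input, ChannelsLast);
```

Ops absent from the config keep the default format, so vendors only have to list the ops where layout actually matters.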

* lyp_clang_tidy: warning uint64_t->int (#518)

* clang_tidy:torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp
                                         CorrelationIDManager.h

* clang_tidy dipu/torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp .h

* clang_tidy:torch_dipu/csrc_dipu/profiler/profiler.cpp

* clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp

* clang_tidy:torch_dipu/csrc_dipu/profiler/patch.cpp --v2

* clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp

* clang_tidy:dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp -v2

* clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEvent.h

* clang_tidy: torch_dipu/csrc_dipu/profiler/profiler.h --v2

* clang_tidy: torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp --v2

* clang_tidy: torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp .h --v2

* clang_tidy: magic number; const_cast

* clang_tidy: fix some review issues

* clang_tidy: modify format by using run_format.sh

* [dipu] fix: `torch.prod` int type promotion (#541)

`prod` (and other reduction ops) should promote int type (including `bool`) to `int64` when `dtype` is not explicitly provided.

Only `prod` (without `dim`) should be taken care of, because the other cases are already correctly handled in PyTorch.
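
The promotion rule stated above can be written out as a small sketch; the dtype names are illustrative strings and this is not the actual PyTorch type-promotion code:

```python
# Sketch of the reduction-dtype rule described above: when no dtype is
# explicitly provided, integral inputs (including bool) accumulate in
# int64 to avoid overflow; floating-point inputs keep their own dtype.

from typing import Optional

INTEGRAL_DTYPES = {"bool", "uint8", "int8", "int16", "int32", "int64"}

def promoted_reduction_dtype(input_dtype: str,
                             explicit_dtype: Optional[str] = None) -> str:
    if explicit_dtype is not None:
        return explicit_dtype   # an explicit dtype= always wins
    if input_dtype in INTEGRAL_DTYPES:
        return "int64"          # bool/int inputs promote to int64
    return input_dtype          # float/complex dtypes are unchanged
```

So `prod` over a `bool` or `int32` tensor yields `int64` output unless the caller asked for something else, matching the behavior the fix aligns with.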

* [dipu] fix typo PREFERED -> PREFERRED (#545)

* [dicp][ascend] add dicp ci for ascend (#540)

* disable autocompare for _amp_foreach_non_finite_check_and_unscale_ (#543)

* Update QuickStart.md

* revert unnecessary changes

* fix linter errors and implement getRuntimeVersion&getDriverVersion for kunlunxin

* change device from XPU to KLX

* fix build

* remove unused code

* use DIPU_LOG instead of printf

* change kunlunxin device key from xpu to klx

---------

Co-authored-by: Chengyuan Li <[email protected]>
Co-authored-by: Aaron <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: ustclight-sls <[email protected]>
Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
Co-authored-by: lyp-liuyipeng <[email protected]>
Co-authored-by: zhaochaoxing <[email protected]>