* add kunlunxin backend
* add kunlunxin device
* update copy_ for kunlunxin
* lcy/clang-tidy (#483)
* fix namespace declaration format
* update diopi_functions.yaml
* update clang-tidy
* update clang-tidy
* change tabs into spaces
* allow const_cast
* fix bug
* fix comment
* fix comments
* fix comments
* [FIX] fix virtual memory error of using SUPA (#468)
* [FIX] fix virtual memory of SUPA
* [FIX] fix incorrect copy
* [FIX] remove useless copy and add missing 'supa' in CMakeLists.txt
* make conv2d out in the right memory format (#502)
* [dicp][ascend] add fusion switch file for ascend (#512)
* [dipu] Speedup profiler ctor when not enabled (#526)
* speedup profiler ctor
* clean & format include
* [DIPU]clang-tidy_shanhang (#516)
* Create main readme
* Update readme.md
* Update readme.md
* Update readme.md
* add clone kineto for dicp (#457)
* [dicp][ascend] infer op result_info (#448)
* finish res_op_infer for softmax+log_softmax+add+amax (keepdim=True); pass static test
* revert modification to diopi
* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support tests of 'infer_result'
* fix a bug in get_cast_dtype: type(int+bool) should be int (see the sketch after this list)
* clean code format
* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (#430)
* mv copy file path
* add new copy
* fix static param err
* fix copy err
* fix direct copy bug
* rm unused bcast template name
* change clang format
* change name hpp
* rm unused header file
* remove unused header 2
* change override behavior
* change comment
* change cudacopy
* fix d2d copy err
* change register to use autogen
* revert incorrect format
* config fallback
* fix link err
* fix comment from wanglei
* add newline
* fix cpu copy err
* add camb vendor copy
* fix copy err
* fix copy err 2
* fix compile err
* fix lingjie comment 1
* fix caikun comment
* fix camb ci
* fix camb ci
* fix device switch err
* fix lingjie & caikun comment 2
* fix comment: incorrect local ref
* change init copy
* update DIOPI submodule (#458)
* update DIOPI submodule
* diopi update to main
* update mmcv version
* update submodule
* update mmcv commit id
* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)
* [dipu] Fix copy_ fallback of topsrider. (#477)
* [dicp][tops] Add dicp ci of tops. (#469)
* Add dicp ci of tops.
* Fix dicp ci of tops.
* fix recycle dep (#474)
* Fdy/fix copy tidy (#471)
* fix tidy 0
* fix clang tidy copy
* fix lingjie comment
* add tidy msg
* fix lint comment
* fix format
* add copyright
* fuj/add ceil.out (#480)
* add ceil.out
* add floor_ and cases for floor_, ceil and ceil_
* [dipu] tidy some source files and update nv build script (#453)
* fix: tidy some source files, and also update the nv build script
* fix: make clang-format v16 happy
* fix: make clang-format v16 happy
* fix: remove usings and simplify some code
* fix: remove index
* fix: remove initialized_
* fix: add keyword VERSION
* fix: remove VERSION 3.25 as CI is using CMake 3.22
* add 910B CI && remove 910 CI && update DIOPI (#481)
* add 910b
* add 910b
* add 910b
* add 910b
* add resnet50
* fix bugs
* fix bugs
* fix bugs
* fix bugs
* fix bugs
* rm unused code
* update DIOPI submodule (#458)
* update DIOPI submodule
* diopi update to main
* update mmcv version
* update submodule
* update mmcv commit id
* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)
* [dipu] Fix copy_ fallback of topsrider. (#477)
* [dicp][tops] Add dicp ci of tops. (#469)
* Add dicp ci of tops.
* Fix dicp ci of tops.
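The `get_cast_dtype` item above concerns dtype inference for mixed scalar types in the dynamo bridge. A minimal Python sketch of the rule being fixed (the helper below uses invented names and is not DICP's actual code): combining `int` and `bool` must yield `int`, never `bool`.

```python
# Hypothetical helper illustrating the promotion rule, not DICP's real code.
def get_cast_dtype(lhs: type, rhs: type) -> type:
    # bool is a subclass of int in Python, so naive isinstance-based checks
    # can wrongly settle on bool; rank the types explicitly instead.
    rank = {bool: 0, int: 1, float: 2}
    return lhs if rank[lhs] >= rank[rhs] else rhs

assert get_cast_dtype(int, bool) is int    # the fixed case: int + bool -> int
assert get_cast_dtype(bool, bool) is bool  # bool only when both sides are bool
assert get_cast_dtype(int, float) is float
```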
* fix recycle dep (#474)
* rm 910 ci
* update diopi
* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (#476)
* add ascend profiler
* support with_stack
* code format
* fix clang tidy
* optimize naming
* optimize naming
* add dipu ci on dicp (#488)
* [dicp][ascend] fix ascend mm/bmm on 910B (#482)
* mock torch.cuda.XXXTensor (#462)
* mock torch.cuda.XXXTensor
* add newline at end of file
* fix conflict
* fix format
* fix format
* fix comment
* Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486)
* Fix `multiprocessing.Process` tests not collected by coverage and gcov
* fix --concurrency=multiprocessing (illustrated after this list)
* [dipu] update tidy configuration and remove if-constexpr in C++14 (#470)
* fix: update tidy config and remove if-constexpr
* fix: it should be a list instead of a bool value
* feat: update clangd config
* fix: move the comment out of the yaml scalar
* docs: add comments
* fix: add DeviceIndex
* fix: add some checks for headers
* feat: update .clang-tidy
* add profiler readme (#489)
* add profiler readme
* Update readme.md
* update
* Update readme.md
* Update readme.md
* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (#440)
* add dipu stream synchronize.
* adjust some ops.
* fix some param errors and rename device name.
* unset keep_inference_input_mutations.
* fix param errors in conversion.
* fix param dtype conversion.
* fix empty output and inplace copy of input params in the optimizer case.
* remove inplace output gen_empty_tensor.
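The coverage fix in #486 above targets a well-known pitfall: code executed inside `multiprocessing.Process` children is invisible to a plain `coverage run`. A short illustration of the failure mode (the script below is an example, not the repository's test code):

```python
import multiprocessing

def worker():
    # Under a plain `coverage run`, these lines are reported as never
    # executed, because they run in a child process. Starting coverage with
    # `--concurrency=multiprocessing` (plus `parallel = True` in the config,
    # followed by `coverage combine`) records the children as well.
    print("work done in child process")

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
```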
* Ywt/fix autocompare compile error (#492)
* pass string to python
* disable _amp_foreach_non_finite_check_and_unscale_ autocompare
* [dipu] Wx/support the test for llm inference (#454)
* add one iter for llm
* add bert ci using the correct transformers repository
* add test for the inference of llama 7b using the transformers repository
* one iter test for traditional models by default
* fix bug
* add test for the inference of internlm 7b using the transformers repository
* test for torch_dipu
* set device check args other for maximum.out
* fix the partition arg parsing bug on cuda
* test the setting of CUDA_PARTITION
* fix the bug of setting CUDA_PARTATION
* add llm
* add llm
* optimize the selection of the model list
* set pythonpath for torch_dipu
* test
* fix bug in the command setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (#490)
* check the status of build dipu on camb and nv
* add check for ascend
* fix the pipe bug
* [DIPU] Wx/add schema for logical or and logical not ops (#484)
* add schema for logical or and logical not ops
* fix bug and add test cases for these ops
* add the test case: out is an empty tensor (see the example after this list)
* [dicp][ascend] infer op resinfo (part 2) (#491)
* fix a bug in get_cast_dtype: type(int+bool) should be int
* clean code format
* finish res_op_infer for more simple operators
* Update operator.py: delete some unnecessary print()
* Update operator.py: clean code
* finish operators' info inference, except for those that are hard to test standalone without inference; operators involving Reshape still have problems
* clean code format
* Update warning message output in operator.py
* extract a common function for general binary and unary operators; add op bmm's inference
* Update ascend_op.py: delete unused param
* update DIOPI submodule (#485)
* update DIOPI submodule
* update submodule
* temporarily forbid resnet50
* move the testing code to a dir under torch_dipu (#465)
* move the testing code to a dir under torch_dipu
* fix a little bug
* create two soft links to avoid importing torch_dipu too early.
* add one more soft link file to solve bugs.
* support dev fork ci (#496)
* support dev fork ci
* [dipu] add markdownlint and update most markdown files (#493)
* doc: update docs and add markdownlint
* doc: rename readme.md to README.md
* fix: remove MD013
* doc: format
* [dicp][tops] Support some ops for stable-diffusion. (#467)
* Add sin, cos, erf, split.
  1. Generalize MakeTuple in tops_op.
  2. Generalize make_const in enflame codegen.
  3. Add sin, cos, erf, split for tops.
  4. Format Python code in dicp tops.
* refine code
* fix abs test path
* clean up code of split.
* adjust const op generation.
* fix nullptr case in const generation.
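The logical-ops schema work in #484 above adds a test case where `out` is an empty tensor. A small example of that pattern with stock PyTorch semantics (illustrative, not the repository's test code): when `out` has zero elements, it is resized to the broadcast result shape.

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([False, False, True])

out = torch.empty(0, dtype=torch.bool)  # the "out is an empty tensor" case
torch.logical_or(a, b, out=out)         # out is resized to match the result
print(out)                              # tensor([ True, False,  True])

torch.logical_not(a, out=out)
print(out)                              # tensor([False,  True, False])
```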
---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494)
* improve maximum schema due to the case in the inference of internlm
* fix bug according to comments
* fix bug
* [both] fix, format and remove spaces in README.md (#497)
* doc(readme): fix, format and remove spaces
* fix: typo and try auto-correct
* feat(ci): add autocorrect into ci
* fix: remove autocorrect from ci as it's not ready
* update env python 3.10 (#503)
* fix clang tidy
* [dicp][ascend] get soc_version from aclrt (#505)
* fix clang tidy
* fix format
* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>

* Speedup dumpOnArgLevel by using lazy initialization (#524)
* [dicp][ascend] fuse transpose/mm in ascendgraph (#523)
* [dicp][ascend] remove unnecessary broadcast (#527)
* update kineto (#530)
* [dicp][ascend] opt inplace copy (#533)
* opt copy inplace
* optimize load_and_run
* remove check return value if (#534)
* [dipu] Optimize `getAllocator` by adopting lookup table (#532) (see the sketch after this list)
* [dipu] Optimize `getAllocator` by adopting lookup table
* fix typos & clean includes
* resolve comments
* shrink lookup table & speedup devproxy::getDeviceCount
* Op preference mem format (#525)
* add memory preference in op for camb. This change will add a TAG in diopi_functions.yaml, and the autogen will replace it with the preferred memory format depending on the convert_config.yaml of the device
* fix bug found in ci running
* improve the code according to the comment.
* improve code format.
* improve CMakeLists.txt code.
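For context on the `getAllocator` change in #532 above: the stated optimization is a lookup table. A hedged Python sketch of the idea only (the real implementation is C++ inside DIPU, and every name below is invented): allocators are indexed directly by device kind and device index, so each lookup is a flat array access instead of a repeated map or string lookup.

```python
# Invented names; illustrates the lookup-table pattern, not DIPU's code.
MAX_DEVICE_NUM = 16
HOST, DEVICE = 0, 1

# Flat table giving O(1) indexed access on every get_allocator call.
_allocator_table = [[None] * MAX_DEVICE_NUM for _ in range(2)]

def register_allocator(device_kind: int, index: int, allocator) -> None:
    _allocator_table[device_kind][index] = allocator

def get_allocator(device_kind: int, index: int):
    return _allocator_table[device_kind][index]
```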
* lyp_clang_tidy: warning uint64_t->int (#518)
* clang_tidy: torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp CorrelationIDManager.h
* clang_tidy: dipu/torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp .h
* clang_tidy: torch_dipu/csrc_dipu/profiler/profiler.cpp
* clang_tidy: torch_dipu/csrc_dipu/profiler/patch.cpp
* clang_tidy: torch_dipu/csrc_dipu/profiler/patch.cpp --v2
* clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp
* clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/allocator/DIPUBFCachingAllocator.cpp -v2
* clang_tidy: dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEvent.h
* clang_tidy: torch_dipu/csrc_dipu/profiler/profiler.h --v2
* clang_tidy: torch_dipu/csrc_dipu/profiler/DIPUDeviceActivity.cpp --v2
* clang_tidy: torch_dipu/csrc_dipu/profiler/CorrelationIDManager.cpp .h --v2
* clang_tidy: magic number; const_cast
* clang_tidy: fix some review issues
* clang_tidy: modify format by using run_format.sh
* [dipu] fix: `torch.prod` int type promotion (#541): `prod` (and other reduction ops) should promote integer types (including `bool`) to `int64` when `dtype` is not explicitly provided. Only `prod` (without `dim`) needs handling here, because the other cases are already handled correctly in PyTorch. (See the example after this list.)
* [dipu] fix typo PREFERED -> PREFERRED (#545)
* [dicp][ascend] add dicp ci for ascend (#540)
* disable autocompare for _amp_foreach_non_finite_check_and_unscale_ (#543)
* Update QuickStart.md
* revert unnecessary changes
* fix linter errors and implement getRuntimeVersion & getDriverVersion for kunlunxin
* change device from XPU to KLX
* fix build
* remove unused code
* use DIPU_LOG instead of printf
* change kunlunxin device key from xpu to klx

---------

Co-authored-by: Chengyuan Li <[email protected]>
Co-authored-by: Aaron <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: ustclight-sls <[email protected]>
Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
Co-authored-by: lyp-liuyipeng <[email protected]>
Co-authored-by: zhaochaoxing <[email protected]>
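To make the #541 promotion rule above concrete, this is the stock PyTorch behavior the fix aligns with (the snippet is illustrative and runs on plain PyTorch):

```python
import torch

x = torch.tensor([2, 3], dtype=torch.int32)
print(x.prod().dtype)                   # torch.int64: ints promote by default
print(x.prod(dtype=torch.int32).dtype)  # torch.int32: explicit dtype wins

b = torch.tensor([True, True, False])
print(b.prod().dtype)                   # torch.int64: bool promotes too
print(b.prod())                         # tensor(0)
```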