[DIPU]clang-tidy_shanhang (#516)
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True); passes static test

* revert modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of 'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (#430)

* mv copy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment: incorrect local ref

* change init copy

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* Fdy/fix copy tidy (#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm unused code

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (#482)

* mock torch.cuda.XXXTensor (#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTITION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish info inference for the remaining operators, except those that are hard to test in isolation without inference; operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract a common function for general binary and unary operators; add op bmm's inference

* Update ascend_op.py

delete unused param

* update DIOPI submodule (#485)

* update DIOPI submodule

* update submodule

* temporarily forbid resnet50

* move the testing code to dir under torch_dipu (#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft links to avoid importing torch_dipu too early.

* add one more soft link file to solve bugs.

* support dev fork ci (#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect from ci as it's not ready

* update env python 3.10 (#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
1 parent 0bbb2ee commit f1c2f31
Showing 11 changed files with 202 additions and 176 deletions.
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUDeviceInfo.cpp (10 changes: 7 additions & 3 deletions)

@@ -15,25 +15,29 @@ using c10::DeviceIndex;
 using dipu::devapis::DIPUDeviceProperties;
 using std::shared_ptr;
 
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 DeviceIndex num_gpus = -1;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 c10::once_flag init_flag;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 std::deque<c10::once_flag> device_flags;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 std::vector<shared_ptr<DIPUDeviceProperties>> device_properties;
 
-static void initDIPUContextVectors() {
+void initDIPUContextVectors() {
   num_gpus = dipu::devproxy::getDeviceCount();
   device_flags.resize(num_gpus);
   device_properties.resize(num_gpus);
 }
 
-static void initDeviceProperty(DeviceIndex device_index) {
+void initDeviceProperty(DeviceIndex device_index) {
   DIPUDeviceProperties device_prop =
       dipu::devproxy::getDeviceProperties(device_index);
   device_properties[device_index] =
       std::make_shared<DIPUDeviceProperties>(device_prop);
 }
 
-static inline void checkDevice(int32_t device_index) {
+inline void checkDevice(int32_t device_index) {
   c10::call_once(init_flag, initDIPUContextVectors);
   if (device_index == -1) {
     device_index = dipu::devproxy::current_device();
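The pattern in this hunk recurs across the PR: mutable file-scope state stays in place but each definition carries an explicit NOLINTNEXTLINE suppression for cppcoreguidelines-avoid-non-const-global-variables, and the `static` specifier is dropped from file-local helpers. Dropping `static` only keeps internal linkage if the definitions sit inside an anonymous namespace, which is not visible in this hunk, so treat that as an assumption. A minimal compilable sketch of the same shape, with placeholder names instead of the real DIPU symbols:

#include <iostream>
#include <vector>

namespace {  // assumption: the real file keeps these in an anonymous namespace

// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
int num_devices = -1;
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
std::vector<int> device_ids;

// No `static` needed here: the anonymous namespace already provides internal
// linkage, which is presumably why the diff can drop the keyword.
void initContextVectors() {
  num_devices = 2;  // stand-in for a real device-count query
  device_ids.resize(num_devices);
}

}  // namespace

int main() {
  initContextVectors();
  std::cout << "devices: " << num_devices << '\n';
  return 0;
}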
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEventPool.cpp (2 changes: 1 addition & 1 deletion)

@@ -65,7 +65,7 @@ EventPool<deviceEvent_t>* getEventPool() {
   const int index = devproxy::current_device();
 // GlobalEventPool for different cards , construct when really needed
 #define dispatch_event_pool(device_id)                               \
-  if (index == device_id) {                                          \
+  if (index == (device_id)) {                                        \
     static EventPool<deviceEvent_t> gDIPUEventPool(                  \
         [](deviceEvent_t& event) { devapis::createEvent(&event); },  \
         [](deviceEvent_t& event) { devapis::destroyEvent(event); }); \
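The only change here wraps the macro parameter in parentheses, the usual fix for clang-tidy's bugprone-macro-parentheses warning: without them, an argument containing a lower-precedence operator changes how the comparison parses, so the parenthesized form is the robust default for macro parameters. A toy example (not the DIPU macro) showing the difference:

#include <iostream>

// Toy macros illustrating the hazard, not the DIPU dispatch_event_pool macro.
#define MATCHES_BAD(id) (index == id)     // parameter used unparenthesized
#define MATCHES_GOOD(id) (index == (id))  // parameter parenthesized, as in the fix

int main() {
  int index = 1;
  int flag = 0;
  // With an argument that contains a lower-precedence operator, the two forms
  // disagree:
  //   MATCHES_BAD(flag ? 1 : 2)  -> (index == flag) ? 1 : 2 -> 2 (truthy)
  //   MATCHES_GOOD(flag ? 1 : 2) -> index == (flag ? 1 : 2) -> false
  std::cout << std::boolalpha << static_cast<bool>(MATCHES_BAD(flag ? 1 : 2))
            << ' ' << static_cast<bool>(MATCHES_GOOD(flag ? 1 : 2)) << '\n';
  return 0;
}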
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.cpp (84 changes: 44 additions & 40 deletions)

@@ -38,19 +38,24 @@ std::ostream& operator<<(std::ostream& stream, StreamIdType s) {
   return stream;
 }
 // follow old pytorch cuda, seems new version use an opposite strategy.
-static constexpr int kStreamsPerPoolBits = 3;
-static constexpr int kStreamsPerPool = 1 << kStreamsPerPoolBits;
+constexpr int kStreamsPerPoolBits = 3;
+constexpr int kStreamsPerPool = 1 << kStreamsPerPoolBits;
 
 // Global stream state and constants
-static c10::DeviceIndex num_dipus = -1;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+c10::DeviceIndex num_dipus = -1;
 // Default streams
-static std::once_flag global_init_flag;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+std::once_flag global_init_flag;
 
 // streamid contains streamtype and/or raw stream id in DIPUStreamDevice pool
-static thread_local std::unique_ptr<c10::StreamId[]> current_streams = nullptr;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+thread_local std::unique_ptr<std::vector<c10::StreamId>> current_streams =
+    nullptr;
 
-static c10::StreamId makeC10StreamId(StreamIdType sType, size_t id) {
-  return ((uint32_t) static_cast<c10::StreamId>(sType) << kStreamsPerPoolBits) |
+c10::StreamId makeC10StreamId(StreamIdType sType, size_t id) {
+  return (static_cast<uint32_t>(static_cast<c10::StreamId>(sType)
+                                << kStreamsPerPoolBits)) |
          static_cast<c10::StreamId>(id);
 }
 
@@ -60,25 +65,27 @@ struct DIPUStreamDevice {
   // Default streams
   std::once_flag pool_flag;
   std::once_flag default_flag;
-  deviceId_t devidx_;
+  deviceId_t devidx_{};
   // seems pytorch 2.0 giveup default stream and enable cuda per_thread stream
   // feature at compile time. it cannot be applied to othe device.
   deviceStream_t default_stream = nullptr;
 
-  std::atomic<uint32_t> next_pool_pos;
-  std::array<deviceStream_t, kStreamsPerPool> pool_streams;
+  std::atomic<uint32_t> next_pool_pos{};
+  std::array<deviceStream_t, kStreamsPerPool> pool_streams{};
 
   inline uint32_t getNextPoolIdx() {
     auto raw_idx = next_pool_pos++;
     return raw_idx % kStreamsPerPool;
   }
 
-  inline StreamIdType getStreamIdType(c10::StreamId s) {
-    return static_cast<StreamIdType>((uint32_t)s >> kStreamsPerPoolBits);
+  static StreamIdType getStreamIdType(c10::StreamId s) {
+    return static_cast<StreamIdType>(static_cast<uint32_t>(s) >>
+                                     kStreamsPerPoolBits);
   }
 
-  inline size_t getStreamIdIndex(c10::StreamId s) {
-    return static_cast<size_t>((uint32_t)s & ((1 << kStreamsPerPoolBits) - 1));
+  static size_t getStreamIdIndex(c10::StreamId s) {
+    return static_cast<size_t>(static_cast<uint32_t>(s) &
+                               ((1 << kStreamsPerPoolBits) - 1));
   }
   void _doInitPool() {
     DIPUGuard device_guard{devidx_};
@@ -96,17 +103,15 @@ struct DIPUStreamDevice {
   }
 
  public:
-  DIPUStreamDevice(deviceId_t devidx) {
-    devidx_ = devidx;
-    next_pool_pos = 0;
-  }
+  explicit DIPUStreamDevice(deviceId_t devidx)
+      : next_pool_pos(0), devidx_(devidx) {}
 
   DIPUStream getDIPUStreamfromPool() {
     const auto idx = getNextPoolIdx();
     return DIPUStream(devidx_, makeC10StreamId(StreamIdType::POOL, idx));
   }
 
-  DIPUStream getDefaultDIPUStream() {
+  DIPUStream getDefaultDIPUStream() const {
     return DIPUStream(devidx_, makeC10StreamId(StreamIdType::DEFAULT, 0));
   }
 
@@ -141,10 +146,10 @@ struct DIPUStreamDevice {
   }
 };
 
-static std::array<std::unique_ptr<DIPUStreamDevice>, C10_COMPILE_TIME_MAX_DIPUS>
-    streamDeviceList;
+std::array<std::unique_ptr<DIPUStreamDevice>, C10_COMPILE_TIME_MAX_DIPUS>
+    streamDeviceList;  // NOLINT(cppcoreguidelines-avoid-non-const-global-variables)
 
-static void initGlobalStreamState() {
+void initGlobalStreamState() {
   num_dipus = devproxy::getDeviceCount();
   // Check if the number of DIPU matches the expected compile-time max number
   // of DIPU.
@@ -155,12 +160,11 @@ static void initGlobalStreamState() {
       C10_COMPILE_TIME_MAX_DIPUS, "). Increase that and recompile.");
 
   for (int i = 0; i < num_dipus; i++) {
-    streamDeviceList[i] =
-        std::move(std::unique_ptr<DIPUStreamDevice>(new DIPUStreamDevice(i)));
+    streamDeviceList[i] = std::move(std::make_unique<DIPUStreamDevice>(i));
   }
 }
 
-static c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
+c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
   // Inits default streams (once, globally)
   std::call_once(global_init_flag, initGlobalStreamState);
 
@@ -175,11 +179,11 @@ static c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
   if (current_streams) {
     return devIdx;
   }
-  current_streams = std::make_unique<c10::StreamId[]>(num_dipus);
+  current_streams = std::make_unique<std::vector<c10::StreamId>>(num_dipus);
 
   // Inits current streams (thread local) to default streams
   for (const auto i : c10::irange(num_dipus)) {
-    current_streams[i] = makeC10StreamId(StreamIdType::DEFAULT, 0);
+    (*current_streams)[i] = makeC10StreamId(StreamIdType::DEFAULT, 0);
   }
   // set device default stream in init
   return devIdx;
@@ -193,21 +197,21 @@ deviceStream_t DIPUStream::rawstream() const {
       this->unwrap().id());
 }
 
-DIPUStream getDIPUStreamFromPool(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
+DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
   // Initializes the stream pools (once)
-  streamDeviceList[devIdx]->initPool();
-  return streamDeviceList[devIdx]->getDIPUStreamfromPool();
+  streamDeviceList[device_index]->initPool();
+  return streamDeviceList[device_index]->getDIPUStreamfromPool();
 }
 
-DIPUStream getDefaultDIPUStream(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
-  return streamDeviceList[devIdx]->getDefaultDIPUStream();
+DIPUStream getDefaultDIPUStream(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
+  return streamDeviceList[device_index]->getDefaultDIPUStream();
}
 
-DIPUStream getCurrentDIPUStream(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
-  return DIPUStream(devIdx, current_streams[devIdx]);
+DIPUStream getCurrentDIPUStream(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
+  return DIPUStream(device_index, (*current_streams)[device_index]);
 }
 
 // copy from pytorch, not verify
@@ -220,11 +224,11 @@ DIPUStream getStreamFromExternal(deviceStream_t ext_stream,
 void setCurrentDIPUStream(DIPUStream stream) {
   auto devIdx = stream.device_index();
   initDIPUGlobal(devIdx);
-  current_streams[devIdx] = stream.unwrap().id();
+  (*current_streams)[devIdx] = stream.unwrap().id();
 }
 
-std::ostream& operator<<(std::ostream& os, const DIPUStream& stream) {
-  return os << stream.unwrap();
+std::ostream& operator<<(std::ostream& stream, const DIPUStream& s) {
+  return stream << s.unwrap();
 }
 
 }  // namespace dipu
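Most edits in this file follow a few tidy-driven patterns: C-style casts become static_cast, members get value-initializers, the constructor becomes explicit with a member-initializer list, raw `new` is replaced by std::make_unique, and the thread-local table of current stream ids moves from unique_ptr<c10::StreamId[]> to unique_ptr<std::vector<c10::StreamId>>, dereferenced at each use. A small self-contained sketch of the last two patterns, using placeholder types rather than the real c10/DIPU API:

#include <iostream>
#include <memory>
#include <vector>

using StreamId = long;  // stand-in for c10::StreamId

// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
thread_local std::unique_ptr<std::vector<StreamId>> current_streams = nullptr;

void initCurrentStreams(int device_count) {
  if (current_streams) {
    return;
  }
  // Before: std::unique_ptr<StreamId[]>(new StreamId[device_count])
  // After:  make_unique of a vector, dereferenced at each use as in the diff.
  current_streams = std::make_unique<std::vector<StreamId>>(device_count);
  for (int i = 0; i < device_count; ++i) {
    (*current_streams)[i] = 0;  // 0 stands in for the default stream id
  }
}

int main() {
  initCurrentStreams(4);
  (*current_streams)[2] = 5;
  std::cout << (*current_streams)[2] << '\n';  // prints 5
  return 0;
}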
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.h (2 changes: 1 addition & 1 deletion)

@@ -84,7 +84,7 @@ class DIPU_API DIPUStream {
   c10::Stream stream_;
 };
 
-DIPU_API DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device = -1);
+DIPU_API DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device_index = -1);
 
 DIPU_API DIPUStream getDefaultDIPUStream(c10::DeviceIndex device_index = -1);
 
(additional changed file; path not shown in this view)

@@ -561,7 +561,7 @@ static void deleteBFContext(void* ptr) {
   delete ctx;
 }
 
-DIPU_REGISTER_ALLOCATOR(BF, dipu::DIPU_DEVICE_TYPE, BFCachingAllocator, 0);
-DIPU_REGISTER_ALLOCATOR(BF, at::DeviceType::CPU, BFCachingAllocator, 0);
+DIPU_REGISTER_ALLOCATOR(BF, DIPU_DEVICE_TYPE_MACRO, BFCachingAllocator, 0);
+DIPU_REGISTER_ALLOCATOR(BF, CPU, BFCachingAllocator, 0);
 
 }  // namespace dipu
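The registration call sites now pass bare tokens (DIPU_DEVICE_TYPE_MACRO, CPU) rather than qualified enum values. The body of DIPU_REGISTER_ALLOCATOR is not part of this diff, so the reason is an inference, but call sites of this shape usually mean the macro splices the device argument into a larger name or expression itself. A toy registration macro illustrating that style, with made-up names throughout:

#include <iostream>
#include <string>

enum class DeviceType { CPU, CUDA };

void registerAllocator(const std::string& name, DeviceType type, int priority) {
  std::cout << "registered " << name << " for device "
            << static_cast<int>(type) << " with priority " << priority << '\n';
}

// The device argument is spliced into a qualified name, so call sites must
// pass a bare enumerator token (CPU, CUDA, ...), not an expression such as
// at::DeviceType::CPU.
#define TOY_REGISTER_ALLOCATOR(name, device, priority) \
  registerAllocator(#name, DeviceType::device, priority)

int main() {
  TOY_REGISTER_ALLOCATOR(BF, CPU, 0);   // expands to DeviceType::CPU
  TOY_REGISTER_ALLOCATOR(BF, CUDA, 0);  // expands to DeviceType::CUDA
  return 0;
}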
(diffs for the remaining 6 changed files are not shown)
