[DIPU]clang-tidy_shanhang (#516)
* Create main readme

* Update readme.md

* Update readme.md

* Update readme.md

* add clone kineto for dicp (#457)

add clone kineto for dicp

* [dicp][ascend] infer op result_info (#448)

* finish res_op_infer for softmax+log_softmax+add+amax(keepdim=True); passes static test

* revert modification to diopi

* modify operator logic in /DIPU/dicp/dicp/dynamo_bridge/operator.py to support test of 'infer_result'

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* fix gettupleelem in topsgraph

---------

Co-authored-by: jinminxi104 <[email protected]>

* Fdy/enhance copy (#430)

* mv copy file path

* add new copy

* fix static param err

* fix copy err

* fix direct copy bug

* rm unused bcast template name

* change clang format

* change name hpp

* rm unused header file

* remove unused header 2

* change override behavior

* change comment

* change cudacopy

* fix d2d copy err

* change register to use autogen

* revert incorrect format

* config fallback

* fix link err

* fix comment wanglei

* add newline

* fix cpu copy err

* add camb vendor copy

* fix copy err

* fix copy err 2

* fix compile err

* fix lingjie comment1

* fix caikun comment

* fix camb ci

* fix camb ci

* fix device switch err

* fix ling jie caikun comment 2

* fix comment: incorrect local ref

* change init copy

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* Fdy/fix copy tidy (#471)

* fix tidy 0

* fix clang tidy copy

* fix lingjie comment

* add tidy msg

* fix lint comment

* fix format

* add copy right

* fuj/ add ceil.out (#480)

* add ceil.out

* add floor_ and cases for floor_, ceil and ceil_

* [dipu] tidy some source files and update nv build script (#453)

* fix: tidy some source files
- and also update build nv script

* fix: make clang-format v16 happy

* fix: make clang-format v16 happy

* fix: remove usings and simplify some code

* fix: remove index

* fix: remove initialized_

* fix: add keyword VERSION

* fix: remove VERSION 3.25 as CI is using CMake 3.22

* add 910B CI && remove 910 CI && update DIOPI (#481)

* add 910b

* add 910b

* add 910b

* add 910b

* add resnet50

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* rm unused code

* update DIOPI submodule (#458)

* update DIOPI submodule

* diopi update to main

* update mmcv version

* update submodule

* update mmcv commit id

* feat: pass CMAKE_BUILD_TYPE into DIOPI (#428)

* [dipu] Fix copy_ fallback of topsrider. (#477)

* [dicp][tops] Add dicp ci of tops. (#469)

* Add dicp ci of tops.

* Fix dicp ci of tops.

* fix recycle dep (#474)

* rm 910 ci

* update diopi

* rm 910

---------

Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: wugeshui <[email protected]>

* [dipu]add ascend profiler (#476)

* add ascend profiler

* support with_stack

* code format

* fix clang tidy

* optimize naming

* optimize naming

* add dipu ci on dicp (#488)

* [dicp][ascend] fix ascend mm/bmm on 910B (#482)

* mock torch.cuda.XXXTensor (#462)

* mock torch.cuda.XXXTensor

* add newline at end of file

* fix conflict

* fix format

* fix format

* fix comment

* Fix `multiprocessing.Process` tests not collected by coverage and gcov (#486)

* Fix `multiprocessing.Process` tests not collected by coverage and gcov

* fix --concurrency=multiprocessing

* [dipu] update tidy configuration and remove if-constexpr in C++14 (#470)

* fix: update tidy config and remove if-constexpr

* fix: it should be a list instead of bool value

* feat: update clangd config

* fix: move the comment out of yaml scalar

* docs: add comments

* fix: add DeviceIndex

* fix: add some checks for headers

* feat: update .clang-tidy

* add profiler readme (#489)

* add profiler readme

* Update readme.md

* update

* Update readme.md

* Update readme.md

* Update readme.md

---------

Co-authored-by: caikun-pjlab <[email protected]>

* [dicp][tops] support outputs with inplace copy (#440)

* add dipu stream synchronize.

* adjust some ops.

* fix some paras error and rename device name.

* unset keep_inference_input_mutations.

* fix paras error in conversion.

* fix para dtype conversion.

* fix empty output and inplace copy of input paras in optimizer case.

* remove inplace output gen_empty_tensor.

* Ywt/fix autocompare compile error (#492)

* pass string to python

* disable _amp_foreach_non_finite_check_and_unscale_ autocompare

* [dipu] Wx/support the test for llm inference (#454)

* add one iter for llm

* add bert ci using the correct transformers repository

* add test for the inference of llama 7b using the transformers repository

* one iter test for traditional models by default

* fix bug

* add test for the inference of internlm 7b using the transformers repository

* test for torch_dipu

* set device check args other for maximum.out

* fix the partition arg parsing bug on cuda

* test the setting of CUDA_PARTITION

* fix the bug of setting CUDA_PARTITION

* add llm

* add llm

* optimize the selection of model list

* set pythonpath for torch_dipu

* test

* fix bug in the command of setting pythonpath

---------

Co-authored-by: wugeshui <[email protected]>

* [DIPU]Wx/check the status of build dipu (#490)

* check the status of build dipu on camb and nv

* add check for ascend

* fix the bug of pipe

* [DIPU] Wx/add schema for logical or and logical not ops (#484)

* add schema for logical or and logical not ops

* fix bug and add test cases for these ops

* add the test case: out is empty tensor

* [dicp][ascend] infer op resinfo (part 2) (#491)

* fix a bug in get_cast_dtype: type(int+bool) should be int

* clean code format

* finish res_op_infer for more simple operators

* Update operator.py

delete some unnecessary print()

* Update operator.py

clean code

* finish info inference for the remaining operators, except those that are hard to test in isolation without inference; operators involving Reshape still have problems

* clean code format

* Update warning message output in operator.py

* extract a common function for general binary and unary operators; add op bmm's inference

* Update ascend_op.py

delete unused param

* update DIOPI submodule (#485)

* update DIOPI submodule

* update submodule

* temporarily forbid resnet50

* move the testing code to dir under torch_dipu (#465)

* move the testing code to dir under torch_dipu

* fix a little bug

* create two soft links to avoid importing torch_dipu too early.

* add one more soft link file to solve bugs.

* support dev fork ci (#496)

* support dev fork ci

* [dipu] add markdownlint and update most markdown files (#493)

* doc: update docs and add markdownlint

* doc: rename readme.md to README.md

* fix: remove MD013

* doc: format

* [dicp][tops] Support some ops for stable-diffusion. (#467)

* Add sin, cos, erf, split.

1. Generalize MakeTuple in tops_op.
2. Generalize make_const in enflame codegen.
3. Add sin, cos, erf, split for tops.
4. Format Python code in dicp tops.

* refine code

* fix abs test path

* clean up code of split.

* adjust const op generation.

* fix nullptr case in const generation.

---------

Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>

* [DIPU] Wx/modify maximum schema due to the case in the inference of internlm (#494)

* improve maximum schema due to the case in the inference of internlm

* fix bug according to comments

* fix bug

* [both] fix, format and remove spaces in README.md (#497)

* doc(readme): fix, format and remove spaces

* fix: typo and try auto-correct

* feat(ci): add autocorrect into ci

* fix: remove autocorrect from ci as it's not ready

* update env python 3.10 (#503)

* fix clang tidy

* [dicp][ascend] get soc_version from aclrt (#505)

* fix clang tidy

* fix format

* fix format

---------

Co-authored-by: MiaoYYu <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: Juntao Chen <[email protected]>
Co-authored-by: jinminxi104 <[email protected]>
Co-authored-by: fandaoyi <[email protected]>
Co-authored-by: Peter Ye <[email protected]>
Co-authored-by: wiryls <[email protected]>
Co-authored-by: yaofengchen <[email protected]>
Co-authored-by: Fu Jingguo <[email protected]>
Co-authored-by: hellozmz <[email protected]>
Co-authored-by: wugeshui <[email protected]>
Co-authored-by: CyCle1024 <[email protected]>
Co-authored-by: caikun-pjlab <[email protected]>
Co-authored-by: tangzhiyi11 <[email protected]>
Co-authored-by: wyz5864 <[email protected]>
Co-authored-by: Lingjie <[email protected]>
Co-authored-by: Joyce YU <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: POI-WX <[email protected]>
Co-authored-by: HuayiL <[email protected]>
Co-authored-by: Reinerzhou <[email protected]>
Co-authored-by: liwenjian-sensetime <[email protected]>
Co-authored-by: shanhang <[email protected]>
1 parent 0bbb2ee commit f1c2f31
Showing 11 changed files with 202 additions and 176 deletions.
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUDeviceInfo.cpp (10 changes: 7 additions & 3 deletions)

@@ -15,25 +15,29 @@ using c10::DeviceIndex;
 using dipu::devapis::DIPUDeviceProperties;
 using std::shared_ptr;
 
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 DeviceIndex num_gpus = -1;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 c10::once_flag init_flag;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 std::deque<c10::once_flag> device_flags;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 std::vector<shared_ptr<DIPUDeviceProperties>> device_properties;
 
-static void initDIPUContextVectors() {
+void initDIPUContextVectors() {
   num_gpus = dipu::devproxy::getDeviceCount();
   device_flags.resize(num_gpus);
   device_properties.resize(num_gpus);
 }
 
-static void initDeviceProperty(DeviceIndex device_index) {
+void initDeviceProperty(DeviceIndex device_index) {
   DIPUDeviceProperties device_prop =
       dipu::devproxy::getDeviceProperties(device_index);
   device_properties[device_index] =
       std::make_shared<DIPUDeviceProperties>(device_prop);
 }
 
-static inline void checkDevice(int32_t device_index) {
+inline void checkDevice(int32_t device_index) {
   c10::call_once(init_flag, initDIPUContextVectors);
   if (device_index == -1) {
     device_index = dipu::devproxy::current_device();
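The pattern in this hunk recurs across the PR: mutable file-scope state stays in place but each definition carries an explicit NOLINTNEXTLINE suppression for cppcoreguidelines-avoid-non-const-global-variables, and the `static` specifier is dropped from file-local helpers. Dropping `static` only keeps internal linkage if the definitions sit inside an anonymous namespace, which is not visible in this hunk, so treat that as an assumption. A minimal compilable sketch of the same shape, with placeholder names instead of the real DIPU symbols:

#include <iostream>
#include <vector>

namespace {  // assumption: the real file keeps these in an anonymous namespace

// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
int num_devices = -1;
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
std::vector<int> device_ids;

// No `static` needed here: the anonymous namespace already provides internal
// linkage, which is presumably why the diff can drop the keyword.
void initContextVectors() {
  num_devices = 2;  // stand-in for a real device-count query
  device_ids.resize(num_devices);
}

}  // namespace

int main() {
  initContextVectors();
  std::cout << "devices: " << num_devices << '\n';
  return 0;
}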
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEventPool.cpp (2 changes: 1 addition & 1 deletion)

@@ -65,7 +65,7 @@ EventPool<deviceEvent_t>* getEventPool() {
   const int index = devproxy::current_device();
 // GlobalEventPool for different cards , construct when really needed
 #define dispatch_event_pool(device_id)                               \
-  if (index == device_id) {                                          \
+  if (index == (device_id)) {                                        \
     static EventPool<deviceEvent_t> gDIPUEventPool(                  \
         [](deviceEvent_t& event) { devapis::createEvent(&event); },  \
         [](deviceEvent_t& event) { devapis::destroyEvent(event); }); \
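The only change here wraps the macro parameter in parentheses, the usual fix for clang-tidy's bugprone-macro-parentheses warning: without them, an argument containing a lower-precedence operator changes how the comparison parses, so the parenthesized form is the robust default for macro parameters. A toy example (not the DIPU macro) showing the difference:

#include <iostream>

// Toy macros illustrating the hazard, not the DIPU dispatch_event_pool macro.
#define MATCHES_BAD(id) (index == id)     // parameter used unparenthesized
#define MATCHES_GOOD(id) (index == (id))  // parameter parenthesized, as in the fix

int main() {
  int index = 1;
  int flag = 0;
  // With an argument that contains a lower-precedence operator, the two forms
  // disagree:
  //   MATCHES_BAD(flag ? 1 : 2)  -> (index == flag) ? 1 : 2 -> 2 (truthy)
  //   MATCHES_GOOD(flag ? 1 : 2) -> index == (flag ? 1 : 2) -> false
  std::cout << std::boolalpha << static_cast<bool>(MATCHES_BAD(flag ? 1 : 2))
            << ' ' << static_cast<bool>(MATCHES_GOOD(flag ? 1 : 2)) << '\n';
  return 0;
}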
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.cpp (84 changes: 44 additions & 40 deletions)

@@ -38,19 +38,24 @@ std::ostream& operator<<(std::ostream& stream, StreamIdType s) {
   return stream;
 }
 // follow old pytorch cuda, seems new version use an opposite strategy.
-static constexpr int kStreamsPerPoolBits = 3;
-static constexpr int kStreamsPerPool = 1 << kStreamsPerPoolBits;
+constexpr int kStreamsPerPoolBits = 3;
+constexpr int kStreamsPerPool = 1 << kStreamsPerPoolBits;
 
 // Global stream state and constants
-static c10::DeviceIndex num_dipus = -1;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+c10::DeviceIndex num_dipus = -1;
 // Default streams
-static std::once_flag global_init_flag;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+std::once_flag global_init_flag;
 
 // streamid contains streamtype and/or raw stream id in DIPUStreamDevice pool
-static thread_local std::unique_ptr<c10::StreamId[]> current_streams = nullptr;
+// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+thread_local std::unique_ptr<std::vector<c10::StreamId>> current_streams =
+    nullptr;
 
-static c10::StreamId makeC10StreamId(StreamIdType sType, size_t id) {
-  return ((uint32_t) static_cast<c10::StreamId>(sType) << kStreamsPerPoolBits) |
+c10::StreamId makeC10StreamId(StreamIdType sType, size_t id) {
+  return (static_cast<uint32_t>(static_cast<c10::StreamId>(sType)
+                                << kStreamsPerPoolBits)) |
          static_cast<c10::StreamId>(id);
 }
 
@@ -60,25 +65,27 @@ struct DIPUStreamDevice {
   // Default streams
   std::once_flag pool_flag;
   std::once_flag default_flag;
-  deviceId_t devidx_;
+  deviceId_t devidx_{};
   // seems pytorch 2.0 giveup default stream and enable cuda per_thread stream
   // feature at compile time. it cannot be applied to othe device.
   deviceStream_t default_stream = nullptr;
 
-  std::atomic<uint32_t> next_pool_pos;
-  std::array<deviceStream_t, kStreamsPerPool> pool_streams;
+  std::atomic<uint32_t> next_pool_pos{};
+  std::array<deviceStream_t, kStreamsPerPool> pool_streams{};
 
   inline uint32_t getNextPoolIdx() {
     auto raw_idx = next_pool_pos++;
     return raw_idx % kStreamsPerPool;
   }
 
-  inline StreamIdType getStreamIdType(c10::StreamId s) {
-    return static_cast<StreamIdType>((uint32_t)s >> kStreamsPerPoolBits);
+  static StreamIdType getStreamIdType(c10::StreamId s) {
+    return static_cast<StreamIdType>(static_cast<uint32_t>(s) >>
+                                     kStreamsPerPoolBits);
   }
 
-  inline size_t getStreamIdIndex(c10::StreamId s) {
-    return static_cast<size_t>((uint32_t)s & ((1 << kStreamsPerPoolBits) - 1));
+  static size_t getStreamIdIndex(c10::StreamId s) {
+    return static_cast<size_t>(static_cast<uint32_t>(s) &
+                               ((1 << kStreamsPerPoolBits) - 1));
   }
   void _doInitPool() {
     DIPUGuard device_guard{devidx_};
@@ -96,17 +103,15 @@ struct DIPUStreamDevice {
   }
 
  public:
-  DIPUStreamDevice(deviceId_t devidx) {
-    devidx_ = devidx;
-    next_pool_pos = 0;
-  }
+  explicit DIPUStreamDevice(deviceId_t devidx)
+      : next_pool_pos(0), devidx_(devidx) {}
 
   DIPUStream getDIPUStreamfromPool() {
     const auto idx = getNextPoolIdx();
     return DIPUStream(devidx_, makeC10StreamId(StreamIdType::POOL, idx));
   }
 
-  DIPUStream getDefaultDIPUStream() {
+  DIPUStream getDefaultDIPUStream() const {
     return DIPUStream(devidx_, makeC10StreamId(StreamIdType::DEFAULT, 0));
   }
 
@@ -141,10 +146,10 @@ struct DIPUStreamDevice {
   }
 };
 
-static std::array<std::unique_ptr<DIPUStreamDevice>, C10_COMPILE_TIME_MAX_DIPUS>
-    streamDeviceList;
+std::array<std::unique_ptr<DIPUStreamDevice>, C10_COMPILE_TIME_MAX_DIPUS>
+    streamDeviceList;  // NOLINT(cppcoreguidelines-avoid-non-const-global-variables)
 
-static void initGlobalStreamState() {
+void initGlobalStreamState() {
   num_dipus = devproxy::getDeviceCount();
   // Check if the number of DIPU matches the expected compile-time max number
   // of DIPU.
@@ -155,12 +160,11 @@ static void initGlobalStreamState() {
       C10_COMPILE_TIME_MAX_DIPUS, "). Increase that and recompile.");
 
   for (int i = 0; i < num_dipus; i++) {
-    streamDeviceList[i] =
-        std::move(std::unique_ptr<DIPUStreamDevice>(new DIPUStreamDevice(i)));
+    streamDeviceList[i] = std::move(std::make_unique<DIPUStreamDevice>(i));
   }
 }
 
-static c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
+c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
   // Inits default streams (once, globally)
   std::call_once(global_init_flag, initGlobalStreamState);
 
@@ -175,11 +179,11 @@ static c10::DeviceIndex initDIPUGlobal(c10::DeviceIndex devIdx) {
   if (current_streams) {
     return devIdx;
   }
-  current_streams = std::make_unique<c10::StreamId[]>(num_dipus);
+  current_streams = std::make_unique<std::vector<c10::StreamId>>(num_dipus);
 
   // Inits current streams (thread local) to default streams
   for (const auto i : c10::irange(num_dipus)) {
-    current_streams[i] = makeC10StreamId(StreamIdType::DEFAULT, 0);
+    (*current_streams)[i] = makeC10StreamId(StreamIdType::DEFAULT, 0);
   }
   // set device default stream in init
   return devIdx;
@@ -193,21 +197,21 @@ deviceStream_t DIPUStream::rawstream() const {
       this->unwrap().id());
 }
 
-DIPUStream getDIPUStreamFromPool(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
+DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
   // Initializes the stream pools (once)
-  streamDeviceList[devIdx]->initPool();
-  return streamDeviceList[devIdx]->getDIPUStreamfromPool();
+  streamDeviceList[device_index]->initPool();
+  return streamDeviceList[device_index]->getDIPUStreamfromPool();
 }
 
-DIPUStream getDefaultDIPUStream(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
-  return streamDeviceList[devIdx]->getDefaultDIPUStream();
+DIPUStream getDefaultDIPUStream(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
+  return streamDeviceList[device_index]->getDefaultDIPUStream();
}
 
-DIPUStream getCurrentDIPUStream(c10::DeviceIndex devIdx) {
-  devIdx = initDIPUGlobal(devIdx);
-  return DIPUStream(devIdx, current_streams[devIdx]);
+DIPUStream getCurrentDIPUStream(c10::DeviceIndex device_index) {
+  device_index = initDIPUGlobal(device_index);
+  return DIPUStream(device_index, (*current_streams)[device_index]);
 }
 
 // copy from pytorch, not verify
@@ -220,11 +224,11 @@ DIPUStream getStreamFromExternal(deviceStream_t ext_stream,
 void setCurrentDIPUStream(DIPUStream stream) {
   auto devIdx = stream.device_index();
   initDIPUGlobal(devIdx);
-  current_streams[devIdx] = stream.unwrap().id();
+  (*current_streams)[devIdx] = stream.unwrap().id();
 }
 
-std::ostream& operator<<(std::ostream& os, const DIPUStream& stream) {
-  return os << stream.unwrap();
+std::ostream& operator<<(std::ostream& stream, const DIPUStream& s) {
+  return stream << s.unwrap();
 }
 
 }  // namespace dipu
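Most edits in this file follow a few tidy-driven patterns: C-style casts become static_cast, members get value-initializers, the constructor becomes explicit with a member-initializer list, raw `new` is replaced by std::make_unique, and the thread-local table of current stream ids moves from unique_ptr<c10::StreamId[]> to unique_ptr<std::vector<c10::StreamId>>, dereferenced at each use. A small self-contained sketch of the last two patterns, using placeholder types rather than the real c10/DIPU API:

#include <iostream>
#include <memory>
#include <vector>

using StreamId = long;  // stand-in for c10::StreamId

// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
thread_local std::unique_ptr<std::vector<StreamId>> current_streams = nullptr;

void initCurrentStreams(int device_count) {
  if (current_streams) {
    return;
  }
  // Before: std::unique_ptr<StreamId[]>(new StreamId[device_count])
  // After:  make_unique of a vector, dereferenced at each use as in the diff.
  current_streams = std::make_unique<std::vector<StreamId>>(device_count);
  for (int i = 0; i < device_count; ++i) {
    (*current_streams)[i] = 0;  // 0 stands in for the default stream id
  }
}

int main() {
  initCurrentStreams(4);
  (*current_streams)[2] = 5;
  std::cout << (*current_streams)[2] << '\n';  // prints 5
  return 0;
}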
dipu/torch_dipu/csrc_dipu/runtime/core/DIPUStream.h (2 changes: 1 addition & 1 deletion)

@@ -84,7 +84,7 @@ class DIPU_API DIPUStream {
   c10::Stream stream_;
 };
 
-DIPU_API DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device = -1);
+DIPU_API DIPUStream getDIPUStreamFromPool(c10::DeviceIndex device_index = -1);
 
 DIPU_API DIPUStream getDefaultDIPUStream(c10::DeviceIndex device_index = -1);
 
(additional changed file; path not shown in this view)

@@ -561,7 +561,7 @@ static void deleteBFContext(void* ptr) {
   delete ctx;
 }
 
-DIPU_REGISTER_ALLOCATOR(BF, dipu::DIPU_DEVICE_TYPE, BFCachingAllocator, 0);
-DIPU_REGISTER_ALLOCATOR(BF, at::DeviceType::CPU, BFCachingAllocator, 0);
+DIPU_REGISTER_ALLOCATOR(BF, DIPU_DEVICE_TYPE_MACRO, BFCachingAllocator, 0);
+DIPU_REGISTER_ALLOCATOR(BF, CPU, BFCachingAllocator, 0);
 
 }  // namespace dipu
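The registration call sites now pass bare tokens (DIPU_DEVICE_TYPE_MACRO, CPU) rather than qualified enum values. The body of DIPU_REGISTER_ALLOCATOR is not part of this diff, so the reason is an inference, but call sites of this shape usually mean the macro splices the device argument into a larger name or expression itself. A toy registration macro illustrating that style, with made-up names throughout:

#include <iostream>
#include <string>

enum class DeviceType { CPU, CUDA };

void registerAllocator(const std::string& name, DeviceType type, int priority) {
  std::cout << "registered " << name << " for device "
            << static_cast<int>(type) << " with priority " << priority << '\n';
}

// The device argument is spliced into a qualified name, so call sites must
// pass a bare enumerator token (CPU, CUDA, ...), not an expression such as
// at::DeviceType::CPU.
#define TOY_REGISTER_ALLOCATOR(name, device, priority) \
  registerAllocator(#name, DeviceType::device, priority)

int main() {
  TOY_REGISTER_ALLOCATOR(BF, CPU, 0);   // expands to DeviceType::CPU
  TOY_REGISTER_ALLOCATOR(BF, CUDA, 0);  // expands to DeviceType::CUDA
  return 0;
}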
(diffs for the remaining 6 changed files are not shown)
