[wip] group op #1202
base: chao/xccl
Conversation
},
[](at::xpu::XPUStream&,
   c10::intrusive_ptr<ProcessGroupXCCL::WorkXCCL>&) {
  ccl::group_end();
I think groupStart/groupEnd wrap ccl::group_start/ccl::group_end. Should you then call the wrapped API here?
xcclActiveGroupCounter_ affects the batchP2P choice. Let's use the original API, as NCCL does.
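For context, a rough sketch of how such wrappers are commonly structured, mirroring ProcessGroupNCCL's groupStart/groupEnd; aside from ccl::group_start/ccl::group_end and xcclActiveGroupCounter_, the names below are assumptions for illustration, not this PR's implementation:

```cpp
// Sketch only: NCCL-style wrappers; the surrounding class layout is assumed.
void ProcessGroupXCCL::groupStart() {
  ccl::group_start();          // start batching subsequent collectives
  ++xcclActiveGroupCounter_;   // counter later consulted for the batchP2P path
}

void ProcessGroupXCCL::groupEnd() {
  ccl::group_end();            // submit the batched collectives
  --xcclActiveGroupCounter_;
}
```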
src/xccl/ProcessGroupXCCL.cpp
Outdated
  return true;
}

void check_xpu_single_tensor(
Should you follow the same naming format, e.g. checkSingleTensor?
modified
src/xccl/ProcessGroupXCCL.cpp
Outdated
    }
  }
}

int64_t check_xpu_tensors_same_device(const std::vector<at::Tensor>& tensors) {
Should you follow the same naming format, e.g. checkTensorOnSameDevice?
modified
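For reference, a minimal sketch of what a same-device check of this shape typically does; only the signature appears in the diff, the body below is an assumption:

```cpp
#include <ATen/core/Tensor.h>
#include <c10/util/Exception.h>
#include <vector>

// Sketch only: the signature comes from the diff, the body is assumed.
int64_t check_xpu_tensors_same_device(const std::vector<at::Tensor>& tensors) {
  TORCH_CHECK(!tensors.empty(), "Tensor list must be nonempty");
  const auto device = tensors.front().device();
  for (const auto& t : tensors) {
    TORCH_CHECK(t.is_xpu() && t.device() == device,
        "Tensors must all be on the same XPU device");
  }
  return device.index();
}
```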
@@ -62,6 +109,10 @@ ccl::reduction getXcclReduceOp(const ReduceOp& reduceOp, at::Tensor& input) {
    // Map sum to max for bool tensors to avoid overflow issues with sum.
    return ccl::reduction::max;
  }
  // WA due to oneCCL not support AVG
  if (reduceOp == ReduceOp::AVG) {
The WA does not mean simply replacing avg with sum; it uses a sum collective plus a div SYCL kernel to simulate avg. Please update your comment.
modified
Please also add a comment that oneCCL is expected to support avg in the Basekit 2025.2 release.
done
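To make the intended workaround concrete, here is an illustrative sketch (helper and member names are assumptions): the reduction is issued as SUM, and a follow-up division on the same stream simulates AVG until oneCCL adds native support (expected in the Basekit 2025.2 release).

```cpp
// Illustrative sketch of the AVG workaround; names are assumptions.
// oneCCL has no native AVG yet (expected in the Basekit 2025.2 release),
// so the collective itself is issued as a SUM...
ccl::reduction xcclOp = ccl::reduction::sum;
// ... ccl::allreduce(..., xcclOp, ...) runs here ...
// ...and a separate division on the same stream turns the SUM into an AVG.
if (reduceOp == ReduceOp::AVG) {
  output.div_(static_cast<double>(size_));  // size_ = world size (assumed name)
}
```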
src/xccl/ProcessGroupXCCL.cpp
Outdated
@@ -31,22 +31,69 @@ const std::map<at::ScalarType, ccl::datatype> xcclDatatypes = {
    {at::kFloat8_e5m2fnuz, ccl::datatype::uint8},
};

void checkXPUTensor(at::Tensor& tensor) {
bool check_same_size(const std::vector<at::Tensor>& input_tensors) {
Please refine the API name.
done