[GPU/OpenCL] Moving Addition kernel to Tensor Directory @open sesame 07/04 09:01 #2666

Merged
42 changes: 3 additions & 39 deletions nntrainer/layers/cl_layers/addition_layer_cl.cpp
@@ -12,7 +12,7 @@
*/

#include <addition_layer_cl.h>
-#include <blas_kernels.h>
+#include <blas_kernel_interface.h>
#include <nntrainer_error.h>
#include <nntrainer_log.h>
#include <node_exporter.h>
@@ -37,47 +37,11 @@ void AdditionLayerCL::forwarding(RunLayerContext &context, bool training) {
if (!idx) {
hidden_.copy(input_);
} else {
-AddProcess(input_, hidden_, context);
+add_i_cl(input_, hidden_, context);
}
}
}

-void AdditionLayerCL::AddProcess(Tensor const &input, Tensor &result,
-                                 RunLayerContext &context) {
-
-  CREATE_IF_EMPTY_DIMS(result, result.getDim());
-
-  NNTR_THROW_IF(result.getData() == nullptr, std::invalid_argument)
-    << result.getName() << " is not allocated";
-  NNTR_THROW_IF(input.getData() == nullptr, std::invalid_argument)
-    << input.getName() << " is not allocated";
-
-  if (input.getDim() != result.getDim()) {
-    throw std::invalid_argument(
-      "Error: Dimensions does not match for addition");
-  }
-
-  if (input.getDataType() == ml::train::TensorDim::DataType::FP32) {
-    unsigned int size = input.size();
-    const float *data = input.getData();
-    float *rdata = result.getData();
-
-    addition_cl(data, rdata, size, context);
-
-  } else if (input.getDataType() == ml::train::TensorDim::DataType::FP16) {
-#ifdef ENABLE_FP16
-    unsigned int size = input.size();
-    const _FP16 *data = input.getData<_FP16>();
-    _FP16 *rdata = result.getData<_FP16>();
-
-    addition_cl(data, rdata, size, context);
-
-#else
-    throw std::invalid_argument("Error: enable-fp16 is not enabled");
-#endif
-  }
-}

void AdditionLayerCL::incremental_forwarding(RunLayerContext &context,
unsigned int from, unsigned int to,
bool training) {
@@ -113,7 +77,7 @@ void AdditionLayerCL::incremental_forwarding(RunLayerContext &context,
if (!idx) {
hidden_step.copy(input_step);
} else {
-AddProcess(input_step, hidden_step, context);
+add_i_cl(input_step, hidden_step, context);
}
}
}
9 changes: 0 additions & 9 deletions nntrainer/layers/cl_layers/addition_layer_cl.h
@@ -76,15 +76,6 @@ class AdditionLayerCL : public Layer {
*/
void calcDerivative(RunLayerContext &context) override;

-  /**
-   * @brief Process data and dimensions for add operation used in addition layer
-   * @param[in] input Tensor
-   * @param[in] result Tensor
-   * @param[in] RunLayerContext reference
-   */
-  void AddProcess(Tensor const &input, Tensor &result,
-                  RunLayerContext &context);
-
/**
* @copydoc bool supportBackwarding() const
*/
35 changes: 35 additions & 0 deletions nntrainer/tensor/cl_operations/blas_kernel_interface.cpp
@@ -211,4 +211,39 @@ void multiplyCl(Tensor &input, float const &value, RunLayerContext &context) {
}
}

+void add_i_cl(Tensor const &input, Tensor &result, RunLayerContext &context) {
+
+  CREATE_IF_EMPTY_DIMS(result, result.getDim());
+
+  NNTR_THROW_IF(result.getData() == nullptr, std::invalid_argument)
+    << result.getName() << " is not allocated";
+  NNTR_THROW_IF(input.getData() == nullptr, std::invalid_argument)
+    << input.getName() << " is not allocated";
+
+  if (input.getDim() != result.getDim()) {
+    throw std::invalid_argument(
+      "Error: Dimensions does not match for addition");
+  }
+
+  if (input.getDataType() == ml::train::TensorDim::DataType::FP32) {
+    unsigned int size = input.size();
+    const float *data = input.getData();
+    float *rdata = result.getData();
+
+    addition_cl(data, rdata, size, context);
+
+  } else if (input.getDataType() == ml::train::TensorDim::DataType::FP16) {
+#ifdef ENABLE_FP16
+    unsigned int size = input.size();
+    const _FP16 *data = input.getData<_FP16>();
+    _FP16 *rdata = result.getData<_FP16>();
+
+    addition_cl(data, rdata, size, context);
+
+#else
+    throw std::invalid_argument("Error: enable-fp16 is not enabled");
+#endif
+  }
+}

} // namespace nntrainer
8 changes: 8 additions & 0 deletions nntrainer/tensor/cl_operations/blas_kernel_interface.h
@@ -63,5 +63,13 @@ void dotBatchedCl(Tensor const &input, Tensor const &m, Tensor &result,
*/
void multiplyCl(Tensor &input, float const &value, RunLayerContext &context);

+/**
+ * @brief Process data and dimensions for add operation
+ * @param[in] input Tensor
+ * @param[in] result Tensor
+ * @param[in] RunLayerContext reference
+ */
+void add_i_cl(Tensor const &input, Tensor &result, RunLayerContext &context);

Contributor:

This is not a big deal, but how about renaming it add_cl() for clarity, since Tensor uses _i for in-place operations? (This one takes an output tensor.)

Contributor:

+) What about naming it addCl to make it consistent with the other kernel operations?

Contributor (Author):

Actually, I was following the CPU naming convention: the CPU implementation is named add_i, and the GPU kernel performs the operation in place as well. To add input[0] and input[1], I store the result as input[0] += input[1], treating input[0] as the output tensor rather than creating an additional output tensor. I followed the CPU implementation.

That said, if I should change the name to addCl, please let me know and I'll update it accordingly.


} // namespace nntrainer
#endif /* __BLAS_KERNEL_INTERFACE_H__ */
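
For illustration, a minimal sketch of the in-place accumulation pattern described in the thread above, written in plain C++ over raw float buffers; add_i_sketch is a hypothetical stand-in for the FP32 path of addition_cl and is not part of the nntrainer API:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for the FP32 path of addition_cl():
// accumulates `input` into `result` element-wise (result += input).
void add_i_sketch(const float *input, float *result, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i)
    result[i] += input[i];
}

int main() {
  // Mirrors the forwarding loop: the first input is copied into the
  // hidden tensor, and every later input is accumulated into it in place.
  std::vector<std::vector<float>> inputs = {{1.f, 2.f}, {3.f, 4.f}, {5.f, 6.f}};
  std::vector<float> hidden = inputs[0]; // idx == 0: copy
  for (std::size_t idx = 1; idx < inputs.size(); ++idx)
    add_i_sketch(inputs[idx].data(), hidden.data(), hidden.size());
  std::cout << hidden[0] << ", " << hidden[1] << '\n'; // prints: 9, 12
}
```

No extra output tensor is allocated at any point, which is the property the author cites when defending the _i suffix.
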
2 changes: 1 addition & 1 deletion test/unittest/layers/unittest_layers_addition_cl.cpp
@@ -51,6 +51,6 @@ auto addition_w16a16_gpu = LayerGoldenTestParamType(
"added_w16a16.nnlayergolden", LayerGoldenTestParamOptions::DEFAULT, "nchw",
"fp16", "fp16");

-GTEST_PARAMETER_TEST(Addition16, LayerGoldenTest,
+GTEST_PARAMETER_TEST(AdditionGPU16, LayerGoldenTest,
::testing::Values(addition_w16a16_gpu));
#endif