Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tensor] Refactorize Tensor Class to TensorV2 @open sesame 03/26 12:26 #2500

Merged
merged 6 commits into from
Jul 26, 2024

Conversation

djeong20
Copy link
Contributor

@djeong20 djeong20 commented Mar 8, 2024

This pull request introduces a major refactorization of the Tensor class in our codebase by introducing the new TensorV2 class.
The previous Tensor class is removed, and all instances where it was used have been updated accordingly.
With this change, the overall functionality and stability of our system are expected to be increased.

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

@taos-ci
Copy link

taos-ci commented Mar 8, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2500. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci
Copy link

taos-ci commented Mar 8, 2024

:octocat: cibot: @djeong20, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2500-202403081422460.15307211875916-b86dc2ac092b931ae8f75ce2bb55d4d3439f18ad/.

@djeong20 djeong20 changed the title [Wait for #2489,#2495,#2497,#2498][Tensor] Refactorize Tensor Class to TensorV2 [Wait for #2489,#2495,#2497,#2498][Tensor] Refactorize Tensor Class to TensorV2 @open sesame 03/08 16:34 Mar 8, 2024
@taos-ci
Copy link

taos-ci commented Mar 8, 2024

:octocat: cibot: @djeong20, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2500-202403081634500.4846498966217-b86dc2ac092b931ae8f75ce2bb55d4d3439f18ad/.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 djeong20 changed the title [Wait for #2489,#2495,#2497,#2498][Tensor] Refactorize Tensor Class to TensorV2 @open sesame 03/08 16:34 [Tensor] Refactorize Tensor Class to TensorV2 Mar 15, 2024
@djeong20 djeong20 changed the title [Tensor] Refactorize Tensor Class to TensorV2 [Tensor] Refactorize Tensor Class to TensorV2 @open sesame 03/26 12:26 Mar 26, 2024
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Member

@SeoHyungjun SeoHyungjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just need to resolve the conflict!👍

if (m.size() > this->size())
throw exception::not_supported("broadcasting *this is not supported");

const TensorDim m_dim = m.getDim();

BroadcastInfoV2 e;
BroadcastInfo e;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why BroadcastInfo's variable name is e?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular reason! reused the previous name.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Member

@DonghakPark DonghakPark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Tdatatype dtype = tensors.front().getDim().getDataType();

if (dtype == Tdatatype::FP32) {
output = FloatTensor::cat(tensors, axis);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this if branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so. As far as I know, this function concatenates a vector of tensors by a given axis.
However, the problem with this function is tensors are concatenated to the first tensor of the list, not the tensor that calls this function. Therefore, the FP16 tensor can concatenate a list of FP32 tensors.
While the input tensors have no relation with the tensor that calls this function, we need a decision point on whether to concatenate a list of FP32 tensors or FP16 tensors, and this should be made in the tensor class.

I think there is plenty of room to improve the function itself and we can later newly design this function.

Copy link
Collaborator

@jijoongmoon jijoongmoon Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can change the api.. like tensors[0]->itensor.cat(tensors, axis)?
Cause output data type is going to be type of tensor[0] anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot use it as tensors[0]->itensor.cat(tensors, axis) since itensor is a private variable. I'll modify this in another way and let you know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this by creating a new member function of Tensor concat(), and the current Tensor::cat(), which is a static function, uses concat(). This should work exactly the same as how tensors[0]->itensor.cat(tensors, axis) would operate.

itensor->copy(from);
} else {
// replace with a new tensor that are the same with the given tensor
if (from.getDataType() == ml::train::TensorDim::DataType::FP32) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this if branch by calling itensor.copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this code intentionally in the tensor class for the following reasons.

  1. This condition checks the data type of the Tensor to copy. Therefore, even if this code is moved into Float/HalfTensor, we still need to check the source Tensor's data type, which eventually has the same code in both classes.
  2. Current implementation uses swap() to perform copy. However, this won't be a valid option when it happens in the Float/HalfTensor class. This is because swap(t, *this) would be swap(Tensor &lhs, FloatTensor &rhs) or swap(Tensor &lhs, HalfTensor &rhs) instead of swap(Tensor &lhs, Tensor &rhs). While Tensor has FloatTensor as a member variable, I believe implementing a swap between Tensor and FloatTensor would require structural change.

Hope this clarifies and let me know if you have further questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you have any suggestions, please bring them up!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using template? Cause if we need to add more tensor type like int8, bf16, then there will be more branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we can just remove this branch and create Tensor T using copy constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

for (unsigned int c = 0; c < t.channel(); ++c) {
for (unsigned int h = 0; h < t.height(); ++h) {
for (unsigned int w = 0; w < t.width(); ++w) {
if (getDataType() == ml::train::TensorDim::DataType::FP32) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as copy().

Copy link
Collaborator

@jijoongmoon jijoongmoon Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is minor, how about moving the if part to outside of the loop?

And It seems little be strange.
getDataType() returns my data type.. and it apply when we get the value of from.
Should it be from.getDataType()??
And if it has basic assumption saying both tensor types are same, then do not need to consider the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is resolved now. thanks for pointing it out!

void deallocate() {
data = nullptr;
offset = 0;
template <typename T = float> T *getData() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eventually we should remove this getData() and getAddress() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally. we should first remove getData() and getAddress() usage in layers (and other places), then deprecate both functions.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

Comment on lines +379 to +429
* @brief Copy the Tensor
* @param[in] input Tensor to be copied
* @param[out] output output Tensor
*/
void copy_with_stride(const Tensor &input, Tensor &output) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the stride stand for? (implicit stride is applied only when data types are heterogeneous) Is it used to copy tensor with different type only? Could you add more explanation on the comment? If this copy is used only when tensor types are different, could you think about changing the function name? I feel it sounds to support strided copy from any tensor with any stride options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_with_stride() is not used to copy tensors with different data types. The function assumes it only takes the same data type. copyData() should be the function for copying different data types.
I'm not sure what's the initial purpose for this function since there's not much description in #1300, but it is currently used for copying uncontiguous tensors, which in most cases copies shared data tensors. We should discuss whether to keep the name or change it to a better one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm working on a new PR for adding descriptions and unifying/changing naming. Please let me know if other functions seem unclear or need more explanations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for clarifying my misunderstanding!

djeong20 added 6 commits July 22, 2024 16:39
This commit deprecates the existing TensorV2 class and replaces Tensor class with the new TensorV2 class.
The previous Tensor class has been removed and all its usages have been updated to use the TensorV2 class.
Additionally, all instances of TensorV2 usage within the NNTrainer have been removed.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
This commit aims to fix several issues that arose due to the refactoring of the Tensor class.

**Changes proposed in this PR:**
- The copy constructor has been implemented to prevent incorrect behavior of the default copy constructor in this commit
- Tensor add_i() has been newly implemented to fix previous incorrect implementations.
- Add chain() function that returns LazyTensor

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
This commit updates recently added features in tensor, including add_i_partial() and ele_mul().
The newly added functions have been implemented according to the revised tensor structure.

**Changes proposed in this PR:**
- Update Float/HalfTensor class with newly added function, add_i_partial().
- Apply BLAS operations in basic arithmetic operations in Tensor.
- height-width transpose in half-precision can be SIMD accelerated.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
This commit moves several operations implementations to each Tensor class for easier management.
This allows users to create a new data type Tensor without unnecessary modification to the Tensor class.

**Changes proposed in this PR:**
- static function Tensor::cat() uses each tensor's member function concat().
- Tensor::copy() logic is simplified by not differentiating by its data type.
- Tensor::copy_with_stride() uses an internal function to operate.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR fixes issues of undefined symbols of one of the tensor constructors.
The function implementation is moved to the header file to resolve this issue.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR resolves warnings that occur during the Android build. The list is as follows.

**Changes proposed in this PR:**
- Fix function that overrides virtual functions but is not marked override.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Collaborator

@jijoongmoon jijoongmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for great work!

@jijoongmoon jijoongmoon merged commit 5186d4d into nnstreamer:main Jul 26, 2024
38 checks passed
@djeong20 djeong20 deleted the refactor/tensor branch July 26, 2024 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants