
[ Model ] Enable Mixed Precision Training #2628

Merged
merged 24 commits into nnstreamer:main from mixed_precision_training
Jun 10, 2024

Conversation

jijoongmoon
Collaborator

In this PR

This PR modifies code related to Mixed Precision Training.

Commits to be reviewed in this PR

[ Model ] Fix the gradient clipping for the FP16 or low-bit gradient

In this PR, when we compute the l2norm of a gradient tensor, the tensor is
converted to full precision before the l2norm is computed for gradient clipping.

Resolves:

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]
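The commit above can be sketched as follows. This is a minimal stand-in, not NNTrainer's actual Tensor API: the FP16 gradient is modeled as a buffer already widened to `float`, and the function names are hypothetical. The point it demonstrates is accumulating the squared sum in full precision before taking the square root.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: compute the L2 norm of a low-precision gradient by
// accumulating in full precision, as the commit describes. The real code
// operates on Tensor objects; this stand-in uses a vector of values already
// widened to float for portability.
float l2norm_full_precision(const std::vector<float> &grad_as_fp32) {
  double acc = 0.0; // accumulate in full precision to avoid FP16 overflow
  for (float v : grad_as_fp32)
    acc += static_cast<double>(v) * static_cast<double>(v);
  return static_cast<float>(std::sqrt(acc));
}

// Scale the gradient in place if its global norm exceeds max_norm.
void clip_grad_by_norm(std::vector<float> &grad, float max_norm) {
  float norm = l2norm_full_precision(grad);
  if (norm > max_norm) {
    float scale = max_norm / norm;
    for (float &v : grad)
      v *= scale;
  }
}
```

In FP16, squaring even moderately large gradient values overflows (max ≈ 65504), which is why the conversion to full precision matters here.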


[ Layer ] Add mu and var backup tensors

This PR adds mu and var backup tensors (mu_b, var_b) to restore
the previous moving mean and moving variance for mixed precision
training.

Resolves:

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]
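The backup/restore idea in this commit can be sketched like this. The struct and method names are illustrative (only `mu_b` / `var_b` come from the commit): before a step updates the running statistics, copy them aside; if the mixed-precision step later turns out to be invalid (NaN/Inf gradient), put the previous values back.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the mu/var backup tensors described above.
struct RunningStats {
  std::vector<float> mu, var;     // current moving mean / moving variance
  std::vector<float> mu_b, var_b; // backups of the previous values

  void backup() { mu_b = mu; var_b = var; }   // before the update
  void restore() { mu = mu_b; var = var_b; }  // after an invalid step
};
```

Without the backup, a step that is discarded due to NaN/Inf gradients would still have polluted the moving statistics.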


[ Layer ] Prevent mask randomization when restoring data
In order to restore the previous iteration's data, this PR disables randomization of the mask when previous data needs to be restored.

Resolves:

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]


[ Context ] Add check for whether previous data needs to be restored
This PR enables a check for whether previous data needs to be restored. By doing this, we can remove the NaN or Inf data in Tensors during mixed precision training.

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]


[ Tensor ] Use memset instead of sscal to set zero
We need to remove NaN or Inf values in a Tensor by calling setZero(). However, if we use sscal, NaN or Inf values still remain, because multiplying them by zero yields NaN. This PR changes sscal to memset.

Resolves:

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]
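The bug this commit fixes is easy to demonstrate in isolation. Under IEEE-754, `0 * NaN` is NaN and `0 * Inf` is NaN, so a BLAS `sscal`-style "multiply everything by zero" does not clear invalid values, while `memset` to all-zero bits does (the all-zero bit pattern is `0.0f`). The two functions below are stand-ins, not NNTrainer's actual setZero().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Stand-in for a cblas_sscal(n, 0.0f, buf, 1) based setZero():
// NaN/Inf survive, because 0 * NaN == NaN and 0 * Inf == NaN.
void set_zero_by_scal(float *buf, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    buf[i] *= 0.0f;
}

// memset-based setZero(): the all-zero bit pattern is 0.0f for IEEE floats,
// so NaN/Inf are actually removed.
void set_zero_by_memset(float *buf, std::size_t n) {
  std::memset(buf, 0, n * sizeof(float));
}
```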


jijoongmoon and others added 23 commits May 7, 2024 13:38
We will add a Var32 Tensor if the Variable Weight is not full
precision (FP32). This enables the weight update to run in full precision,
and only the Apply Gradient process uses this tensor. Therefore, the
lifespan of this tensor should be "ApplyGradient".

. Modify TensorPool to generate Weights considering Mixed Precision.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR creates the variable fp32 tensor when we create the Weight and
Optimizer Weight.

. update the manager to create Weights with the var32 tensor
requested from the weight pool.
. update the weight requests with the Weight Spec and the var, grad and var32
tensors which were already created.
. add clone Tensor with a specific type in tensor.h

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR enables the FP16 support for the layers below:

. input layer
. mse loss layer

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR includes the mixed precision test case.

. Input - FC - MSE
 : "batch_size=2", "model_tensor_type=FP16-FP16", "loss_scale=128"

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This commit modifies apply gradient in the optimizer.
We do not need to save optimizer variables in the weight type. Only the
Optimizer needs the optimizer variables, and we should update the
weight in full precision to maintain accuracy. Therefore, the
var32 tensors for optimizer variables are removed.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
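The reason the weight update must happen in full precision can be shown with a small numeric sketch. Here `float` stands in for the low-precision weight and `double` for the FP32 master copy (the "var32" tensor); the struct is hypothetical, not NNTrainer's Weight class. A tiny update vanishes when added directly to the low-precision value, because it is below one ULP, but survives in the master copy.

```cpp
#include <cassert>

// Hypothetical sketch of a master-copy weight update. float models the
// low-precision weight used in forward/backward; double models the
// full-precision ("var32") copy the optimizer updates.
struct MixedWeight {
  float w = 1.0f;   // low-precision weight
  double w32 = 1.0; // full-precision master copy

  void apply_gradient(double lr_times_grad) {
    w32 -= lr_times_grad;        // update in full precision
    w = static_cast<float>(w32); // cast back down for the next step
  }
};
```

A direct low-precision update `1.0f - 1e-8f` rounds back to `1.0f` (one ULP at 1.0f is about 1.2e-7), so without the master copy the weight would never move.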
This PR adds an is_NaN function to check whether a tensor contains NaN values.
This is used to check for NaN during mixed precision training.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds a loss scale parameter in RunContext and uses it to update the
MSE loss.

. Add a Loss Scale parameter to the RunLayerContext constructor
. Add an applyLossScale function to update the returned derivative in the Loss Layer
. Change the MSE Loss Layer to apply the loss scale to the returned derivative

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
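The loss-scaling step described above can be sketched as a pair of helpers. These function names are illustrative, not NNTrainer's API: the loss layer multiplies the returned derivative by the loss scale so small FP16 gradients do not underflow during backpropagation, and the gradients are divided by the same factor before the weight update.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of loss scaling. Scale the derivative returned by the
// loss layer so FP16 gradients stay above the underflow threshold...
std::vector<float> apply_loss_scale(std::vector<float> deriv, float loss_scale) {
  for (float &d : deriv)
    d *= loss_scale;
  return deriv;
}

// ...then divide the accumulated gradients by the same factor before the
// optimizer applies them, recovering the true magnitude.
std::vector<float> unscale_gradient(std::vector<float> grad, float loss_scale) {
  for (float &g : grad)
    g /= loss_scale;
  return grad;
}
```

Powers of two (such as the 128 used in the test case above) are the usual choice for the scale, since multiplying and dividing by them is exact in binary floating point.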
This PR enables Mixed Precision Training. For now only FP16-FP32
is considered. Additional test cases will be added.

. add getSortedLayerIdx to set the graph order for forwarding.
. change clip_weights to lazy_apply_weights to cover both cases.
. add forwarding_op to re-run forwarding from the layer that has a
gradient with NaN.
. add a while loop to re-run backwarding after resetting the loss scale.
. add setLossScale in RunLayerContext
. add a gradient check when mixed precision is enabled.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
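The while loop mentioned above amounts to a dynamic loss-scaling retry policy. The sketch below is hypothetical in its names and in the halving factor; it only illustrates the control flow: if any gradient comes out non-finite, the step is not applied, the loss scale is reduced, and backwarding is re-run.

```cpp
#include <cassert>
#include <functional>

// Hypothetical retry loop for dynamic loss scaling. backward_ok runs
// backwarding with the given scale and returns false if any gradient is
// NaN/Inf. On failure, shrink the scale and retry instead of applying the
// bad gradients. The 2x reduction factor is illustrative.
float train_step_with_rescale(const std::function<bool(float)> &backward_ok,
                              float loss_scale, float min_scale = 1.0f) {
  while (loss_scale >= min_scale) {
    if (backward_ok(loss_scale))
      return loss_scale; // gradients valid; caller applies them
    loss_scale /= 2.0f;  // overflow detected: shrink scale and re-run
  }
  return loss_scale; // gave up shrinking; caller decides what to do
}
```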
This PR adds an infinity value check for Tensor data.
. rename hasNaN to isValid
. add an infinity check to isValid; it now checks both NaN and Inf
. modify the blas_avx and blas_neon checks accordingly
. modify graph and model to check is_valid rather than has_nan
. add a unittest for the isValid function

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
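A scalar reference version of the renamed check might look like this. The real function lives on Tensor with vectorized AVX/NEON paths as noted above; this sketch only pins down the semantics: valid means every element is finite, rejecting NaN as well as both infinities.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar reference for the isValid semantics: true only when every element
// is finite. std::isfinite returns false for NaN, +Inf and -Inf, which is
// exactly the NaN-and-Inf check this commit adds.
bool is_valid(const std::vector<float> &data) {
  for (float v : data)
    if (!std::isfinite(v))
      return false;
  return true;
}
```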
This PR changes the loss computation to use full precision rather than
half precision, to maintain accuracy.


Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR enables the Mixed Precision unittest with a Torch model.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds Torch mixed precision golden data generation, plus input and
output data for the test.

. some fixes to the test.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR includes more unittests and fixes for mixed precision.
. Model Unittest
  . 2 FC layers which generate NaN or Inf gradients, from Torch.
  . MSE loss, checking the whole procedure of mixed precision training.
  . Even though the FC model has only one weight, it is good enough
  to validate mixed precision.
  . The Torch model also works in a similar way to NNTrainer.
. Some fixes to the execution order of apply gradient when
  mixed precision is on.
. Update SGD to support Mixed Precision training


Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR updates the conv2D layer to support Mixed Precision (FP16).
It is based on PR #2579.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This commit enables mixed precision support for LSTM Layer.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds an Execution Mode parameter for compilation. The default is
ml::train::ExecutionMode::TRAIN. Currently we do not support compiler
optimizations for inference mode, such as batch normalization fusing,
but we will add more optimizations depending on the execution mode.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR includes Mixed Precision support for the batch normalization
layer. During training, the BN layer should run in full precision with FP16
weight data. Therefore, reading the FP16 data and converting the current
Weight and Activation to full precision are required.

For inference, we do need compiler optimizations like BN fusing, so
this PR also includes execution mode parameters for compile.

Because of the complicated data conversion in the BN layer, test case
generation also needs an update, so that it takes the FP16 input/output
tensors and weights and converts the weights to FP32 for computation.
For verification, we need to convert FP32 back to FP16.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
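The "run BN in full precision" idea can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the layer's real code: the FP16 activations are modeled as floats already widened, gamma/beta are omitted, and the point is that the statistics are accumulated in double (full precision) even though inputs and outputs are low precision.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: batch-norm forward computing mean/variance in full
// precision, with low-precision inputs modeled as widened floats.
// Scale/shift (gamma/beta) are omitted for brevity.
std::vector<float> bn_forward_full_precision(const std::vector<float> &x,
                                             float eps = 1e-5f) {
  double mean = 0.0, var = 0.0; // accumulate statistics in full precision
  for (float v : x)
    mean += v;
  mean /= x.size();
  for (float v : x)
    var += (v - mean) * (v - mean);
  var /= x.size();

  std::vector<float> y;
  y.reserve(x.size());
  for (float v : x) // normalize, then narrow back to the storage precision
    y.push_back(static_cast<float>((v - mean) / std::sqrt(var + eps)));
  return y;
}
```

Accumulating the mean of a long FP16 tensor directly in FP16 loses precision quickly, which is why the statistics path stays in full precision.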
Enable mixed precision on reshape layer
- the reshape layer only changes dimensions, so change the dimensions and check the datatype

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Enable Mixed Precision on Pooling 2D Layer
- Modified the layer to cast properly in the FP16 case so that mixed precision can be activated on the existing pooling 2d layer.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
In this PR, when we compute the l2norm of a gradient tensor, the tensor is
converted to full precision before the l2norm is computed for gradient clipping.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds mu and var backup tensors (mu_b, var_b) to restore
the previous moving mean and moving variance for mixed precision
training.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
In order to restore the previous iteration's data, this PR disables
randomization of the mask when previous data needs to be restored.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR enables a check for whether previous data needs to be restored.
By doing this, we can remove the NaN or Inf data in Tensors during
mixed precision training.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
@taos-ci

taos-ci commented Jun 9, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2628. Please follow the 1 commit / 1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the reviewers start a review. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci

taos-ci commented Jun 9, 2024

:octocat: cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2628-202406092007560.038749933242798-88d84e4429ccd6956521c9e33de600525ccc8aff/.

@jijoongmoon jijoongmoon force-pushed the mixed_precision_training branch from 88d84e4 to 8e06368 Compare June 9, 2024 23:20
@taos-ci

taos-ci commented Jun 9, 2024

:octocat: cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2628-202406100820130.94758605957031-8e06368387284d1d3ec5cdb8e272946fe06d2ff8/.

@jijoongmoon jijoongmoon force-pushed the mixed_precision_training branch from 8e06368 to 54bd73d Compare June 10, 2024 00:15
@taos-ci

taos-ci commented Jun 10, 2024

:octocat: cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2628-202406100915430.15493392944336-54bd73dbced2c88ca8789840d9151aa7245e3746/.

We need to remove NaN or Inf values in a Tensor by calling setZero().
However, if we use sscal, NaN or Inf values still remain.
This PR changes sscal to memset.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
@jijoongmoon jijoongmoon force-pushed the mixed_precision_training branch from 54bd73d to afff553 Compare June 10, 2024 01:08

@taos-ci taos-ci left a comment


@jijoongmoon, 💯 All CI checkers are successfully verified. Thanks.

@@ -64,9 +64,19 @@ warning_c_flags = [
'-Wno-error=varargs'
]

arch = host_machine.cpu_family()

if get_option('enable-avx')
Member

Please don't enable avx if the arch is not x64. We need to emit errors or force-disable avx in the meson script if it's not x64.

You may use a tri-state option (meson feature) instead of a bool for the convenience of the avx option.
Currently, you are enabling avx on Ubuntu-arm/risc-v.

@@ -40,7 +40,7 @@ option('enable-fp16', type: 'boolean', value: false)
option('enable-cublas', type: 'boolean', value: false)
option('enable-openmp', type: 'boolean', value: true)
option('enable-neon', type: 'boolean', value: false)
option('enable-avx', type: 'boolean', value: false)
option('enable-avx', type: 'boolean', value: true)
Member

For options that depend on the arch, I recommend "feature" instead of "boolean".

If this is to be kept "boolean" with "true", meson should be able to check the arch and turn avx off if it's not x64.

Member

@myungjoo myungjoo left a comment


Let's merge this and get the meson-option fixed afterwards.

Member

@DonghakPark DonghakPark left a comment


LGTM!

@myungjoo myungjoo merged commit 527becc into nnstreamer:main Jun 10, 2024
32 checks passed
@jijoongmoon jijoongmoon deleted the mixed_precision_training branch December 24, 2024 01:34