
[GPU] activations scaling to resolve accuracy issues for infer precision of f16 #27265

Open · wants to merge 36 commits into master from static_scaling
Conversation

@e-ddykim (Contributor) commented on Oct 28, 2024:

Details:

  • When a model runs at an inference precision of f16, it may produce incorrect results due to the limited numeric range of f16.
  • The purpose of this PR is to avoid overflows during computation by scaling down activations, thereby obtaining correct results when the infer precision is f16.
  • A new config property "ACTIVATIONS_SCALE_FACTOR" is introduced, which holds a single floating-point value. For example, if it is 64, activations are divided by 64 before Convolution and MatMul. If it is smaller than 0, this feature is disabled.
    • This property can also be set via the rt_info of a model, as shown below.
    <rt_info>
        <runtime_options>
            <ACTIVATIONS_SCALE_FACTOR value="8.0" />
        </runtime_options>
    </rt_info>
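
For illustration, a minimal C++ sketch of setting the property at compile time; the value 8.0, the "GPU" device, and "model.xml" are arbitrary stand-ins here, while the string key is the one introduced by this PR:

#include "openvino/runtime/core.hpp"

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Activations are divided by 8 before Convolution and MatMul;
    // a negative value disables the feature.
    auto compiled = core.compile_model(
        model, "GPU", ov::AnyMap{{"ACTIVATIONS_SCALE_FACTOR", 8.0f}});
    return 0;
}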

Tickets:

  • 147052

@e-ddykim requested review from a team as code owners on Oct 28, 2024
@e-ddykim requested a review from itikhono and removed the request for a team on Oct 28, 2024
@github-actions bot added the labels "category: inference", "category: GPU", "category: transformations", and "category: CPP API" on Oct 28, 2024
@geunhwan added this to the 2024.5 milestone on Oct 28, 2024
/**
 * @brief This property scales down activations to prevent overflows when inference precision is f16.
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<float, PropertyMutability::RW> activations_scale_factor{"ACTIVATIONS_SCALE_FACTOR"};
Contributor:
Please add Python bindings for this property.

Contributor:
How are users supposed to understand which value to set?

Contributor:
Mainly experimentally, for now. In the future, we plan to have an RT Info attribute of ov::Model which can be set from optimum pipelines or NNCF (if they add a calibration flow at some point), and this attribute will be converted to a plugin property.

Contributor:
Maybe we should merge this feature later, then?

Contributor:
The property is enough to solve issues in notebooks or in customers' pipelines. The features I mentioned are needed for a better user experience, but they are not mandatory to deliver improvements to end users.

if (m_scale_factor < 1.f)
    return false;

std::cout << "scale_factor: " << m_scale_factor << std::endl;
Contributor:
Remove it

Contributor (author):
I removed it.

} // namespace pass
} // namespace ov

class ov::pass::ActivationsScaling : public ov::pass::ModelPass {
Contributor:
Please add some description of this pass

Contributor (author):
I added a description of each newly added pass.
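
For reference, a hedged sketch of the kind of doxygen description that was requested; the wording is illustrative, not the committed text:

/**
 * @brief ActivationsScaling inserts Multiply nodes to scale down activations
 *        before Convolution and MatMul and scale the results back up, keeping
 *        intermediate f16 values within the representable range.
 */
class ov::pass::ActivationsScaling : public ov::pass::ModelPass {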

std::shared_ptr<ov::Node> inverse_scale_const_f16 = std::make_shared<ov::op::v0::Constant>(ov::element::f16, scale_const_shape, inverse_scale_value);
std::shared_ptr<ov::Node> inverse_scale_const_f32 = std::make_shared<ov::op::v0::Constant>(ov::element::f32, scale_const_shape, inverse_scale_value);

for (auto& node : f->get_ordered_ops()) {
Contributor:
Can it be implemented as a set of matcher passes? I think it would be much more flexible and readable.

Contributor (author):
I updated it as a set of matcher passes. Thank you.

// \ / ==> \ scale_down
// \ / \ /
// add add
auto add = std::dynamic_pointer_cast<ov::op::v1::Add>(node);
@vladimir-paramuzov (Contributor) commented on Oct 28, 2024:
I think what this pass should do in the first place is the following: rewrite
in_f16 -> conv/matmul/any_other_needed_op -> out_f16
as
in_f16 -> Multiply(down) -> conv/matmul/any_other_needed_op -> Multiply(up) -> out_f16

which works well with the matcher pass concept. Then we need to optimize the Multiply(up) -> Multiply(down) case to eliminate those pairs. I think it can be achieved using different approaches:

  1. Run some common pass which merges a chain of Multiply-with-scalar into a single node (likely we have such an opt already), then run NopElimination.
  2. Respect Multiply(up) in the pattern and do different handling in the callback. For instance, you match an optional(Multiply_up) -> Conv pattern. If the Multiply is captured, you just move this Multiply_up node after the Convolution. If not, you insert a Multiply down + up pair around the conv (see the sketch after this list).
  3. Implement a custom pass which eliminates Multiply down -> Multiply up sequences.
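
For concreteness, a rough sketch of the plain down/up insertion around a Convolution as a matcher pass; this is an illustration under assumed names (InsertScaleDownUp, a fixed scale_factor), not the PR's actual code:

#include "openvino/op/constant.hpp"
#include "openvino/op/convolution.hpp"
#include "openvino/op/multiply.hpp"
#include "openvino/pass/graph_rewrite.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

class InsertScaleDownUp : public ov::pass::MatcherPass {
public:
    explicit InsertScaleDownUp(float scale_factor) {
        auto conv_p = ov::pass::pattern::wrap_type<ov::op::v1::Convolution>();

        ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {
            auto conv = m.get_match_root();

            // Multiply(down): scale the activation before the convolution.
            auto down_const = ov::op::v0::Constant::create(
                conv->get_input_element_type(0), {}, {1.0f / scale_factor});
            auto down = std::make_shared<ov::op::v1::Multiply>(conv->input_value(0), down_const);
            conv->input(0).replace_source_output(down);

            // Multiply(up): restore the original magnitude after the convolution.
            // Capture the consumers first so that "up" itself is not redirected.
            auto consumers = conv->output(0).get_target_inputs();
            auto up_const = ov::op::v0::Constant::create(
                conv->get_output_element_type(0), {}, {scale_factor});
            auto up = std::make_shared<ov::op::v1::Multiply>(conv, up_const);
            for (auto& target : consumers)
                target.replace_source_output(up);
            return true;
        };

        register_matcher(std::make_shared<ov::pass::pattern::Matcher>(conv_p, "InsertScaleDownUp"), callback);
    }
};

Adjacent Multiply(up) -> Multiply(down) pairs left behind by two such insertions can then be folded by any of the three options above.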

Contributor:
Update: the suggestion is to reuse LPT for scale-node optimization. I.e., at first we insert Multiplies around the required ops, then run an LPT subset which propagates the Multiply (in the int8 case it is a dequantization, but it is still a Multiply op). Hopefully, we can reuse the LPT passes w/o modifications.
Also, the MoveEltwiseUpThroughDataMov pass will help to move those scales up in the graph.

Contributor (author):
I updated ActivationsScaling as you suggested.
First, it inserts Multiply nodes around Conv/MatMul, and then eliminates these newly added Multiply nodes with a series of optimization passes. One of them is LinOpSequenceFusion from LPT.
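
A hedged sketch of how such a pipeline could be assembled with ov::pass::Manager; InsertScaleDownUp is the hypothetical pass from the earlier sketch, and the exact set and order of cleanup passes in the PR may differ:

#include "openvino/pass/manager.hpp"
#include "transformations/common_optimizations/lin_op_sequence_fusion.hpp"
#include "transformations/common_optimizations/move_eltwise_up_data_movement.hpp"

void run_activations_scaling(const std::shared_ptr<ov::Model>& model, float scale_factor) {
    ov::pass::Manager manager;
    manager.register_pass<InsertScaleDownUp>(scale_factor);         // insert Multiply(down)/Multiply(up) pairs
    manager.register_pass<ov::pass::LinOpSequenceFusion>();         // fold chains of scalar Multiply/Add nodes
    manager.register_pass<ov::pass::MoveEltwiseUpThroughDataMov>(); // move scales up through data-movement ops
    manager.run_passes(model);
}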

dep.replace_source_output(scale_up->output(0));
}

auto sdpa = std::dynamic_pointer_cast<ov::op::v13::ScaledDotProductAttention>(node);
Contributor:
[random spot] Please add tests for the transformation

Contributor (author):
I added unit tests.
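
For illustration, a minimal sketch of what one such test could look like with the common TransformationTestsF fixture; the shapes, the weight values, and the pass constructor signature are assumptions:

TEST_F(TransformationTestsF, ActivationsScalingAroundConvolution) {
    {
        auto input = std::make_shared<ov::op::v0::Parameter>(ov::element::f16, ov::Shape{1, 3, 8, 8});
        auto weights = ov::op::v0::Constant::create(ov::element::f16, ov::Shape{4, 3, 3, 3}, {0.5f});
        auto conv = std::make_shared<ov::op::v1::Convolution>(input,
                                                              weights,
                                                              ov::Strides{1, 1},
                                                              ov::CoordinateDiff{0, 0},
                                                              ov::CoordinateDiff{0, 0},
                                                              ov::Strides{1, 1});
        model = std::make_shared<ov::Model>(ov::NodeVector{conv}, ov::ParameterVector{input});
        manager.register_pass<ov::pass::ActivationsScaling>(8.f); // constructor signature assumed
    }
    // model_ref would be built the same way with the expected graph:
    // input -> Multiply(1/8) -> Convolution -> Multiply(8) -> result.
}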

@geunhwan removed this from the 2024.5 milestone on Oct 29, 2024
@e-ddykim force-pushed the static_scaling branch 2 times, most recently from 0d7c7cd to bc284f5, on Oct 29, 2024
@e-ddykim requested a review from a team as a code owner on Oct 29, 2024
@github-actions bot added the "category: Python API" label on Oct 29, 2024
@e-ddykim force-pushed the static_scaling branch 2 times, most recently from 8f22485 to ebca03d, on Nov 4, 2024
@github-actions bot removed the labels "category: inference", "category: Python API", and "category: CPP API" on Nov 4, 2024
@AlexKoff88 (Contributor) commented:
@e-ddykim, please consider this PR: huggingface/optimum-intel#994

@e-ddykim removed the "WIP" (work in progress) label on Dec 2, 2024
Labels: category: GPU (OpenVINO GPU plugin), category: inference (OpenVINO Runtime library - Inference), category: LP transformations (OpenVINO Low Precision transformations), category: transformations (OpenVINO Runtime library - Transformations)