
[GPU] activations scaling to resolve accuracy issues for infer precision of f16 #27265

Open · wants to merge 36 commits into master from static_scaling
Conversation

@e-ddykim (Contributor) commented on Oct 28, 2024:

Details:

  • When a model runs at an inference precision of f16, it may produce incorrect results due to the limited numeric range of f16.
  • The purpose of this PR is to avoid overflows during computation by scaling down activations, thereby obtaining correct results when the infer precision is f16.
  • A new config property "ACTIVATIONS_SCALE_FACTOR" is introduced, which holds a single floating-point value. For example, if it is 64, activations are divided by 64 before Convolution and MatMul. If it is smaller than 0, this feature is disabled.
    • This property can also be set via the rt_info of a model, as shown below.
    <rt_info>
        <runtime_options>
            <ACTIVATIONS_SCALE_FACTOR value="8.0" />
        </runtime_options>
    </rt_info>
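
For illustration, a minimal C++ sketch of setting the property at compile time; the value 8.0, the "GPU" device, and "model.xml" are arbitrary stand-ins here, while the string key is the one introduced by this PR:

#include "openvino/runtime/core.hpp"

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // Activations are divided by 8 before Convolution and MatMul;
    // a negative value disables the feature.
    auto compiled = core.compile_model(
        model, "GPU", ov::AnyMap{{"ACTIVATIONS_SCALE_FACTOR", 8.0f}});
    return 0;
}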

Tickets:

  • 147052

@e-ddykim requested review from a team as code owners on Oct 28, 2024
@e-ddykim requested a review from itikhono and removed the request for a team on Oct 28, 2024
@github-actions bot added the labels "category: inference", "category: GPU", "category: transformations", and "category: CPP API" on Oct 28, 2024
@geunhwan added this to the 2024.5 milestone on Oct 28, 2024
/**
 * @brief This property scales down activations to prevent overflows when inference precision is f16.
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<float, PropertyMutability::RW> activations_scale_factor{"ACTIVATIONS_SCALE_FACTOR"};
Contributor:
Please add Python bindings for this property.

Contributor:
How are users supposed to understand which value to set?

Contributor:
Mainly experimentally, for now. In the future, we plan to have an RT Info attribute of ov::Model which can be set from optimum pipelines or NNCF (if they add a calibration flow at some point), and this attribute will be converted to a plugin property.

Contributor:
Maybe we should merge this feature later, then?

Contributor:
The property is enough to solve issues in notebooks or in customers' pipelines. The features I mentioned are needed for a better user experience, but they are not mandatory to deliver improvements to end users.

if (m_scale_factor < 1.f)
    return false;

std::cout << "scale_factor: " << m_scale_factor << std::endl;
Contributor:
Remove it

Contributor (author):
I removed it.

} // namespace pass
} // namespace ov

class ov::pass::ActivationsScaling : public ov::pass::ModelPass {
Contributor:
Please add some description of this pass

Contributor (author):
I added a description of each newly added pass.
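
For reference, a hedged sketch of the kind of doxygen description that was requested; the wording is illustrative, not the committed text:

/**
 * @brief ActivationsScaling inserts Multiply nodes to scale down activations
 *        before Convolution and MatMul and scale the results back up, keeping
 *        intermediate f16 values within the representable range.
 */
class ov::pass::ActivationsScaling : public ov::pass::ModelPass {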

std::shared_ptr<ov::Node> inverse_scale_const_f16 = std::make_shared<ov::op::v0::Constant>(ov::element::f16, scale_const_shape, inverse_scale_value);
std::shared_ptr<ov::Node> inverse_scale_const_f32 = std::make_shared<ov::op::v0::Constant>(ov::element::f32, scale_const_shape, inverse_scale_value);

for (auto& node : f->get_ordered_ops()) {
Contributor:
Can it be implemented as a set of matcher passes? I think it would be much more flexible and readable.

Contributor (author):
I updated it as a set of matcher passes. Thank you.

// \ / ==> \ scale_down
// \ / \ /
// add add
auto add = std::dynamic_pointer_cast<ov::op::v1::Add>(node);
@vladimir-paramuzov (Contributor) commented on Oct 28, 2024:
I think what this pass should do in the first place is the following: rewrite
in_f16 -> conv/matmul/any_other_needed_op -> out_f16
as
in_f16 -> Multiply(down) -> conv/matmul/any_other_needed_op -> Multiply(up) -> out_f16

which works well with the matcher pass concept. Then we need to optimize the Multiply(up) -> Multiply(down) case to eliminate those pairs. I think it can be achieved using different approaches:

  1. Run some common pass which merges a chain of Multiply-with-scalar into a single node (likely we have such an opt already), then run NopElimination.
  2. Respect Multiply(up) in the pattern and do different handling in the callback. For instance, you match an optional(Multiply_up) -> Conv pattern. If the Multiply is captured, you just move this Multiply_up node after the Convolution. If not, you insert a Multiply down + up pair around the conv (see the sketch after this list).
  3. Implement a custom pass which eliminates Multiply down -> Multiply up sequences.
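
For concreteness, a rough sketch of the plain down/up insertion around a Convolution as a matcher pass; this is an illustration under assumed names (InsertScaleDownUp, a fixed scale_factor), not the PR's actual code:

#include "openvino/op/constant.hpp"
#include "openvino/op/convolution.hpp"
#include "openvino/op/multiply.hpp"
#include "openvino/pass/graph_rewrite.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

class InsertScaleDownUp : public ov::pass::MatcherPass {
public:
    explicit InsertScaleDownUp(float scale_factor) {
        auto conv_p = ov::pass::pattern::wrap_type<ov::op::v1::Convolution>();

        ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {
            auto conv = m.get_match_root();

            // Multiply(down): scale the activation before the convolution.
            auto down_const = ov::op::v0::Constant::create(
                conv->get_input_element_type(0), {}, {1.0f / scale_factor});
            auto down = std::make_shared<ov::op::v1::Multiply>(conv->input_value(0), down_const);
            conv->input(0).replace_source_output(down);

            // Multiply(up): restore the original magnitude after the convolution.
            // Capture the consumers first so that "up" itself is not redirected.
            auto consumers = conv->output(0).get_target_inputs();
            auto up_const = ov::op::v0::Constant::create(
                conv->get_output_element_type(0), {}, {scale_factor});
            auto up = std::make_shared<ov::op::v1::Multiply>(conv, up_const);
            for (auto& target : consumers)
                target.replace_source_output(up);
            return true;
        };

        register_matcher(std::make_shared<ov::pass::pattern::Matcher>(conv_p, "InsertScaleDownUp"), callback);
    }
};

Adjacent Multiply(up) -> Multiply(down) pairs left behind by two such insertions can then be folded by any of the three options above.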

Contributor:
Update: the suggestion is to reuse LPT for scale-node optimization. I.e., at first we insert Multiplies around the required ops, then run an LPT subset which propagates the Multiply (in the int8 case it is a dequantization, but it is still a Multiply op). Hopefully, we can reuse the LPT passes w/o modifications.
Also, the MoveEltwiseUpThroughDataMov pass will help to move those scales up in the graph.

Contributor (author):
I updated ActivationsScaling as you suggested.
First, it inserts Multiply nodes around Conv/MatMul, and then eliminates these newly added Multiply nodes with a series of optimization passes. One of them is LinOpSequenceFusion from LPT.
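
A hedged sketch of how such a pipeline could be assembled with ov::pass::Manager; InsertScaleDownUp is the hypothetical pass from the earlier sketch, and the exact set and order of cleanup passes in the PR may differ:

#include "openvino/pass/manager.hpp"
#include "transformations/common_optimizations/lin_op_sequence_fusion.hpp"
#include "transformations/common_optimizations/move_eltwise_up_data_movement.hpp"

void run_activations_scaling(const std::shared_ptr<ov::Model>& model, float scale_factor) {
    ov::pass::Manager manager;
    manager.register_pass<InsertScaleDownUp>(scale_factor);         // insert Multiply(down)/Multiply(up) pairs
    manager.register_pass<ov::pass::LinOpSequenceFusion>();         // fold chains of scalar Multiply/Add nodes
    manager.register_pass<ov::pass::MoveEltwiseUpThroughDataMov>(); // move scales up through data-movement ops
    manager.run_passes(model);
}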

dep.replace_source_output(scale_up->output(0));
}

auto sdpa = std::dynamic_pointer_cast<ov::op::v13::ScaledDotProductAttention>(node);
Contributor:
[random spot] Please add tests for the transformation

Contributor (author):
I added unit tests.
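
For illustration, a minimal sketch of what one such test could look like with the common TransformationTestsF fixture; the shapes, the weight values, and the pass constructor signature are assumptions:

TEST_F(TransformationTestsF, ActivationsScalingAroundConvolution) {
    {
        auto input = std::make_shared<ov::op::v0::Parameter>(ov::element::f16, ov::Shape{1, 3, 8, 8});
        auto weights = ov::op::v0::Constant::create(ov::element::f16, ov::Shape{4, 3, 3, 3}, {0.5f});
        auto conv = std::make_shared<ov::op::v1::Convolution>(input,
                                                              weights,
                                                              ov::Strides{1, 1},
                                                              ov::CoordinateDiff{0, 0},
                                                              ov::CoordinateDiff{0, 0},
                                                              ov::Strides{1, 1});
        model = std::make_shared<ov::Model>(ov::NodeVector{conv}, ov::ParameterVector{input});
        manager.register_pass<ov::pass::ActivationsScaling>(8.f); // constructor signature assumed
    }
    // model_ref would be built the same way with the expected graph:
    // input -> Multiply(1/8) -> Convolution -> Multiply(8) -> result.
}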

@geunhwan removed this from the 2024.5 milestone on Oct 29, 2024
@e-ddykim force-pushed the static_scaling branch 2 times, most recently from 0d7c7cd to bc284f5, on Oct 29, 2024
@e-ddykim requested a review from a team as a code owner on Oct 29, 2024
@github-actions bot added the "category: Python API" label on Oct 29, 2024
@e-ddykim force-pushed the static_scaling branch 2 times, most recently from 8f22485 to ebca03d, on Nov 4, 2024
@github-actions bot removed the labels "category: inference", "category: Python API", and "category: CPP API" on Nov 4, 2024
@AlexKoff88 (Contributor) commented:
@e-ddykim, please consider this PR: huggingface/optimum-intel#994

@e-ddykim removed the "WIP" (work in progress) label on Dec 2, 2024
Labels: category: GPU (OpenVINO GPU plugin), category: inference (OpenVINO Runtime library - Inference), category: LP transformations (OpenVINO Low Precision transformations), category: transformations (OpenVINO Runtime library - Transformations)