[GPU] LSTMSequence and LSTMCell optimization #26767

Open · wants to merge 172 commits into base: master
Changes shown from 164 of 172 commits
9ce143a
compiles lstm_seq
michal-miotk Jul 18, 2024
027f991
more kernel args
michal-miotk Jul 18, 2024
c191c58
bigger proper run chances
michal-miotk Jul 18, 2024
d461e66
19jul
michal-miotk Jul 19, 2024
01fa2ac
inference works
michal-miotk Jul 19, 2024
1f017fd
in middle of implementation
michal-miotk Jul 21, 2024
5787c7d
problems with inputs get element in kernel
michal-miotk Jul 22, 2024
837db22
not compile
michal-miotk Jul 22, 2024
d4ce531
wipx
michal-miotk Jul 23, 2024
19c268e
wip
michal-miotk Jul 23, 2024
f5273bc
solved problem with too much inputs kernel
michal-miotk Jul 23, 2024
d50b3be
wip
michal-miotk Jul 24, 2024
63a8dfd
more changes
michal-miotk Jul 24, 2024
f54ecc1
wip
michal-miotk Jul 24, 2024
3748a11
wip
michal-miotk Jul 25, 2024
fae772a
wip
michal-miotk Jul 25, 2024
c00ff8a
proper shape for 2 outputs
michal-miotk Jul 25, 2024
1c08b14
Squashed commit of the following:
michal-miotk Jul 29, 2024
6968881
Squashed commit of the following:
michal-miotk Aug 6, 2024
31fcb79
cleaning
michal-miotk Aug 6, 2024
4b16eef
Merge branch 'master' into lstm2
michal-miotk Aug 6, 2024
dcad182
updated to new primitive_base api, disabled lstm to tensor transforma…
michal-miotk Aug 6, 2024
d6aeb54
now it should compile on windows, changed kernel name
michal-miotk Aug 6, 2024
9688f63
deleted cell, deleted input_forget
michal-miotk Aug 6, 2024
5003d47
generic primitive
michal-miotk Aug 7, 2024
5937b14
fix compilation problem, smaller lws
michal-miotk Aug 7, 2024
8b31a91
wip
michal-miotk Aug 8, 2024
2ff5a7c
wip, not resolved fail on dynamic
michal-miotk Aug 8, 2024
2d9e5c6
fixed failing dynamic test
michal-miotk Aug 9, 2024
702e941
change name cldnn::rnn -> cldnn::lstm_seq
michal-miotk Aug 9, 2024
f4d3b71
fix bad order of inputs in lstm_elt constructor
michal-miotk Aug 12, 2024
0c7103c
changed input order in kernel
michal-miotk Aug 12, 2024
f37482a
Squashed commit of the following:
michal-miotk Aug 13, 2024
0058c57
Merge branch 'master' into lstm2
michal-miotk Aug 13, 2024
1ac26d3
fix bad initialization in kernel
michal-miotk Aug 13, 2024
31040bf
generic kernel
michal-miotk Aug 13, 2024
83aa74f
deleted unnecessary cancelled buffer fusing for cell
michal-miotk Aug 14, 2024
0cce00c
Merge branch 'master' into lstm2
michal-miotk Aug 14, 2024
0e37c8a
bigger local workgroup, turned off buffer fusing for lstm cell
michal-miotk Aug 14, 2024
72b48d1
speedup 1.5x after unrolling loop
michal-miotk Aug 14, 2024
7a747c5
barrier in better place
michal-miotk Aug 14, 2024
9b99f04
direction condition on macro, more macro
michal-miotk Aug 14, 2024
5052e26
reducing temp_cell_state
michal-miotk Aug 14, 2024
aa5d906
Revert "reducing temp_cell_state"
michal-miotk Aug 15, 2024
4b524fd
reducing temp cell state
michal-miotk Aug 15, 2024
c47c943
minor kernel speedup (1fps)
michal-miotk Aug 15, 2024
e486376
deleted unnecessary tab for input and hidden result
michal-miotk Aug 16, 2024
fe72cc8
fix windows compilation
michal-miotk Aug 17, 2024
d62f223
more clear kernel algorithm
michal-miotk Aug 19, 2024
0b1fa3d
wip
michal-miotk Aug 19, 2024
3e1fe20
wip vectorized
michal-miotk Aug 19, 2024
cac921c
more vector
michal-miotk Aug 20, 2024
a165f30
fix for vec size, deleted MAX_SEQ_LENGTH
michal-miotk Aug 20, 2024
8f74962
Revert "fix for vec size, deleted MAX_SEQ_LENGTH"
michal-miotk Aug 20, 2024
732eb52
fix vec_size
michal-miotk Aug 20, 2024
165dd9b
optimizations for bigger gpus
michal-miotk Aug 20, 2024
1b9cc98
fix for windows
michal-miotk Aug 20, 2024
37ab01b
fix conversion error
michal-miotk Aug 20, 2024
c99ddc0
Merge branch 'master' into lstm2
michal-miotk Aug 20, 2024
60a0675
merge most important from lstm23
michal-miotk Sep 24, 2024
1b23648
deleted cout
michal-miotk Sep 24, 2024
7c1bf37
Merge branch 'master' into lstm_with_onednn
michal-miotk Sep 24, 2024
40abc31
mainly changes from code review
michal-miotk Sep 25, 2024
56031d9
merged some_wip
michal-miotk Oct 1, 2024
d954fe8
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 1, 2024
78cc4fc
correct in registry
michal-miotk Oct 1, 2024
81ca2ed
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 2, 2024
431d937
deleted level zero, undo changes in visualize_tree
michal-miotk Oct 2, 2024
6b6800f
fix bad name in OV_GPU_PRIMITIVE_IMPL
michal-miotk Oct 2, 2024
db8d75b
returning on conversion to tensor iterator
michal-miotk Oct 3, 2024
a9cd3cf
Squashed commit of the following:
michal-miotk Oct 7, 2024
bfb80ba
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 7, 2024
57faed2
wip
michal-miotk Oct 7, 2024
7f097ba
wip
michal-miotk Oct 8, 2024
a79eca5
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 8, 2024
8d4e46b
should work, turned off forcing immad
michal-miotk Oct 8, 2024
00c6237
added lstm_seq and lstm_cell in implementation manager
michal-miotk Oct 9, 2024
31b8ef0
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 9, 2024
07c1ac2
little cleaning
michal-miotk Oct 9, 2024
a78ef3a
turnedoff immad check for onednn
michal-miotk Oct 9, 2024
5bcab62
deleted unused var
michal-miotk Oct 9, 2024
d564228
redo level_zero_ext to cdb761
michal-miotk Oct 9, 2024
b16bdac
redo mistake change to ov_subgraph
michal-miotk Oct 10, 2024
173b5b2
enabled tests for bfyx kernel
michal-miotk Oct 10, 2024
c8eb682
set to turn on onednn
michal-miotk Oct 10, 2024
43acd2b
turned of impl selection for childs and grandchilds of node, cleaning
michal-miotk Oct 10, 2024
0002e54
added cl_cache extension for *.onednn.cl_cache files
michal-miotk Oct 11, 2024
7741a46
renamed post_optimize_lstm_weights, deleted unused function select_im…
michal-miotk Oct 11, 2024
ac352ea
repair cache tests
michal-miotk Oct 14, 2024
d0fb8b4
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 14, 2024
a1497c4
initialized memory in infer_request_dynamic tests
michal-miotk Oct 14, 2024
f12aebd
fix for failing caching tests
michal-miotk Oct 14, 2024
6170710
deleted event handling as in case in in_order que it is not used
michal-miotk Oct 14, 2024
01dc7dc
preventing duplicates
michal-miotk Oct 14, 2024
e9bf370
repairs in infer_request set and get tensor
michal-miotk Oct 15, 2024
7158776
fused test repair
michal-miotk Oct 15, 2024
5e21106
set in order queue as default test config
michal-miotk Oct 15, 2024
daa83b5
only bfyx format for lstm_seq
michal-miotk Oct 15, 2024
6af1f3f
skipping conv fusion tests
michal-miotk Oct 16, 2024
02942e5
skipping f16 deconv gpu tests
michal-miotk Oct 16, 2024
f8dbec3
conv_fp32_multi_eltquant skip in conv_fusion_test
michal-miotk Oct 17, 2024
2abe8f8
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 17, 2024
00826ad
update hash as input format of weights is custom after post_optimize_…
michal-miotk Oct 17, 2024
36b4853
change format in conv_fp32_multi_eltwise_concat basic test
michal-miotk Oct 17, 2024
c358ab3
fix shape calc for onednn, only bfyx supported for lstmocl
michal-miotk Oct 18, 2024
19b1d93
Revert "optimizations for bigger gpus"
michal-miotk Oct 18, 2024
4da2df6
deleted all get_index safe in lstm bfyx kernel
michal-miotk Oct 18, 2024
303bf7d
applying review part1
michal-miotk Oct 18, 2024
bf9f13f
fix check of dimensions
michal-miotk Oct 19, 2024
459e1ad
fix check of input dim lstm cell
michal-miotk Oct 20, 2024
14e53f4
enable onednn for tests ON, LSTMSeq accept bfyx and fbyx format
michal-miotk Oct 20, 2024
063ac02
dot op, vec_size=4
michal-miotk Oct 20, 2024
892131b
Revert "skipping conv fusion tests"
michal-miotk Oct 20, 2024
b539a3f
Revert "conv_fp32_multi_eltquant skip in conv_fusion_test"
michal-miotk Oct 20, 2024
dc8ac73
lstm_weights optimization is part of post_optimize_weights
michal-miotk Oct 20, 2024
a5165a8
fix forbiddnen size_t->int conversion
michal-miotk Oct 20, 2024
cc6b4b5
Revert "update hash as input format of weights is custom after post_o…
michal-miotk Oct 20, 2024
b168fbe
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 24, 2024
c38c321
inheriting from RNNParams instead of composition
michal-miotk Oct 24, 2024
73cde93
fix failing tests, added input forget
michal-miotk Oct 24, 2024
99a3ca5
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 25, 2024
b5ca43f
fix error - not override
michal-miotk Oct 25, 2024
1d1deb7
reenabling lstm_cell decomposition
michal-miotk Oct 28, 2024
f14ed32
no passing R weights reorder to reorder params
michal-miotk Oct 28, 2024
fb12ef6
refactor weights post optimize
michal-miotk Oct 28, 2024
63bddcb
fix conversion size_t to int
michal-miotk Oct 28, 2024
2b6ad65
only lstmseq decomposition
michal-miotk Oct 28, 2024
8e1d36c
little refactor
michal-miotk Oct 28, 2024
bd81512
micro refactor - deleted one line
michal-miotk Oct 28, 2024
e4fffa5
Merge branch 'master' into lstm_with_onednn
michal-miotk Oct 29, 2024
aea04bd
deleting skip of deconvolution_random_test
michal-miotk Oct 30, 2024
7af4e1e
Revert "only lstmseq decomposition"
michal-miotk Oct 31, 2024
75d0b67
updated onednn
michal-miotk Oct 31, 2024
acd6369
oooq only for lstmSeq
michal-miotk Oct 31, 2024
0430ee3
Revert "updated onednn"
michal-miotk Oct 31, 2024
122ad96
Merge branch 'master' into lstm_with_onednn
michal-miotk Nov 1, 2024
a54f49a
fix caching tests
michal-miotk Nov 1, 2024
79ef461
decomposition for seq_len == 1
michal-miotk Nov 1, 2024
31313c9
refactor rnn
michal-miotk Nov 1, 2024
d3ec29b
wip
michal-miotk Nov 1, 2024
09b9283
Merge branch 'master' into lstm_with_onednn
michal-miotk Nov 3, 2024
bde6fb6
minor refactor
michal-miotk Nov 3, 2024
fcee4af
in_order que when support immad in tests
michal-miotk Nov 3, 2024
320c0d9
reordering output done in query
michal-miotk Nov 10, 2024
c7ccf49
flags for onednn, added missing loader and saver statement
michal-miotk Nov 12, 2024
421eb89
cleaning
michal-miotk Nov 12, 2024
c6e160f
cleaning
michal-miotk Nov 12, 2024
0adaa9b
fix error, added more check if device supports immad
michal-miotk Nov 12, 2024
fee5623
deleting unnecessary variables
michal-miotk Nov 12, 2024
3482f33
fix PrimitiveTypeTest
michal-miotk Nov 12, 2024
e174f1a
move functions from reorder factory to post_optimize_weights, enablin…
michal-miotk Nov 14, 2024
06f394a
updated indexing in kernel
michal-miotk Nov 14, 2024
483e637
deleted redundant check of immad
michal-miotk Nov 15, 2024
819e21f
better serialization, better shape calc
michal-miotk Nov 15, 2024
72232ea
better indexing
michal-miotk Nov 17, 2024
178f7fa
Merge branch 'master' into lstm_with_onednn
michal-miotk Nov 18, 2024
3307752
tiny cleaning
michal-miotk Nov 18, 2024
2d58b17
Merge branch 'master' into lstm_with_onednn
michal-miotk Nov 18, 2024
e1fcc01
wip
michal-miotk Nov 19, 2024
9fa6fff
wip
michal-miotk Nov 20, 2024
8bca96d
disable check on concat
michal-miotk Nov 21, 2024
63ee3b4
some kernel tuning, deleted unused var
michal-miotk Nov 21, 2024
c99a348
new shape infer for lsm_cell, changes from review
michal-miotk Nov 21, 2024
b551bda
deleting lstm elt
michal-miotk Sep 2, 2024
d1bec7b
get_arguments in loop
michal-miotk Nov 24, 2024
0a9756d
wip
michal-miotk Nov 24, 2024
2a068c2
fix compilation on windows
michal-miotk Nov 24, 2024
fcdaab0
one less primitive in post optimize weights, deleted legacy output
michal-miotk Nov 25, 2024
6685209
deleted unused has_cell function
michal-miotk Nov 25, 2024
debeb39
check node output layout(when concat) only for case when it is first …
michal-miotk Nov 25, 2024
7ce51bc
undo adding level_zero
michal-miotk Nov 25, 2024
26d0e50
Merge branch 'master' into lstm_with_onednn
michal-miotk Nov 25, 2024
148 changes: 0 additions & 148 deletions src/plugins/intel_gpu/include/intel_gpu/primitives/lstm.hpp

This file was deleted.

@@ -0,0 +1,24 @@
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once
#include "primitive.hpp"
#include "activation.hpp"
#include <vector>
#include <algorithm>
#include "intel_gpu/graph/serialization/activation_serializer.hpp"
#include "rnn.hpp"


namespace cldnn {

struct lstm_cell : public RNNParams<lstm_cell> {
CLDNN_DECLARE_PRIMITIVE(lstm_cell)
using vec_activation = std::vector<activation_func>;
using vec_activation_param = std::vector<activation_additional_params>;
using RNNParams::RNNParams;
lstm_cell(const lstm_cell&) = default;
lstm_cell() : RNNParams() {}
};
} // namespace cldnn
203 changes: 203 additions & 0 deletions src/plugins/intel_gpu/include/intel_gpu/primitives/rnn.hpp
@@ -0,0 +1,203 @@
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once
#include "primitive.hpp"
#include "activation.hpp"
#include <vector>
#include <algorithm>
#include <string>
#include "intel_gpu/graph/serialization/activation_serializer.hpp"

namespace cldnn {

/// @brief Weights orders
/// @details Specifies the order in which the weights are concatenated.
/// e.g. [i, o, f, z] : [input, output, forget, block]
/// ONNX order: iofz
/// Caffe order: ifoz
/// PyTorch order: izof
/// OV order: fizo
enum class lstm_weights_order {
iofz,
ifoz,
izof,
fizo
};

template <typename PType>
struct RNNParams : public primitive_base<PType> {
RNNParams() : primitive_base<PType>("", {}) {}
RNNParams(const RNNParams&) = default;
RNNParams(const primitive_id& id,
const input_info& x,
const input_info& initial_hidden_state,
const input_info& initial_cell_state,
const input_info& W,
const input_info& R,
const input_info& B,
const input_info& seq_lenghts,
const primitive_id& out1_prim_id = "",
const primitive_id& out2_prim_id = "",
const float clip = 0,
bool input_forget = false,
const std::vector<activation_func>& activations = {activation_func::logistic,
activation_func::hyperbolic_tan,
activation_func::hyperbolic_tan},
const std::vector<activation_additional_params>& activation_params = {},
const lstm_weights_order& offset_order = lstm_weights_order::iofz,
const ov::op::RecurrentSequenceDirection direction = ov::op::RecurrentSequenceDirection::FORWARD,
const padding& output_padding = padding(),
const int num_outputs = 1)
: primitive_base<PType>(id, {x}, num_outputs, {optional_data_type()}, {output_padding}),
x(x),
initial_hidden_state(initial_hidden_state),
initial_cell_state(initial_cell_state),
W(W),
R(R),
B(B),
seq_lenghts(seq_lenghts),
out1_prim_id(out1_prim_id),
out2_prim_id(out2_prim_id),
clip(clip),
input_forget(input_forget),
activations(activations),
activation_params(activation_params),
offset_order(offset_order),
direction(direction) {
std::vector<std::string> pids{initial_hidden_state.pid, initial_cell_state.pid, W.pid, R.pid, B.pid, seq_lenghts.pid, out1_prim_id, out2_prim_id};
for (auto pid : pids) {
if (!pid.empty()) {
primitive_base<PType>::input.push_back(pid);
}
}
}

input_info x;
input_info initial_hidden_state;
input_info initial_cell_state;
input_info W;
input_info R;
input_info B;
input_info seq_lenghts;
primitive_id out1_prim_id;
primitive_id out2_prim_id;
/// @brief Cell clip threshold T. The input of the activations is clipped to [-T, T]. No clipping is applied when T is 0 (the default).
float clip;
bool input_forget;
/// @brief A list of 3 activation functions (f, g, h) applied across the gate, cell, and hidden-state computations.
std::vector<activation_func> activations;
/// @brief Optional scaling values used by some activation functions. The values are consumed in the order of activation functions.
std::vector<activation_additional_params> activation_params;
/// @brief Weights, recurrent weights, and biases order. [iofz] : ONNX, [ifoz] : Caffe
lstm_weights_order offset_order;
/// @brief Direction of LSTMSequence: only FORWARD or REVERSE; BIDIRECTIONAL is currently not supported.
ov::op::RecurrentSequenceDirection direction;

int num_directions() const {
return direction == ov::op::RecurrentSequenceDirection::BIDIRECTIONAL ? 2 : 1;
}

size_t hash() const override {
size_t seed = primitive::hash();
seed = hash_combine(seed, x.pid);
seed = hash_combine(seed, initial_hidden_state.pid);
seed = hash_combine(seed, initial_cell_state.pid);
seed = hash_combine(seed, seq_lenghts.pid);
seed = hash_combine(seed, W.pid);
seed = hash_combine(seed, R.pid);
seed = hash_combine(seed, B.pid);
seed = hash_combine(seed, out1_prim_id);
seed = hash_combine(seed, out2_prim_id);
seed = hash_combine(seed, clip);
seed = hash_range(seed, activations.begin(), activations.end());
for (auto& act_param : activation_params) {
seed = hash_combine(seed, act_param.a);
seed = hash_combine(seed, act_param.b);
}
seed = hash_combine(seed, offset_order);
seed = hash_combine(seed, direction);
return seed;
}

bool operator==(const primitive& rhs) const override {
if (!primitive::compare_common_params(rhs))
return false;

auto rhs_casted = downcast<const PType>(rhs);
bool act_params_eq = activation_params.size() == rhs_casted.activation_params.size();
for (size_t i = 0; i < activation_params.size(); ++i) {
act_params_eq &= activation_params[i].a == rhs_casted.activation_params[i].a &&
activation_params[i].b == rhs_casted.activation_params[i].b;
}

#define cmp_fields(name) name == rhs_casted.name
return act_params_eq &&
cmp_fields(x) &&
cmp_fields(initial_hidden_state) &&
cmp_fields(initial_cell_state) &&
cmp_fields(seq_lenghts) &&
cmp_fields(W) &&
cmp_fields(R) &&
cmp_fields(B) &&
cmp_fields(out1_prim_id) &&
cmp_fields(out2_prim_id) &&
cmp_fields(clip) &&
cmp_fields(activations) &&
cmp_fields(offset_order) &&
cmp_fields(direction);
#undef cmp_fields
}

void save(BinaryOutputBuffer& ob) const override {
primitive_base<PType>::save(ob);
ob << x;
ob << initial_hidden_state;
ob << initial_cell_state;
ob << W;
ob << R;
ob << B;
ob << seq_lenghts;
ob << out1_prim_id;
ob << out2_prim_id;
ob << clip;
ob << activations;
ob << activation_params;
ob << make_data(&offset_order, sizeof(lstm_weights_order));
ob << make_data(&direction, sizeof(ov::op::RecurrentSequenceDirection));
}

void load(BinaryInputBuffer& ib) override {
primitive_base<PType>::load(ib);
ib >> x;
ib >> initial_hidden_state;
ib >> initial_cell_state;
ib >> W;
ib >> R;
ib >> B;
ib >> seq_lenghts;
ib >> out1_prim_id;
ib >> out2_prim_id;
ib >> clip;
ib >> activations;
ib >> activation_params;
ib >> make_data(&offset_order, sizeof(lstm_weights_order));
ib >> make_data(&direction, sizeof(ov::op::RecurrentSequenceDirection));
}
};

struct lstm_seq : public RNNParams<lstm_seq> {
CLDNN_DECLARE_PRIMITIVE(lstm_seq)
using vec_activation = std::vector<activation_func>;
using vec_activation_param = std::vector<activation_additional_params>;
using RNNParams::RNNParams;
lstm_seq() : RNNParams() {
weights = W.pid;
input = x.pid;
}
lstm_seq(const lstm_seq&) = default;
primitive_id input;
primitive_id weights;
};
} //namespace cldnn
@@ -56,6 +56,7 @@ static constexpr Property<size_t, PropertyMutability::RW> max_dynamic_batch{"DYN
static constexpr Property<bool, PropertyMutability::RW> nv12_two_inputs{"GPU_NV12_TWO_INPUTS"};
static constexpr Property<float, PropertyMutability::RW> buffers_preallocation_ratio{"GPU_BUFFERS_PREALLOCATION_RATIO"};
static constexpr Property<size_t, PropertyMutability::RW> max_kernels_per_batch{"GPU_MAX_KERNELS_PER_BATCH"};
static constexpr Property<bool, PropertyMutability::RW> use_onednn{"USE_ONEDNN"};

} // namespace intel_gpu
} // namespace ov
3 changes: 3 additions & 0 deletions src/plugins/intel_gpu/src/graph/concatenation.cpp
@@ -120,6 +120,9 @@ concatenation_inst::typed_primitive_inst(network& network, concatenation_node co
if (dim == node.get_primitive()->axis) {
concat_count += input_mem_size[dim];
} else {
if (i.first->get_outputs_count() > 1) {
continue;
}
CLDNN_ERROR_NOT_EQUAL(node.id(),
"Input size dim: " + std::to_string(dim),
input_size[dim],