[GPU] LSTMSequence and LSTMCell optimization #26767
Open
michal-miotk wants to merge 172 commits into openvinotoolkit:master from michal-miotk:lstm_with_onednn
+2,052 −998
172 commits (changes shown from 160 commits)
Commits (all by michal-miotk):
9ce143a compiles lstm_seq
027f991 more kernel args
c191c58 bigger proper run chances
d461e66 19jul
01fa2ac inference works
1f017fd in middle of implementation
5787c7d problems with inputs get element in kernel
837db22 not compile
d4ce531 wipx
19c268e wip
f5273bc solved problem with too much inputs kernel
d50b3be wip
63a8dfd more changes
f54ecc1 wip
3748a11 wip
fae772a wip
c00ff8a proper shape for 2 outputs
1c08b14 Squashed commit of the following:
6968881 Squashed commit of the following:
31fcb79 cleaning
4b16eef Merge branch 'master' into lstm2
dcad182 updated to new primitive_base api, disabled lstm to tensor transforma…
d6aeb54 now it should compile on windows, changed kernel name
9688f63 deleted cell, deleted input_forget
5003d47 generic primitive
5937b14 fix compilation problem, smaller lws
8b31a91 wip
2ff5a7c wip, not resolved fail on dynamic
2d9e5c6 fixed failing dynamic test
702e941 change name cldnn::rnn -> cldnn::lstm_seq
f4d3b71 fix bad order of inputs in lstm_elt constructor
0c7103c changed input order in kernel
f37482a Squashed commit of the following:
0058c57 Merge branch 'master' into lstm2
1ac26d3 fix bad initialization in kernel
31040bf generic kernel
83aa74f deleted unnecessary cancelled buffer fusing for cell
0cce00c Merge branch 'master' into lstm2
0e37c8a bigger local workgroup, turned off buffer fusing for lstm cell
72b48d1 speedup 1.5x after unrolling loop
7a747c5 barrier in better place
9b99f04 direction condition on macro, more macro
5052e26 reducing temp_cell_state
aa5d906 Revert "reducing temp_cell_state"
4b524fd reducing temp cell state
c47c943 minor kernel speedup (1fps)
e486376 deleted unnecessary tab for input and hidden result
fe72cc8 fix windows compilation
d62f223 more clear kernel algorithm
0b1fa3d wip
3e1fe20 wip vectorized
cac921c more vector
a165f30 fix for vec size, deleted MAX_SEQ_LENGTH
8f74962 Revert "fix for vec size, deleted MAX_SEQ_LENGTH"
732eb52 fix vec_size
165dd9b optimizations for bigger gpus
1b9cc98 fix for windows
37ab01b fix conversion error
c99ddc0 Merge branch 'master' into lstm2
60a0675 merge most important from lstm23
1b23648 deleted cout
7c1bf37 Merge branch 'master' into lstm_with_onednn
40abc31 mainly changes from code review
56031d9 merged some_wip
d954fe8 Merge branch 'master' into lstm_with_onednn
78cc4fc correct in registry
81ca2ed Merge branch 'master' into lstm_with_onednn
431d937 deleted level zero, undo changes in visualize_tree
6b6800f fix bad name in OV_GPU_PRIMITIVE_IMPL
db8d75b returning on conversion to tensor iterator
a9cd3cf Squashed commit of the following:
bfb80ba Merge branch 'master' into lstm_with_onednn
57faed2 wip
7f097ba wip
a79eca5 Merge branch 'master' into lstm_with_onednn
8d4e46b should work, turned off forcing immad
00c6237 added lstm_seq and lstm_cell in implementation manager
31b8ef0 Merge branch 'master' into lstm_with_onednn
07c1ac2 little cleaning
a78ef3a turnedoff immad check for onednn
5bcab62 deleted unused var
d564228 redo level_zero_ext to cdb761
b16bdac redo mistake change to ov_subgraph
173b5b2 enabled tests for bfyx kernel
c8eb682 set to turn on onednn
43acd2b turned of impl selection for childs and grandchilds of node, cleaning
0002e54 added cl_cache extension for *.onednn.cl_cache files
7741a46 renamed post_optimize_lstm_weights, deleted unused function select_im…
ac352ea repair cache tests
d0fb8b4 Merge branch 'master' into lstm_with_onednn
a1497c4 initialized memory in infer_request_dynamic tests
f12aebd fix for failing caching tests
6170710 deleted event handling as in case in in_order que it is not used
01dc7dc preventing duplicates
e9bf370 repairs in infer_request set and get tensor
7158776 fused test repair
5e21106 set in order queue as default test config
daa83b5 only bfyx format for lstm_seq
6af1f3f skipping conv fusion tests
02942e5 skipping f16 deconv gpu tests
f8dbec3 conv_fp32_multi_eltquant skip in conv_fusion_test
2abe8f8 Merge branch 'master' into lstm_with_onednn
00826ad update hash as input format of weights is custom after post_optimize_…
36b4853 change format in conv_fp32_multi_eltwise_concat basic test
c358ab3 fix shape calc for onednn, only bfyx supported for lstmocl
19b1d93 Revert "optimizations for bigger gpus"
4da2df6 deleted all get_index safe in lstm bfyx kernel
303bf7d applying review part1
bf9f13f fix check of dimensions
459e1ad fix check of input dim lstm cell
14e53f4 enable onednn for tests ON, LSTMSeq accept bfyx and fbyx format
063ac02 dot op, vec_size=4
892131b Revert "skipping conv fusion tests"
b539a3f Revert "conv_fp32_multi_eltquant skip in conv_fusion_test"
dc8ac73 lstm_weights optimization is part of post_optimize_weights
a5165a8 fix forbiddnen size_t->int conversion
cc6b4b5 Revert "update hash as input format of weights is custom after post_o…
b168fbe Merge branch 'master' into lstm_with_onednn
c38c321 inheriting from RNNParams instead of composition
73cde93 fix failing tests, added input forget
99a3ca5 Merge branch 'master' into lstm_with_onednn
b5ca43f fix error - not override
1d1deb7 reenabling lstm_cell decomposition
f14ed32 no passing R weights reorder to reorder params
fb12ef6 refactor weights post optimize
63bddcb fix conversion size_t to int
2b6ad65 only lstmseq decomposition
8e1d36c little refactor
bd81512 micro refactor - deleted one line
e4fffa5 Merge branch 'master' into lstm_with_onednn
aea04bd deleting skip of deconvolution_random_test
7af4e1e Revert "only lstmseq decomposition"
75d0b67 updated onednn
acd6369 oooq only for lstmSeq
0430ee3 Revert "updated onednn"
122ad96 Merge branch 'master' into lstm_with_onednn
a54f49a fix caching tests
79ef461 decomposition for seq_len == 1
31313c9 refactor rnn
d3ec29b wip
09b9283 Merge branch 'master' into lstm_with_onednn
bde6fb6 minor refactor
fcee4af in_order que when support immad in tests
320c0d9 reordering output done in query
c7ccf49 flags for onednn, added missing loader and saver statement
421eb89 cleaning
c6e160f cleaning
0adaa9b fix error, added more check if device supports immad
fee5623 deleting unnecessary variables
3482f33 fix PrimitiveTypeTest
e174f1a move functions from reorder factory to post_optimize_weights, enablin…
06f394a updated indexing in kernel
483e637 deleted redundant check of immad
819e21f better serialization, better shape calc
72232ea better indexing
178f7fa Merge branch 'master' into lstm_with_onednn
3307752 tiny cleaning
2d58b17 Merge branch 'master' into lstm_with_onednn
e1fcc01 wip
9fa6fff wip
8bca96d disable check on concat
63ee3b4 some kernel tuning, deleted unused var
c99a348 new shape infer for lsm_cell, changes from review
b551bda deleting lstm elt
d1bec7b get_arguments in loop
0a9756d wip
2a068c2 fix compilation on windows
fcdaab0 one less primitive in post optimize weights, deleted legacy output
6685209 deleted unused has_cell function
debeb39 check node output layout(when concat) only for case when it is first …
7ce51bc undo adding level_zero
26d0e50 Merge branch 'master' into lstm_with_onednn
src/plugins/intel_gpu/include/intel_gpu/primitives/lstm_cell.hpp (24 additions, 0 deletions)
@@ -0,0 +1,24 @@
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once
#include "primitive.hpp"
#include "activation.hpp"
#include <vector>
#include <algorithm>
#include "intel_gpu/graph/serialization/activation_serializer.hpp"
#include "rnn.hpp"

namespace cldnn {

struct lstm_cell : public RNNParams<lstm_cell> {
    CLDNN_DECLARE_PRIMITIVE(lstm_cell)
    using vec_activation = std::vector<activation_func>;
    using vec_activation_param = std::vector<activation_additional_params>;
    using RNNParams::RNNParams;
    lstm_cell(const lstm_cell&) = default;
    lstm_cell() : RNNParams() {}
};
}  // namespace cldnn
src/plugins/intel_gpu/include/intel_gpu/primitives/rnn.hpp (203 additions, 0 deletions)
@@ -0,0 +1,203 @@
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once
#include "primitive.hpp"
#include "activation.hpp"
#include <vector>
#include <algorithm>
#include <string>
#include "intel_gpu/graph/serialization/activation_serializer.hpp"

namespace cldnn {

/// @brief Weights orders
/// @details Specifies the order in which the weights are concatenated.
/// e.g. [i, o, f, z] : [input, output, forget, block]
/// ONNX order: iofz
/// Caffe order: ifoz
/// PyTorch order: izof
/// OV order: fizo
enum class lstm_weights_order {
    iofz,
    ifoz,
    izof,
    fizo
};
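The enum above only names the four layouts. As a standalone sketch (the helper `remap_gate` is illustrative, not part of the plugin), here is how a gate's position moves between two orders:

```cpp
#include <cassert>
#include <string>

// Each order is spelled with the letters i (input), o (output),
// f (forget) and z (block input). Given a gate's position in one
// order, find the position of the same gate in another order.
// Illustrative helper only - not part of the cldnn API.
static int remap_gate(const std::string& from, const std::string& to, int idx) {
    char gate = from[static_cast<size_t>(idx)];  // gate letter at position idx
    return static_cast<int>(to.find(gate));      // its position in the target order
}
```

For example, the input gate sits at position 0 in the ONNX order "iofz" but at position 1 in the OV order "fizo", so a weights reorder between the two formats must permute the gate blocks accordingly.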
template <typename PType>
struct RNNParams : public primitive_base<PType> {
    RNNParams() : primitive_base<PType>("", {}) {}
    RNNParams(const RNNParams&) = default;
    RNNParams(const primitive_id& id,
              const input_info& x,
              const input_info& initial_hidden_state,
              const input_info& initial_cell_state,
              const input_info& W,
              const input_info& R,
              const input_info& B,
              const input_info& seq_lenghts,
              const primitive_id& out1_prim_id = "",
              const primitive_id& out2_prim_id = "",
              const float clip = 0,
              bool input_forget = false,
              const std::vector<activation_func>& activations = {activation_func::logistic,
                                                                 activation_func::hyperbolic_tan,
                                                                 activation_func::hyperbolic_tan},
              const std::vector<activation_additional_params>& activation_params = {},
              const lstm_weights_order& offset_order = lstm_weights_order::iofz,
              const ov::op::RecurrentSequenceDirection direction = ov::op::RecurrentSequenceDirection::FORWARD,
              const padding& output_padding = padding(),
              const int num_outputs = 1)
        : primitive_base<PType>(id, {x}, num_outputs, {optional_data_type()}, {output_padding}),
          x(x),
          initial_hidden_state(initial_hidden_state),
          initial_cell_state(initial_cell_state),
          W(W),
          R(R),
          B(B),
          seq_lenghts(seq_lenghts),
          out1_prim_id(out1_prim_id),
          out2_prim_id(out2_prim_id),
          clip(clip),
          input_forget(input_forget),
          activations(activations),
          activation_params(activation_params),
          offset_order(offset_order),
          direction(direction) {
        std::vector<std::string> pids{initial_hidden_state.pid, initial_cell_state.pid, W.pid, R.pid, B.pid, seq_lenghts.pid, out1_prim_id, out2_prim_id};
        for (auto pid : pids) {
            if (!pid.empty()) {
                primitive_base<PType>::input.push_back(pid);
            }
        }
    }
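The constructor body above appends only the non-empty optional ids to the primitive's dependency list, which is why the derived lstm primitives can be created with a varying number of inputs. A minimal standalone sketch of that filtering (`collect_inputs` is a made-up name, not plugin API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep only the optional inputs that were actually provided,
// mirroring the push_back loop in the RNNParams constructor.
static std::vector<std::string> collect_inputs(const std::vector<std::string>& candidates) {
    std::vector<std::string> deps;
    for (const auto& pid : candidates) {
        if (!pid.empty()) {
            deps.push_back(pid);  // empty pid means "input not supplied"
        }
    }
    return deps;
}
```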
    input_info x;
    input_info initial_hidden_state;
    input_info initial_cell_state;
    input_info W;
    input_info R;
    input_info B;
    input_info seq_lenghts;
    primitive_id out1_prim_id;
    primitive_id out2_prim_id;
    /// @brief Cell clip threshold T. It is applied to the input of activations [-T, T]. No clip is applied if it is not specified.
    float clip;
    bool input_forget;
    /// @brief A list of 3 activation functions for the input, output, forget, cell, and hidden.
    std::vector<activation_func> activations;
    /// @brief Optional scaling values used by some activation functions. The values are consumed in the order of activation functions.
    std::vector<activation_additional_params> activation_params;
    /// @brief Weights, recurrent weights, and biases order. [iofz] : ONNX, [ifoz] : Caffe
    lstm_weights_order offset_order;
    /// @brief Direction of LSTMSequence - only FORWARD or REVERSE; BIDIRECTIONAL is currently not supported.
    ov::op::RecurrentSequenceDirection direction;

    int num_directions() const {
        return direction == ov::op::RecurrentSequenceDirection::BIDIRECTIONAL ? 2 : 1;
    }
    size_t hash() const override {
        size_t seed = primitive::hash();
        seed = hash_combine(seed, x.pid);
        seed = hash_combine(seed, initial_hidden_state.pid);
        seed = hash_combine(seed, initial_cell_state.pid);
        seed = hash_combine(seed, seq_lenghts.pid);
        seed = hash_combine(seed, W.pid);
        seed = hash_combine(seed, R.pid);
        seed = hash_combine(seed, B.pid);
        seed = hash_combine(seed, out1_prim_id);
        seed = hash_combine(seed, out2_prim_id);
        seed = hash_combine(seed, clip);
        seed = hash_range(seed, activations.begin(), activations.end());
        for (auto& act_param : activation_params) {
            seed = hash_combine(seed, act_param.a);
            seed = hash_combine(seed, act_param.b);
        }
        seed = hash_combine(seed, offset_order);
        seed = hash_combine(seed, direction);
        return seed;
    }
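`hash()` above folds every field into one seed via `hash_combine`. The plugin's helper is defined elsewhere; as an assumption about what such a combiner typically looks like (not the actual cldnn implementation), here is the common boost-style form:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Boost-style combiner: mixes the hash of v into seed so that both the
// field values and the order in which fields are combined affect the result.
template <typename T>
std::size_t combine_hash(std::size_t seed, const T& v) {
    return seed ^ (std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}
```

Chaining the combiner over each field, as `hash()` does, yields a deterministic per-primitive key that the plugin can use for kernel caching.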
    bool operator==(const primitive& rhs) const override {
        if (!primitive::compare_common_params(rhs))
            return false;

        auto rhs_casted = downcast<const PType>(rhs);
        bool act_params_eq = activation_params.size() == rhs_casted.activation_params.size();
        // Stop comparing once a mismatch is found; this also avoids indexing
        // past the end of rhs_casted.activation_params when the sizes differ.
        for (size_t i = 0; act_params_eq && i < activation_params.size(); ++i) {
            act_params_eq &= activation_params[i].a == rhs_casted.activation_params[i].a &&
                             activation_params[i].b == rhs_casted.activation_params[i].b;
        }

#define cmp_fields(name) name == rhs_casted.name
        return act_params_eq &&
               cmp_fields(x) &&
               cmp_fields(initial_hidden_state) &&
               cmp_fields(initial_cell_state) &&
               cmp_fields(seq_lenghts) &&
               cmp_fields(W) &&
               cmp_fields(R) &&
               cmp_fields(B) &&
               cmp_fields(out1_prim_id) &&
               cmp_fields(out2_prim_id) &&
               cmp_fields(clip) &&
               cmp_fields(activations) &&
               cmp_fields(offset_order) &&
               cmp_fields(direction);
#undef cmp_fields
    }
    void save(BinaryOutputBuffer& ob) const override {
        primitive_base<PType>::save(ob);
        ob << x;
        ob << initial_hidden_state;
        ob << initial_cell_state;
        ob << W;
        ob << R;
        ob << B;
        ob << seq_lenghts;
        ob << out1_prim_id;
        ob << out2_prim_id;
        ob << clip;
        ob << activations;
        ob << activation_params;
        ob << make_data(&offset_order, sizeof(lstm_weights_order));
        ob << make_data(&direction, sizeof(ov::op::RecurrentSequenceDirection));
    }

    void load(BinaryInputBuffer& ib) override {
        primitive_base<PType>::load(ib);
        ib >> x;
        ib >> initial_hidden_state;
        ib >> initial_cell_state;
        ib >> W;
        ib >> R;
        ib >> B;
        ib >> seq_lenghts;
        ib >> out1_prim_id;
        ib >> out2_prim_id;
        ib >> clip;
        ib >> activations;
        ib >> activation_params;
        ib >> make_data(&offset_order, sizeof(lstm_weights_order));
        ib >> make_data(&direction, sizeof(ov::op::RecurrentSequenceDirection));
    }
};
struct lstm_seq : public RNNParams<lstm_seq> {
    CLDNN_DECLARE_PRIMITIVE(lstm_seq)
    using vec_activation = std::vector<activation_func>;
    using vec_activation_param = std::vector<activation_additional_params>;
    using RNNParams::RNNParams;
    lstm_seq() : RNNParams() {
        weights = W.pid;
        input = x.pid;
    }
    lstm_seq(const lstm_seq&) = default;
    primitive_id input;
    primitive_id weights;
};
}  // namespace cldnn
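The primitives above only carry parameters; the computation they describe is the standard LSTM step. For reference, one time step with the default RNNParams activations (logistic for the i/o/f gates, tanh for the block input and output) can be sketched in scalar form. The names and the hidden size of 1 are illustrative only, not the plugin's kernels:

```cpp
#include <cassert>
#include <cmath>

struct StepResult { double h, c; };

static double logistic(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// One scalar LSTM step: x is the input, h_prev/c_prev the previous hidden
// and cell state; W* are input weights, R* recurrent weights (bias omitted).
static StepResult lstm_step(double x, double h_prev, double c_prev,
                            double Wi, double Wo, double Wf, double Wz,
                            double Ri, double Ro, double Rf, double Rz) {
    double i = logistic(Wi * x + Ri * h_prev);   // input gate
    double o = logistic(Wo * x + Ro * h_prev);   // output gate
    double f = logistic(Wf * x + Rf * h_prev);   // forget gate
    double z = std::tanh(Wz * x + Rz * h_prev);  // block input
    double c = f * c_prev + i * z;               // new cell state
    double h = o * std::tanh(c);                 // new hidden state
    return {h, c};
}
```

An lstm_seq kernel runs this step once per element of seq_lenghts, while lstm_cell computes a single step; with all weights zero every gate evaluates to 0.5 and the block input to 0, so the cell state simply halves each step.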
Review thread:

Reviewer: Why do you still keep (and use) the lstm_elt primitive given that you introduce lstm_cell and lstm_seq? I'd expect that it's not needed anymore.

Author: oneDNN performs poorly in the seq_len = 1 case, so I haven't updated that path.

Reviewer: Could you try enabling the ngraph pass for decomposition in that case? Ideally we need to get rid of the lstm_elt primitive and the related decomposition code in the program builder to switch to new shape inference completely.

Author: The lstm_cell primitive, which would be used in that case, is too slow.

Reviewer: One of the initial goals of this patch was removing this LSTM decomposition on the plugin side into a bunch of custom primitives (and thus removing the lstm_elt primitive), and that's still needed. Also, as far as I can see, the lstm_cell primitive is not used at all currently, which means there's no point in adding it. So my suggestion is to continue perf tuning.
Also, as I can see, lstm_cell primitive is not used at all currently, which means there's no sense to add it. So my suggestion is to continue perf tuning then.