[GPU] LSTMSequence and LSTMCell optimization #26767
base: master
Conversation
Commits by michal-miotk <[email protected]>:
- 232d272f11fbe65e82fa9787260a8b9d34b57d20 (Mon Jul 29 11:17:47 2024): wip
- e642ca3 (Sun Jul 28 22:08:24 2024): wip
- c6b74d3 (Fri Jul 26 14:10:26 2024): wip
- 0451429 (Thu Jul 25 20:35:11 2024): wip3
Commits by michal-miotk <[email protected]>:
- 1164592 (Tue Aug 6 09:25:45 2024): wip
- 8b2c049 (Tue Aug 6 09:24:02 2024): wip
- 886b412 (Mon Aug 5 14:59:14 2024): wip
- 08fb207 (Sun Aug 4 20:21:38 2024): wip, errors on half
- 125884d (Sat Aug 3 23:59:58 2024): wip
- af4f209 (Fri Aug 2 17:58:38 2024): wip
- 12626fc (Fri Aug 2 10:52:15 2024): wip
- dfdd052 (Thu Aug 1 15:38:41 2024): wip
- 54ee912 (Thu Aug 1 11:01:55 2024): only bfyx layout
- 240fe4a (Thu Aug 1 10:34:45 2024): two outputs from prim
- bc775be (Wed Jul 31 22:13:14 2024): wip
- d1cfd60 (Wed Jul 31 22:07:06 2024): wip
- 7d18884 (Wed Jul 31 19:19:04 2024): begin of handling reverse
- 39f64af (Wed Jul 31 15:37:06 2024): betterbetter
- 67b3c9a (Wed Jul 31 13:12:39 2024): better
- 6ded5aa (Wed Jul 31 10:12:31 2024): wip
- 1ccdacc (Tue Jul 30 23:07:21 2024): wip
- ab1307c (Tue Jul 30 22:00:50 2024): test passed
- bc65969 (Tue Jul 30 15:37:20 2024): wip
- 03cbf57 (Tue Jul 30 15:15:06 2024): only 2 outputs
- fd5f3dc (Tue Jul 30 14:47:12 2024): wip
- 939d23c (Tue Jul 30 11:34:56 2024): wip
- 2bb561f (Tue Jul 30 09:28:03 2024): added to binary buffer
- 1ef83ff (Mon Jul 29 22:30:57 2024): not works
…tion only in gpu plugin
Resolved review threads on outdated code:
- src/plugins/intel_gpu/src/graph/graph_optimizer/post_optimize_lstm_weights.cpp (two threads)
- src/plugins/intel_gpu/src/kernel_selector/cl_kernels/lstm_cell_and_seq_bfyx.cl
…g onednn on template Signed-off-by: Michal Miotk <[email protected]>
- struct lstm_elt : public primitive_base<lstm_elt> {
+ struct lstm_elt : public RNNParams<lstm_elt> {
One of the initial goals of this patch was to remove this plugin-side decomposition of LSTM into a bunch of custom primitives (and thus remove the lstm_elt primitive). And that's still needed.
Also, as far as I can see, the lstm_cell primitive is not used at all currently, which means there's no sense in adding it. So my suggestion is to continue perf tuning then.
const uint prev_idx = real_seq_length - i;
#else
const uint prev_idx = i-1;
#endif
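For orientation, a minimal host-side sketch of the index mapping this hunk implements (the loop and names here are illustrative, not the kernel code): the forward pass at step i consumes the hidden state produced at step i-1, while the reverse pass walks the sequence from the end.

#include <vector>

// Illustrative only: mirrors the kernel's per-direction prev_idx selection.
void visit_sequence(const std::vector<float>& hidden_history, unsigned real_seq_length, bool reverse) {
    for (unsigned i = 1; i < real_seq_length; ++i) {
        const unsigned prev_idx = reverse ? (real_seq_length - i) : (i - 1);
        // ... the gate computation would read hidden_history[prev_idx] here ...
        (void)hidden_history[prev_idx];
    }
}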
[random spot] As far as I can see, the bidirectional LSTM sequence works incorrectly:
[ RUN ] LSTMSequenceCommonZeroClip/LSTMSequenceTest.Inference/mode=PURE_SEQ_seq_lengths=2_batch=10_hidden_size=10_input_size=10_IS=(10.10)(10.10)(10.10)(40.10)(40.10)(40)_activations=(sigmoid.tanh.tanh)_direction=bidirectional_clip=0_WRBType=CONSTANT_modelType=f32_targetDevice=GPU_
Expected: 0.995055 Actual: 0 Coordinate: 20 Diff: 0.995055 calculated_abs_threshold: 9.96247e-05 abs_threshold: 1.19209e-07 rel_threshold: 0.0001
src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp:97: Failure
[ COMPARATION ] COMPARATION IS FAILED! incorrect elem counter: 1 among 400 shapes.
[ FAILED ] LSTMSequenceCommonZeroClip/LSTMSequenceTest.Inference/mode=PURE_SEQ_seq_lengths=2_batch=10_hidden_size=10_input_size=10_IS=(10.10)(10.10)(10.10)(40.10)(40.10)(40)_activations=(sigmoid.tanh.tanh)_direction=bidirectional_clip=0_WRBType=CONSTANT_modelType=f32_targetDevice=GPU_, where GetParam() = (PURE_SEQ, 2, 10, 10, 10, { "sigmoid", "tanh", "tanh" }, 0, bidirectional, CONSTANT, f32, "GPU") (52 ms)
(I've manually disabled bidirectional sequence decomposition and enforced usage of the lstm_seq primitive to get that result.)
done
Resolved review thread on outdated code:
- src/plugins/intel_gpu/src/kernel_selector/kernels/lstm/lstm_seq_kernel_bfyx.cpp
auto mutable_precision_firstsecond = op->get_output_element_type(1);
auto direction = op->get_direction();

if (p.use_new_shape_infer()) {
I suggest enforcing new shape infer for models with LSTM cell/seq here https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_gpu/src/plugin/program_builder.cpp#L347 and just dropping the code for legacy shape infer for this primitive. A sketch of the detection part is below.
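A rough sketch of that enforcement; the exact hook in program_builder.cpp and the allow_new_shape_infer flag are assumptions, while the op type checks use the standard OpenVINO API:

#include <memory>

#include "openvino/core/model.hpp"
#include "openvino/core/type.hpp"
#include "openvino/op/lstm_cell.hpp"
#include "openvino/op/lstm_sequence.hpp"

// Sketch: detect LSTM cell/sequence ops so new shape infer can be forced on.
bool model_has_lstm(const std::shared_ptr<const ov::Model>& model) {
    for (const auto& op : model->get_ordered_ops()) {
        if (ov::is_type<ov::op::v4::LSTMCell>(op) || ov::is_type<ov::op::v5::LSTMSequence>(op))
            return true;
    }
    return false;
}

// ... in ProgramBuilder (hypothetical flag): allow_new_shape_infer |= model_has_lstm(model);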
done
@@ -189,6 +189,9 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model(const std::shared_ptr<

ExecutionConfig config = m_configs_map.at(device_id);
config.set_user_property(orig_config);
if (context->get_engine().get_device_info().supports_immad) {
These changes in plugin.cpp are not needed, as you have the same logic in apply_user_properties.
done
op_mode, 1, axis, num_splits));
p.add_primitive(*op, cldnn::reshape(outputCellID, cldnn::input_info(outputCellCropID),
               false, outSzPt, op->get_output_partial_shape(1)));
p.add_primitive(*op, cldnn::lstm_cell(layerName+".out0", inputs[0], inputs[1], inputs[2], inputs[3], inputs[4], inputs[5], \
- I think you can enforce new shape infer for LSTMCell as well.
- These out1_prim_id and out2_prim_id are not needed for new shape infer, so you can remove them from the primitive API.
done
Is it? I think item 2 is still relevant. You pass this layerName + "_md_write.1" argument, and the corresponding parameters from the primitive API are still there.
Resolved review thread:
- src/plugins/intel_gpu/src/graph/graph_optimizer/post_optimize_weights.cpp
std::vector<cldnn::activation_func> activations;
std::vector<cldnn::activation_additional_params> activation_params;
GetLSTMActivationParams(op, activations, activation_params);
float clip = op->get_clip();

assert(!inputs[5].pid.empty());
if (p.use_new_shape_infer()) {
I suggest replacing it with OPENVINO_ASSERT to ensure that the method is called correctly.
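For example (the assertion message here is illustrative):

OPENVINO_ASSERT(p.use_new_shape_infer(), "[GPU] Expected new shape infer to be enabled for LSTMCell");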
done
@@ -278,6 +278,9 @@ ov::SupportedOpsMap Plugin::query_model(const std::shared_ptr<const ov::Model>&

ExecutionConfig config = m_configs_map.at(device_id);
config.set_user_property(orig_config);
if (ctx->get_engine().get_device_info().supports_immad) {
These two changes are not needed either.
done
p.add_primitive(*op, cldnn::crop(cellStr, cldnn::input_info(lstm_elt_id), hiddenSz, cellCropSz));
}
const float clip = op->get_clip();
if (op->get_input_shape(2).size() != 3 || op->get_input_shape(3).size() != 1 \
nit: there are also redundant backslashes here and in other places. Please remove those; see the sketch below.
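To illustrate the nit: line-continuation backslashes are only needed in preprocessor macros; an ordinary C++ condition continues across lines inside its parentheses on its own. A minimal sketch with a hypothetical helper, not the PR code:

#include <cstddef>
#include <vector>

// The condition spans two lines legally without any trailing '\'.
bool has_unexpected_ranks(const std::vector<size_t>& r_shape, const std::vector<size_t>& b_shape) {
    return r_shape.size() != 3 ||
           b_shape.size() != 1;
}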
done
p.add_primitive(*op, cldnn::reshape(layerName + ".out0", concatStr, tensor_from_dims(op->get_output_shape(0))), {layerName});
p.add_primitive(*op, cldnn::reshape(layerName + ".out1", hiddenStr, tensor_from_dims(op->get_output_shape(1))));
p.add_primitive(*op, cldnn::reshape(layerName + ".out2", cellStr, tensor_from_dims(op->get_output_shape(2))));
if (p.use_new_shape_infer()) {
OPENVINO_ASSERT here as well
done
public:
    using parent::parent;

    program_node& input() const { return get_dependency(0); }
Likely the same unused methods as for the lstm_seq primitive.
done
std::vector<format::type> in_fmts(node.get_dependencies().size(), format::any);
std::vector<format::type> out_fmts(node.get_outputs_count(), format::any);

size_t out_rank = node.get_output_layout().get_rank();
for (size_t idx = 0; idx < node.get_dependencies().size(); idx++) {
    if (node.get_dependency(idx).is_constant())
        continue;

    auto target_format = format::get_default_format(out_rank);

    in_fmts[idx] = target_format;
}
out_fmts[0] = format::ybfx;

return {in_fmts, out_fmts};
I think that code should actually query onednn for the required tensor formats (as it's done for convolutions). You can do it in the next PR; something like the sketch below.
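A sketch of the suggested approach, assuming oneDNN 3.x and an existing dnnl::engine (the surrounding plugin integration is elided): pass format_tag::any for the weights so oneDNN picks the layout, then read the chosen descriptors back from the primitive descriptor instead of hard-coding ybfx.

#include <oneapi/dnnl/dnnl.hpp>

// Sketch: build an LSTM primitive_desc with 'any' weight formats and query
// which layouts oneDNN selected, analogous to the convolution flow.
dnnl::lstm_forward::primitive_desc make_lstm_pd(const dnnl::engine& eng,
        dnnl::memory::dim T, dnnl::memory::dim N,
        dnnl::memory::dim IC, dnnl::memory::dim OC) {
    using tag = dnnl::memory::format_tag;
    using dt = dnnl::memory::data_type;
    const dnnl::memory::dim L = 1, D = 1, G = 4;  // layers, directions, gates
    dnnl::memory::desc src_layer({T, N, IC}, dt::f32, tag::tnc);
    dnnl::memory::desc src_iter({L, D, N, OC}, dt::f32, tag::ldnc);
    dnnl::memory::desc src_iter_c({L, D, N, OC}, dt::f32, tag::ldnc);
    dnnl::memory::desc weights_layer({L, D, IC, G, OC}, dt::f32, tag::any);  // let oneDNN choose
    dnnl::memory::desc weights_iter({L, D, OC, G, OC}, dt::f32, tag::any);   // let oneDNN choose
    dnnl::memory::desc bias({L, D, G, OC}, dt::f32, tag::ldgo);
    dnnl::memory::desc dst_layer({T, N, OC}, dt::f32, tag::tnc);
    dnnl::memory::desc dst_iter({L, D, N, OC}, dt::f32, tag::ldnc);
    dnnl::memory::desc dst_iter_c({L, D, N, OC}, dt::f32, tag::ldnc);
    return dnnl::lstm_forward::primitive_desc(eng, dnnl::prop_kind::forward_inference,
            dnnl::rnn_direction::unidirectional_left2right,
            src_layer, src_iter, src_iter_c,
            weights_layer, weights_iter, bias,
            dst_layer, dst_iter, dst_iter_c);
}

// Usage: pd.weights_layer_desc() / pd.weights_iter_desc() then expose the
// formats oneDNN chose to the layout-selection code above.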
ok
return node.get_input_layout(0).format == cldnn::format::bfyx || node.get_input_layout(0).format == cldnn::format::fbyx \
       || node.get_input_layout(0).format == cldnn::format::ybfx;
I think tensor format is not the only restriction. At least we need:
1. type checks
2. an info.arch == gpu_arch::unknown check (see other impls)
3. padding checks
See the sketch after this list.
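A rough sketch combining these checks; the format comparisons mirror the quoted snippet, while the data-type and padding fields are assumptions to be verified against the other onednn impls:

// Hypothetical validate combining format, data type, arch, and padding checks.
static bool is_supported_lstm(const program_node& node, const device_info& info) {
    const auto in_layout = node.get_input_layout(0);
    const bool fmt_ok = in_layout.format == cldnn::format::bfyx ||
                        in_layout.format == cldnn::format::fbyx ||
                        in_layout.format == cldnn::format::ybfx;
    const bool type_ok = in_layout.data_type == cldnn::data_types::f16 ||
                         in_layout.data_type == cldnn::data_types::f32;
    const bool arch_ok = info.arch != gpu_arch::unknown;            // reject unknown archs
    const bool pad_ok = in_layout.data_padding == cldnn::padding(); // assumed: no padded inputs
    return fmt_ok && type_ok && arch_ok && pad_ok;
}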
1 and 2 are done; 3 is not done.
{
    int i = 0;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_LAYER, mem});
}

{
    int i = 1;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_ITER, mem});
}

{
    int i = 2;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_ITER_C, mem});
}
I think this code can be done in a loop if you store these DNNL_ARG_SRC_LAYER, DNNL_ARG_SRC_ITER, etc. in a vector. Same for weights and dst buffers.
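For example, a sketch of that loop reusing the names from the hunk above (the surrounding method context is assumed):

// Keep the DNNL argument ids in a vector so the three per-input blocks
// collapse into one loop; the same pattern applies to weights and dst buffers.
const std::vector<int> src_args = {DNNL_ARG_SRC_LAYER, DNNL_ARG_SRC_ITER, DNNL_ARG_SRC_ITER_C};
for (int i = 0; i < static_cast<int>(src_args.size()); ++i) {
    auto& input = instance.input_memory(i);
    auto desc = _pd.dnnl::primitive_desc_base::src_desc(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i), desc);
    args.insert({src_args[i], input.get_onednn_memory(desc, offset)});
}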
done
auto hiddenSize = reorder_params->get_output_layout().get_shape()[1] / 4;
auto cropSize = cldnn::tensor{dir_num, static_cast<int>(hiddenSize), 1, 1};
std::string crop_id_b = input_id + "_c";
auto get_crop_node = [&](int cropNum) -> cldnn::program_node& {
    auto crop_id = primitive_id(crop_id_b + std::to_string(cropNum));
    auto crop_prim = std::make_shared<cldnn::crop>(crop_id, input_id, cropSize, cldnn::tensor{0, static_cast<int>(cropNum*hiddenSize), 0, 0});
    return p.get_or_create(crop_prim);
};
auto& crop0_node = get_crop_node(0);
auto& crop1_node = get_crop_node(1);
auto& crop2_node = get_crop_node(2);
auto& crop3_node = get_crop_node(3);
std::vector<input_info> con_input{input_info(crop1_node.id()), input_info(crop0_node.id()), input_info(crop2_node.id()), input_info(crop3_node.id())};
Can it be done with some kind of Slice/StridedSlice primitive?
It can be. Actually, I've deleted one crop, but I don't think it would be easy to end up with fewer nodes using a StridedSlice primitive.
…output of node Signed-off-by: Michal Miotk <[email protected]>