[CPU]whisper readvalue optimize #26130
base: master
Profile each node's execution time. Support static and dynamic inference. Signed-off-by: xipingya <[email protected]>
If reset is not called, these marked nodes also don't need to be executed. Signed-off-by: xipingya <[email protected]>
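The idea in the commit above (initializer nodes only run when the state actually needs its initial value) can be sketched as follows. This is a toy model under stated assumptions: `LazyState` is a hypothetical class, not part of OpenVINO, and the init subgraph is represented by a plain callback.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified variable state: the initializer subgraph only runs
// on the first inference or after reset(). Not an OpenVINO type.
class LazyState {
public:
    explicit LazyState(std::function<std::vector<float>()> init_subgraph)
        : init_(std::move(init_subgraph)) {}

    // Requests re-initialization on the next read.
    void reset() { needs_init_ = true; }

    // Called on every inference; executes the init subgraph only when required.
    const std::vector<float>& read() {
        if (needs_init_) {
            value_ = init_();
            ++init_runs_;
            needs_init_ = false;
        }
        return value_;
    }

    // Models an Assign writing a new state value.
    void assign(std::vector<float> v) { value_ = std::move(v); }
    int init_runs() const { return init_runs_; }

private:
    std::function<std::vector<float>()> init_;
    std::vector<float> value_;
    bool needs_init_ = true;  // the first inference always initializes
    int init_runs_ = 0;
};
```

Between resets, `read()` returns the last assigned value without touching the initializer, which is where the saving comes from.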
Decoder network: 20 ms -> 5 ms. Signed-off-by: xipingya <[email protected]>
…r this. 2: Fix the ReadValueAssignTest failure; just make sure "initOptimalPrimitiveDescriptor" doesn't change the original primitive. Signed-off-by: xipingya <[email protected]>
…toolkit#26819)
### Details:
- *Pattern: QKV_Reshape -> QKV_Transpose -> SDPA -> OUT_Transpose -> OUT_Reshape*
- *Fuse this pattern to: SDPA*
- *This hotspot can be observed after openvinotoolkit#26130; this PR's implementation doesn't depend on it.*
### Tickets:
- *153616*
---------
Signed-off-by: xipingya <[email protected]>
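To illustrate what "fuse this pattern to SDPA" means, here is a deliberately simplified sketch. It assumes the graph is reduced to a linear sequence of op type names, whereas the real pass matches three Q/K/V branches feeding one SDPA node; `fuse_sdpa_layout_ops` is a hypothetical helper, not the pass from #26819.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy pattern fusion: collapse Reshape -> Transpose -> SDPA -> Transpose -> Reshape
// into a single SDPA entry. The graph is modeled as a linear list of op types.
std::vector<std::string> fuse_sdpa_layout_ops(const std::vector<std::string>& ops) {
    const std::vector<std::string> pattern = {"Reshape", "Transpose",
                                              "ScaledDotProductAttention",
                                              "Transpose", "Reshape"};
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < ops.size()) {
        bool match = i + pattern.size() <= ops.size();
        for (std::size_t k = 0; match && k < pattern.size(); ++k) {
            match = ops[i + k] == pattern[k];
        }
        if (match) {
            // Keep only the attention op; the layout changes are absorbed into it.
            out.push_back("ScaledDotProductAttention");
            i += pattern.size();
        } else {
            out.push_back(ops[i]);
            ++i;
        }
    }
    return out;
}
```

In the real plugin the fused SDPA node must also take over the transpose orders, so the data layout semantics are preserved rather than simply dropped.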
…aph input memory to avoid data corruption. Signed-off-by: xipingya <[email protected]>
INTERNAL_OP_SCOPE(intel_cpu_ReadValueWithSubgraphNode_clone_with_new_inputs);
check_new_args_count(this, new_args);
auto op = std::make_shared<ov::intel_cpu::ReadValueWithSubgraph>();
Shouldn't we pass the variable here?
Moved some parameters into the constructor of ReadValueWithSubgraph and added the associated constructor.
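The reviewer's point is that a clone must keep the same variable. A minimal sketch of the "pass it through the constructor" approach, using hypothetical stand-in types (`Variable`, `Node`, `ReadValueLike`) rather than the real `ov::op::util::Variable` and `ReadValueWithSubgraph` classes:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the OpenVINO classes.
struct Variable {
    std::string id;
};

struct Node {
    virtual ~Node() = default;
};

class ReadValueLike : public Node {
public:
    // The variable is a constructor parameter, so a clone cannot be created without it.
    ReadValueLike(std::shared_ptr<Variable> variable,
                  std::vector<std::shared_ptr<Node>> inputs)
        : variable_(std::move(variable)), inputs_(std::move(inputs)) {}

    std::shared_ptr<ReadValueLike> clone_with_new_inputs(
            const std::vector<std::shared_ptr<Node>>& new_args) const {
        // Require the same input count as the original (cf. check_new_args_count,
        // which throws in OpenVINO; this sketch returns nullptr instead).
        if (new_args.size() != inputs_.size()) return nullptr;
        // Share the same Variable object so the state identity is preserved.
        return std::make_shared<ReadValueLike>(variable_, new_args);
    }

    const std::shared_ptr<Variable>& variable() const { return variable_; }

private:
    std::shared_ptr<Variable> variable_;
    std::vector<std::shared_ptr<Node>> inputs_;
};
```

Sharing the `Variable` pointer (instead of default-constructing the op and filling it later) keeps the clone bound to the same state as the original.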
...ns/intel_cpu/src/transformations/cpu_opset/common/pass/move_readvalue_inputs_to_subgraph.hpp
...ns/intel_cpu/src/transformations/cpu_opset/common/pass/move_readvalue_inputs_to_subgraph.cpp
…n the future, it may be deleted. Signed-off-by: xipingya <[email protected]>
2: Adopt the parent configuration to avoid inserting a reorder before the MemoryInput. 3: Move the param preparation to runDynamic, because it is not called every time.
It is the responsibility of MemoryInputSingle or MemoryOutput.
Add CPU_GRAPH_OPTIMIZER_SCOPE(DropRedundantMemoryOutput_SubGraph); call graph.RemoveEdge(parentEdge) before creating the edge. Signed-off-by: xipingya <[email protected]>
…litply parent edges. Signed-off-by: xipingya <[email protected]>
Update comments: // Flag: find Output node
This reverts commit 5d6c9de, because the Whisper inference result is wrong.
It is very strange: the subgraph output and the state have the same memory pointer, yet if I remove the code below, Whisper's result is wrong.
auto& outputs = subGraph->GetOutputNodesMap();
OPENVINO_ASSERT(outputs.size() == 1);
auto itr = outputs.begin();
src = itr->second->getSrcMemoryAtPort(0);
# Conflicts:
# src/plugins/intel_cpu/src/nodes/memory.cpp
// for the output descriptors, use the configuration of the graph's output nodes
auto outputDescriptors = subGraph->getOutputMemoryDescriptors();
const auto& desc = outputDescriptors.front();
std::vector<PortConfig> outConfs;
outConfs.emplace_back(desc, BlockedMemoryDesc::FULL_MASK, 0); // use the memory from the first input inPlace
const NodeConfig config(inConfs, outConfs);
supportedPrimitiveDescriptors.clear();
supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::undef);
selectPrimitiveDescriptorByIndex(0);
Above we enforced the output memory descriptor of the subgraph to be the same as in the node, so we don't need to redefine the node config. The only thing we really need to do here is inserting a sanity check:
// for the output descriptors, use the configuration of the graph's output nodes
auto outputDescriptors = m_graph.getOutputMemoryDescriptors();
const auto& desc = outputDescriptors.front();
// just a sanity check
CPU_NODE_ASSERT(desc->isCompatible(*(config.outConfs.front().getMemDesc())), "Unexpected node <-> subgraph output memory descriptor mismatch");
Just to make sure that the output memory descriptor has been really enforced.
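The suggested sanity check only verifies, it does not rebuild the node config. A minimal sketch of that behavior, assuming a simplified `MemDesc` type (hypothetical; the real CPU plugin descriptors and `isCompatible` carry much more information, such as strides and offsets) and a plain exception in place of `CPU_NODE_ASSERT`:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for a CPU plugin memory descriptor (hypothetical type).
struct MemDesc {
    std::string precision;           // e.g. "f32"
    std::vector<std::size_t> dims;   // logical shape
    std::vector<std::size_t> order;  // blocked layout permutation

    bool isCompatible(const MemDesc& other) const {
        return precision == other.precision && dims == other.dims && order == other.order;
    }
};

// Verify that the subgraph output really uses the node's descriptor; the node
// config itself is left untouched.
void checkSubgraphOutput(const MemDesc& subgraphOut, const MemDesc& nodeOut) {
    if (!subgraphOut.isCompatible(nodeOut)) {
        throw std::runtime_error("Unexpected node <-> subgraph output memory descriptor mismatch");
    }
}
```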
Hi @maxnick, if we enforce the memory descriptor here, the subGraph also needs to be re-initialized (https://github.com/xipingyan/openvino/blob/7060e852ef2ff532351b3e8421c127515322767f/src/plugins/intel_cpu/src/nodes/memory.cpp#L683).
Because subGraph->Init can only be called once (a second call can trigger a crash), I removed MemoryInput::selectOptimalPrimitiveDescriptor() and moved that logic here.
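The constraint that `Init` may only run once can be made explicit with a guard. This is a hedged sketch, not the `ov::intel_cpu::Graph` implementation: `SubGraph`, `IsInitialized` and `ensureInitialized` are hypothetical names, and the real graph does not necessarily expose such a flag.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subgraph wrapper: a repeated Init() becomes an explicit error
// instead of a crash, and the owning node can query whether Init already ran.
class SubGraph {
public:
    void Init() {
        if (initialized_) {
            throw std::logic_error("SubGraph::Init() must be called only once");
        }
        // ... build nodes and resolve memory descriptors (omitted) ...
        initialized_ = true;
    }
    bool IsInitialized() const { return initialized_; }

private:
    bool initialized_ = false;
};

// Node-side helper: initialize the subgraph from exactly one place.
void ensureInitialized(SubGraph& g) {
    if (!g.IsInitialized()) g.Init();
}
```

Centralizing the call is the same motivation as moving the logic out of `selectOptimalPrimitiveDescriptor()`: there is a single point where the subgraph gets initialized with its final descriptors.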
We don't need to enforce it here! It's already enforced in:
std::vector<Input::OutputConfig> graphOutputConfig;
for (auto&& portConfig : config.outConfs) {
auto desc = portConfig.getMemDesc();
graphOutputConfig.emplace_back(node::Input::OutputConfig{desc, true});
}
My comment was about exactly these lines of code:
std::vector<PortConfig> outConfs;
outConfs.emplace_back(desc, BlockedMemoryDesc::FULL_MASK, 0); // use the memory from the first input inPlace
const NodeConfig config(inConfs, outConfs);
supportedPrimitiveDescriptors.clear();
supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::undef);
selectPrimitiveDescriptorByIndex(0);
We mustn't call these methods, as they redefine the node port descriptors, but we don't need to redefine them at all, as the dependency is directed toward the subgraph, i.e. we enforce the existing node memory descriptors in the subgraph to allow memory substitution. That's it.
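The direction of the dependency described above (node descriptors flow into the subgraph's output config, never the reverse) can be sketched with simplified types modeled on the `graphOutputConfig` loop. All types here are hypothetical stand-ins, not the plugin's `PortConfig`, `NodeConfig` or `Input::OutputConfig`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified types modeled on the reviewer's snippet.
struct MemoryDesc { std::string layout; };
using MemoryDescPtr = std::shared_ptr<const MemoryDesc>;

struct PortConfig {
    MemoryDescPtr desc;
    MemoryDescPtr getMemDesc() const { return desc; }
};

struct NodeConfig { std::vector<PortConfig> outConfs; };

struct OutputConfig {
    MemoryDescPtr desc;
    bool inPlace;  // the subgraph output writes straight into the node's memory
};

// Node descriptors -> subgraph outputs. The node config is only read, never rewritten.
std::vector<OutputConfig> makeGraphOutputConfig(const NodeConfig& config) {
    std::vector<OutputConfig> graphOutputConfig;
    for (const auto& portConfig : config.outConfs) {
        graphOutputConfig.push_back(OutputConfig{portConfig.getMemDesc(), true});
    }
    return graphOutputConfig;
}
```

Because the function takes the node config by const reference, the node's port descriptors cannot be redefined here, which matches the reviewer's requirement.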
Sorry, updated.
+graphOutputConfig.emplace_back(node::Input::OutputConfig{desc, true});
…PrimitiveDescriptor() Signed-off-by: xipingya <[email protected]>
I don't know why all tests pass without shape inference; `InternalDynShapeInferFactory` seems to do nothing. Signed-off-by: xipingya <[email protected]>
…to replace my own implementation. Signed-off-by: xipingya <[email protected]>
Details:
- … ReadValueWithSubgraph node.
- … ReadValue's initial subgraph nodes to ReadValueWithSubgraph.
- … ReadValueWithSubgraph to MemoryInput.
- … Init and Activate of ov::intel_cpu::Graph to avoid memory copies. Refer: [CPU] Introduce SubModel op and Composite node #25385
Tickets:
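The "avoid memory copy" point can be pictured as binding the state buffer directly as the initializer subgraph's output memory, instead of producing a temporary tensor and copying it into the state. A hedged sketch under that assumption; `InitSubgraph`, `StateBuffer` and `runInitInPlace` are hypothetical names and do not reflect the actual `Graph::Init`/`Activate` signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// The init subgraph is modeled as a callback that writes n floats to `out`.
using InitSubgraph = std::function<void(float* out, std::size_t n)>;

struct StateBuffer {
    std::vector<float> data;
};

// Runs the init subgraph with the state's memory bound as its output.
// Returns the pointer the subgraph wrote to, so callers can check aliasing.
const float* runInitInPlace(const InitSubgraph& subgraph, StateBuffer& state) {
    float* dst = state.data.data();
    subgraph(dst, state.data.size());  // no intermediate buffer, no memcpy
    return dst;
}
```

The returned pointer equals the state's own storage, i.e. the subgraph output and the state share memory, the same aliasing the revert note above observes.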