[CPU]whisper readvalue optimize #26130

Open: wants to merge 106 commits into base: master


xipingyan
Contributor

@xipingyan xipingyan commented Aug 20, 2024

Details:

Tickets:

  • 128743

Profile each node's execution time.
Support static and dynamic inference.

Signed-off-by: xipingya <[email protected]>
If reset is not called, these marked nodes don't need to be executed either.

Signed-off-by: xipingya <[email protected]>
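
The commit message above describes the core optimization: nodes that only compute a state's initial value can be skipped unless the state was reset. A minimal standalone sketch of that idea follows. It is not the plugin code; `LazyState`, `read`, `assign`, and `init_runs` are hypothetical names, and treating a fresh state as "reset" is an assumption made for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a ReadValue node whose initial value
// is produced by an "init subgraph" (modelled here as a callable).
class LazyState {
public:
    explicit LazyState(std::function<int()> init_subgraph)
        : m_init(std::move(init_subgraph)) {
        assert(m_init && "an init subgraph is required");
    }

    // Mark the state as reset; the next read must recompute the initial value.
    void reset() { m_is_reset = true; }

    // Models the matching Assign storing a new state value.
    void assign(int value) {
        m_value = value;
        m_is_reset = false;
    }

    // Run the init subgraph only when the state was reset;
    // otherwise return the stored value without executing it.
    int read() {
        if (m_is_reset) {
            m_value = m_init();
            ++m_init_runs;
            m_is_reset = false;
        }
        return m_value;
    }

    int init_runs() const { return m_init_runs; }

private:
    std::function<int()> m_init;
    int m_value = 0;
    int m_init_runs = 0;
    bool m_is_reset = true;  // assumption: a fresh state behaves like a reset one
};
```

In this sketch, calling `read()` repeatedly after `assign()` never re-runs the init callable; only `reset()` makes it run again.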
@xipingyan xipingyan requested a review from maxnick August 20, 2024 08:33
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: CPP API OpenVINO CPP API bindings labels Aug 20, 2024
@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Sep 3, 2024
@github-actions github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings labels Sep 10, 2024
…r this.

2: Fix the ReadValueAssignTest failure by making sure "initOptimalPrimitiveDescriptor" doesn't change the original primitive.

Signed-off-by: xipingya <[email protected]>
CuriousPanCake pushed a commit to CuriousPanCake/openvino that referenced this pull request Nov 6, 2024
…toolkit#26819)

### Details:
- *Pattern: QKV_Reshape -> QKV_Transpose -> SDPA -> OUT_Transpose -> OUT_Reshape*
- *Fuse this pattern to: SDPA*
- *This hotspot can be observed after openvinotoolkit#26130; this PR's implementation doesn't depend on it.*

### Tickets:
 - *153616*

---------

Signed-off-by: xipingya <[email protected]>
src/plugins/intel_cpu/src/nodes/memory.hpp (outdated, resolved)
src/plugins/intel_cpu/src/nodes/memory.hpp (resolved)
src/plugins/intel_cpu/src/nodes/memory.hpp (outdated, resolved)
src/plugins/intel_cpu/src/nodes/memory.cpp (outdated, resolved)
src/plugins/intel_cpu/src/nodes/memory.cpp (outdated, resolved)
INTERNAL_OP_SCOPE(intel_cpu_ReadValueWithSubgraphNode_clone_with_new_inputs);

check_new_args_count(this, new_args);
auto op = std::make_shared<ov::intel_cpu::ReadValueWithSubgraph>();
Contributor

Shouldn't we pass the variable here?

Contributor Author

Moved some parameters to the constructor of ReadValueWithSubgraph and added the associated constructor.
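
To illustrate the point of this exchange, here is a minimal standalone sketch of a clone method that forwards the variable through a constructor instead of default-constructing the op. The types `Variable` and `ReadValueLike` are hypothetical stand-ins, not the OpenVINO classes, and the real `clone_with_new_inputs` signature is not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the state variable an op refers to.
struct Variable {
    std::string id;
};

// Hypothetical stand-in for a ReadValue-like op.
class ReadValueLike {
public:
    ReadValueLike() = default;
    explicit ReadValueLike(std::shared_ptr<Variable> variable)
        : m_variable(std::move(variable)) {}

    // Forward the variable so the clone refers to the same state as the original.
    // A default-constructed clone would silently lose this association.
    std::shared_ptr<ReadValueLike> clone() const {
        assert(m_variable && "cloning an op without a variable");
        return std::make_shared<ReadValueLike>(m_variable);
    }

    const std::shared_ptr<Variable>& variable() const { return m_variable; }

private:
    std::shared_ptr<Variable> m_variable;
};
```

The design choice shown is simply that anything the op needs to stay valid travels through the constructor, so a clone cannot exist in a half-initialized form.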

…n the future, it may be deleted.

Signed-off-by: xipingya <[email protected]>
2: Adopt the parent configuration to avoid inserting a reorder before the MemoryInput.
3: Move prepare params to runDynamic, because it is not called each time.
It is the responsibility of MemoryInputSingle or MemoryOutput.
Add: CPU_GRAPH_OPTIMIZER_SCOPE(DropRedundantMemoryOutput_SubGraph);
Before creating the edge, call graph.RemoveEdge(parentEdge);

Signed-off-by: xipingya <[email protected]>
Update comments: // Flag: find Output node
This reverts commit 5d6c9de.

Because the Whisper inference result is wrong.
Strangely, the subgraph output and the state have the same memory pointer, yet if I remove the code below, Whisper's result is wrong.

             auto& outputs = subGraph->GetOutputNodesMap();
             OPENVINO_ASSERT(outputs.size() == 1);
             auto itr = outputs.begin();
             src = itr->second->getSrcMemoryAtPort(0);

# Conflicts:
#	src/plugins/intel_cpu/src/nodes/memory.cpp
src/plugins/intel_cpu/src/nodes/memory.cpp (outdated, resolved)
src/plugins/intel_cpu/src/nodes/memory.cpp (resolved)
src/plugins/intel_cpu/src/nodes/memory.hpp (outdated, resolved)
Comment on lines 681 to 695
// for the output descriptors, use the configuration of the graph's output nodes
auto outputDescriptors = subGraph->getOutputMemoryDescriptors();

const auto& desc = outputDescriptors.front();

std::vector<PortConfig> outConfs;

outConfs.emplace_back(desc, BlockedMemoryDesc::FULL_MASK, 0); // use the memory from the first input inPlace

const NodeConfig config(inConfs, outConfs);

supportedPrimitiveDescriptors.clear();
supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::undef);

selectPrimitiveDescriptorByIndex(0);
Contributor

Above we enforced the output memory descriptor of the subgraph to be the same as in the node, so we don't need to redefine the node config. The only thing we really need to do here is insert a sanity check:

    // for the output descriptors, use the configuration of the graph's output nodes
    auto outputDescriptors = m_graph.getOutputMemoryDescriptors();

    const auto& desc = outputDescriptors.front();

    // just a sanity check
    CPU_NODE_ASSERT(desc->isCompatible(*(config.outConfs.front().getMemDesc())), "Unexpected node <-> subgraph output memory descriptor mismatch");

Just to make sure that the output memory descriptor has been really enforced.

Contributor Author

Hi @maxnick, if we enforce the memory descriptor here, the subGraph also needs to be re-initialized (https://github.com/xipingyan/openvino/blob/7060e852ef2ff532351b3e8421c127515322767f/src/plugins/intel_cpu/src/nodes/memory.cpp#L683).
But subGraph->Init can only be called once (a second call can trigger a crash).
So I removed MemoryInput::selectOptimalPrimitiveDescriptor() and moved its logic here.

Contributor

@maxnick maxnick Nov 20, 2024

We don't need to enforce it exactly here! It's already enforced in

        std::vector<Input::OutputConfig> graphOutputConfig;
        for (auto&& portConfig : config.outConfs) {
            auto desc = portConfig.getMemDesc();
            graphOutputConfig.emplace_back(node::Input::OutputConfig{desc, true});
        }

My comment was about exactly these lines of code

        std::vector<PortConfig> outConfs;

        outConfs.emplace_back(desc, BlockedMemoryDesc::FULL_MASK, 0);  // use the memory from the first input inPlace

        const NodeConfig config(inConfs, outConfs);

        supportedPrimitiveDescriptors.clear();
        supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::undef);

        selectPrimitiveDescriptorByIndex(0);

We mustn't call these methods, as they redefine the node port descriptors, but we don't need to redefine them at all, as the dependency is directed toward the subgraph, i.e. we enforce the existing node memory descriptors in the subgraph to allow memory substitution. That's it.
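
The direction of the dependency described above can be sketched in isolation: the node's existing output descriptors are pushed into the subgraph, and afterwards the node only verifies that they were honoured, without rebuilding its own config. This is a simplified illustration under stated assumptions. `MemDesc`, `SubGraph`, and `configureSubGraph` are hypothetical names, and a plain exception stands in for `CPU_NODE_ASSERT`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified memory descriptor: element type name plus shape.
struct MemDesc {
    std::string precision;
    std::vector<std::size_t> shape;

    bool isCompatible(const MemDesc& other) const {
        return precision == other.precision && shape == other.shape;
    }
};

// Hypothetical subgraph that accepts enforced output descriptors at init time.
class SubGraph {
public:
    void init(const std::vector<MemDesc>& enforced_outputs) { m_outputs = enforced_outputs; }
    const std::vector<MemDesc>& outputDescs() const { return m_outputs; }

private:
    std::vector<MemDesc> m_outputs;
};

// The node's descriptors are the source of truth: enforce them in the subgraph,
// then only check the result. The node's own port config is never redefined.
inline void configureSubGraph(const std::vector<MemDesc>& node_out_descs, SubGraph& graph) {
    assert(!node_out_descs.empty());
    graph.init(node_out_descs);
    // Just a sanity check that the enforced descriptor was really applied.
    if (!graph.outputDescs().front().isCompatible(node_out_descs.front())) {
        throw std::runtime_error("Unexpected node <-> subgraph output memory descriptor mismatch");
    }
}
```

Because the subgraph adapts to the node rather than the other way round, the node's output memory can be substituted for the subgraph's output memory without any extra reorder.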

Contributor Author

Sorry, updated.

+graphOutputConfig.emplace_back(node::Input::OutputConfig{desc, true});
I don't know why all tests pass without shape inference.
`InternalDynShapeInferFactory` seems to do nothing.

Signed-off-by: xipingya <[email protected]>
…to replace my own implementation.

Signed-off-by: xipingya <[email protected]>
Labels
category: CPU OpenVINO CPU plugin

4 participants