[CPU] Refactor memory control and allocation #27259

Conversation

Contributor

@EgorDuplensky EgorDuplensky commented Oct 25, 2024

Details:

  • All the nested graphs can now be part of the global memory reuse logic
  • Code changes are required to enable global memory reuse for a node with a nested graph
  • LoRa and Composite nodes have been updated to support global memory reuse
  • A temporary property has been added to the GraphContext to propagate global memory reuse
    through the nesting levels. If a node does not support global memory reuse (e.g. the If operation), then global memory reuse is disabled for all the nested graphs of that node.
    This allows both kinds of nodes with a subgraph - updated and not yet updated ones - to coexist in a single graph (see the sketch below).
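
A minimal, self-contained sketch of the propagation idea described above. All names here (GraphContext, supportsGlobalMemoryReuse, activate) are hypothetical stand-ins, not the actual plugin classes; the point is only how the flag travels down the nesting levels and gets switched off below a node that does not support global reuse.

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the plugin's GraphContext: the temporary
    // property only tells nested graphs whether global memory reuse is allowed.
    struct GraphContext {
        bool globalMemoryReuse = true;
    };

    struct Node {
        std::string name;
        bool supportsGlobalMemoryReuse = true;  // e.g. false for the If node
        std::vector<Node> nestedGraph;          // nodes of the inner graph, if any
    };

    // Propagate the flag down the nesting levels: once a node that does not
    // support global reuse is met, all of its nested graphs fall back to a
    // local memory control unit.
    void activate(const Node& node, const std::shared_ptr<GraphContext>& ctx) {
        const bool reuse = ctx->globalMemoryReuse && node.supportsGlobalMemoryReuse;
        std::cout << node.name << ": " << (reuse ? "global" : "local") << " memory reuse\n";
        auto nestedCtx = std::make_shared<GraphContext>();
        nestedCtx->globalMemoryReuse = reuse;
        for (const auto& inner : node.nestedGraph)
            activate(inner, nestedCtx);
    }

    int main() {
        Node lora{"LoRa", true, {Node{"MatMul"}}};
        Node ifNode{"If", false, {Node{"then_body"}, Node{"else_body"}}};
        auto ctx = std::make_shared<GraphContext>();
        activate(lora, ctx);    // LoRa and its inner graph share the global reuse logic
        activate(ifNode, ctx);  // If and its bodies keep local allocation
    }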

Tickets:

  • ticket-id

@EgorDuplensky EgorDuplensky requested review from a team as code owners October 25, 2024 21:35
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Oct 25, 2024
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_across_inner_graphs branch 4 times, most recently from 28be812 to 9311b5e on October 31, 2024 15:32
@EgorDuplensky EgorDuplensky requested a review from a team as a code owner October 31, 2024 15:32
@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: build OpenVINO cmake script / infra labels Oct 31, 2024
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_across_inner_graphs branch 13 times, most recently from 5d0ea7f to 8d45629 on November 6, 2024 09:01
@maxnick maxnick self-assigned this Nov 6, 2024
Comment on lines +167 to +168
memoryControl,
m_networkMemoryControl,
Contributor

Since the memoryControl is derived from m_networkMemoryControl, maybe it's enough to pass only m_networkMemoryControl to the context?

Contributor Author

The current idea is that passing m_networkMemoryControl is now obsolete and should be dropped once all the nodes with inner graphs are updated according to the memory reuse changes. All the nodes are then supposed to use the particular memoryControl instance created by the CompiledModel, rather than an arbitrary one taken from networkMemoryControl.
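
A rough, self-contained illustration of that direction; the class and method names below are hypothetical stand-ins rather than the plugin's actual interfaces. The CompiledModel creates one concrete memory-control unit and hands exactly that unit to the context, so the registry itself no longer needs to be passed down.

    #include <memory>
    #include <vector>

    // Hypothetical stand-ins, not the actual plugin classes.
    class MemoryControl {
    public:
        void allocateMemory() { /* solve regions and allocate */ }
    };

    // Registry that owns every memory-control unit of the network.
    class NetworkMemoryControl {
    public:
        MemoryControl* createMemoryControlUnit() {
            m_units.push_back(std::make_unique<MemoryControl>());
            return m_units.back().get();
        }
    private:
        std::vector<std::unique_ptr<MemoryControl>> m_units;
    };

    struct GraphContext {
        // Only the concrete unit is passed down; the registry stays inside
        // the CompiledModel and can eventually be dropped from the context.
        MemoryControl* memoryControl = nullptr;
    };

    class CompiledModel {
    public:
        GraphContext makeContext() {
            return GraphContext{m_networkMemoryControl.createMemoryControlUnit()};
        }
    private:
        NetworkMemoryControl m_networkMemoryControl;
    };

    int main() {
        CompiledModel model;
        GraphContext ctx = model.makeContext();  // nested graphs reuse this exact unit
        ctx.memoryControl->allocateMemory();
    }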

Comment on lines +57 to +59
std::shared_ptr<NetworkMemoryControl> get_network_memory_control() const {
return m_networkMemoryControl;
}
Contributor

Looks like this method is not used.

@@ -82,6 +82,7 @@ class Edge {
}

std::string name() const;
const MemoryDesc& getDesc() const;
Contributor

As far as I remember, this method was hidden from the public interface on purpose.
Making this method public introduces two separate ways to access the memory descriptor:

  1. Via the memory object.
  2. Via this method.

It becomes confusing for the node developer which path should be used in which context. I would propose revisiting the reason for moving this method to the public section and trying to avoid this change.
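
To make the concern concrete, here is a minimal analogy using simplified, hypothetical types rather than the real Edge/Memory classes: once the descriptor is reachable both through the memory object and directly from the edge, a node developer sees two similar-looking paths with no obvious rule for which to use.

    #include <cstddef>

    // Simplified, hypothetical stand-ins for MemoryDesc / Memory / Edge,
    // used only to illustrate the duplication, not the real plugin classes.
    struct MemoryDesc {
        std::size_t sizeInBytes = 0;
    };

    struct Memory {
        const MemoryDesc& getDesc() const { return m_desc; }
        MemoryDesc m_desc;
    };

    struct Edge {
        const Memory& getMemory() const { return m_memory; }
        // The newly public accessor adds a second entry point to a descriptor.
        const MemoryDesc& getDesc() const { return m_desc; }
        Memory m_memory;
        MemoryDesc m_desc;  // may or may not match m_memory.getDesc()
    };

    int main() {
        Edge edge;
        const MemoryDesc& viaMemory = edge.getMemory().getDesc();  // path 1
        const MemoryDesc& viaEdge = edge.getDesc();                // path 2
        // Which of the two should a node developer rely on?
        (void)viaMemory;
        (void)viaEdge;
    }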

Contributor Author

The PR is not really ready for review.
This change is temporary, just to enable functionality.

Comment on lines 34 to 42
struct MemoryRegion {
int start; // Execution order index of first use.
int finish; // Execution order index of last use. -1 means inf
int64_t size; // size in bytes
int64_t id; // ID unique for each region

enum class RegionType : uint8_t { VARIABLE, CONSTANT, INPUT, OUTPUT, IO } type;
enum class AllocType : uint8_t { POD, STRING, UNKNOWN } alloc_type;
};
Contributor

This structure should be part of the memory management subsystem, as it is the problem description for memory management. What is the reason behind moving this structure to the graph header?

Contributor Author

@EgorDuplensky EgorDuplensky Nov 8, 2024

Just a temporary change, it will be reverted.

Comment on lines 360 to 362
void Graph::Activate(const std::vector<MemoryPtr>& externalInputMemory,
const std::vector<MemoryPtr>& externalOutputMemory) {
OPENVINO_ASSERT(status == Status::Initialized, "Invalid graph status");
const std::vector<MemoryPtr>& externalOutputMemory,
bool globalAllocation) {
Contributor

Again, a specific flag to indicate the global status. Maybe this is another indicator that a Subgraph class should be introduced as a derivative of the CPU Graph?

Contributor Author

Please check the updated documentation.
The flag is necessary to allow the nodes with inner graphs that are not updated yet to use a local memory control unit, as is currently done on master.
It is supposed to be dropped after all the nodes are updated.

Comment on lines 988 to 1198
if (memoryControl->allocated()) {
// std::cout << "Memory is already allocated for a subgraph: " << _name << "\n";
return;
}

Contributor

Could you please shed some light on the purpose of this check? Isn't it an unexpected situation for a memory control to already be in the allocated state, given that Allocate is called only once?

Contributor Author

This is the way to keep the memory allocation procedure generic across all the graphs, so that no graph is special (i.e. outer graph or subgraphs).
The idea is that the first graph which is being "Activated" is responsible for allocating the memory for the whole "context" it has, which includes all the subgraphs.
In practice, it will always be the outer graph which actually does this.
I am not 100% satisfied with this approach, to be honest, but on the other hand it does make sense.
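
A compact, self-contained sketch of that "first graph to be activated allocates for the whole context" idea; the classes below are simplified stand-ins, not the actual plugin implementation.

    #include <iostream>
    #include <string>

    // Hypothetical memory-control unit shared through the graph context.
    class MemoryControl {
    public:
        bool allocated() const { return m_allocated; }
        void allocateMemory() { m_allocated = true; }  // solve regions, allocate memory
    private:
        bool m_allocated = false;
    };

    struct Graph {
        std::string name;
        MemoryControl* memoryControl;  // shared by the outer graph and its subgraphs

        void Allocate() {
            if (memoryControl->allocated())
                return;  // some graph in this context has already allocated everything
            memoryControl->allocateMemory();
            std::cout << name << " performed the allocation for the whole context\n";
        }
    };

    int main() {
        MemoryControl control;           // one unit per context
        Graph outer{"outer", &control};
        Graph inner{"subgraph", &control};
        outer.Allocate();  // the first graph to be activated does the work
        inner.Allocate();  // subsequent calls are no-ops
    }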

Comment on lines +139 to +146
MemoryControl* network_memory_control = m_graph->getGraphContext()->getMemoryControl();
if (!network_memory_control) {
OPENVINO_THROW("Memory control unit is not initialized for graph: ", m_graph->GetName());
}

if (!network_memory_control->allocated()) {
network_memory_control->allocateMemory();
}
Contributor

From the graph usage perspective, this action is not at all obvious. So this is just another example of implicit coupling between the infer request, graph, and memory control subsystems. Suppose we would like to develop yet another implementation of an infer request, or run the graph in another context (other than the infer request). How do we know that we have to retrieve the memory subsystem, check the allocation status, and call allocate?

Contributor Author

I don't really remember why I moved it out of the Graph, but I think it can be reverted.
On the other hand, we know that we don't have to perform this check for the subgraphs, so it kind of makes sense to move the check out of the Graph logic.

Comment on lines 26 to 30
static EdgeClusters formEdgeClusters(const std::vector<EdgePtr>& graphEdges);
static MemoryRegions formMemoryRegions(const EdgeClusters& clusters, size_t remaining, const GlobalExecutionIndex& globalExecIndex);
static OutputMemoryBlocks filterOutDynamicOutputEdges(MemoryRegions& memoryRegions,
const EdgeClusters& clusters,
const std::map<std::size_t, NodePtr>& outputNodes);
Contributor

Isn't it better to put these utility methods somewhere other than the MemoryControl class, to keep the latter plugin-independent?

Contributor Author

Agree.
I am going to move them back to the graph.

Comment on lines +58 to +65
// @todo return std::reference_wrapper instead?
MemoryControl* createMemoryControlUnit();
Contributor

Why can't we simply return a reference?

Contributor Author

We can.
I just do not like the idea of storing a plain reference inside the GraphContext.
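
For context on the reference vs. pointer question, a tiny standalone example (hypothetical names) of why a plain reference member is awkward in a context-like class: it must be bound at construction and makes the class non-reassignable, whereas a pointer or std::reference_wrapper keeps the member rebindable.

    #include <functional>

    class MemoryControl {};

    // Plain reference member: must be bound at construction, and the
    // implicitly declared copy assignment operator is deleted.
    struct ContextWithReference {
        MemoryControl& memoryControl;
    };

    // Pointer keeps the context assignable and allows the unit to be
    // set or replaced after construction.
    struct ContextWithPointer {
        MemoryControl* memoryControl = nullptr;
    };

    // std::reference_wrapper is a non-null, rebindable middle ground.
    struct ContextWithWrapper {
        std::reference_wrapper<MemoryControl> memoryControl;
    };

    int main() {
        MemoryControl a, b;

        ContextWithReference r1{a}, r2{b};
        // r1 = r2;                    // ill-formed: copy assignment is deleted

        ContextWithPointer p1{&a}, p2{&b};
        p1 = p2;                       // fine: rebinds to the other unit

        ContextWithWrapper w1{a}, w2{b};
        w1 = w2;                       // fine: reference_wrapper is rebindable
    }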

Comment on lines +298 to +300
virtual bool canBeSkipped() const {
return getSelectedPrimitiveDescriptor()->hasZeroInputDims();
}
Contributor

Maybe change the name to isExecutableStatic and rename the existing isExecutable -> isExecutableDynamic, by analogy with executeStatic/executeDynamic? What do you think?

Contributor Author

@EgorDuplensky EgorDuplensky Nov 8, 2024

Naming is hard in this case.
My thoughts are:

  1. We are trying to perform an optimization where we completely drop a node from the execution graph, because we know it will never be executed. A good name for such a check would actually be "isExecutable", meaning "is executable at all?".
  2. We are trying to check whether the node execution can be skipped during inference, because it gets zero-dim input shapes, or maybe because it became in-place once we have dynamic in-place. This check could be named "shouldBeSkipped", "mustBeSkipped", or "isDynamicallyExecutable()" (see the sketch below).
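
A small hypothetical sketch of the distinction being discussed; the names are only the candidates from this thread, not the actual Node interface. One check decides whether the node can be dropped from the execution graph entirely, the other whether a particular inference can skip it.

    #include <cstdio>
    #include <vector>

    // Hypothetical, trimmed-down Node used only to illustrate the two checks.
    struct Node {
        bool neverExecutes = false;    // known up front: this node will never run
        bool hasZeroDimInput = false;  // discovered at runtime for this inference

        // "Is executable at all?" - if not, drop the node from the execution graph.
        bool isExecutable() const { return !neverExecutes; }

        // "Should this particular execution be skipped?" - e.g. zero-dim inputs,
        // or the node became in-place for this inference.
        bool shouldBeSkipped() const { return hasZeroDimInput; }
    };

    int main() {
        std::vector<Node> graph = {{false, false}, {true, false}, {false, true}};

        // Build the execution graph once, dropping nodes that can never run.
        std::vector<Node> execGraph;
        for (const Node& n : graph)
            if (n.isExecutable())
                execGraph.push_back(n);

        // During inference, the remaining nodes may still be skipped dynamically.
        for (const Node& n : execGraph)
            std::printf(n.shouldBeSkipped() ? "skip\n" : "execute\n");
    }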

@EgorDuplensky
Contributor Author

Some refactoring and code clean-ups are still to be expected.

@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_across_inner_graphs branch from 8d45629 to e4812a2 on November 9, 2024 13:01
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_across_inner_graphs branch from e4812a2 to ba4ccff on November 11, 2024 16:17
@EgorDuplensky
Contributor Author

The same changes plus adaptations for the If and TensorIterator nodes will be done in the scope of:
