NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

AsyaPronina · 2024-10-31T02:32:29Z

Details:

Added fix for issue found by Eugene Smirnov for device_it for funcall models
Added and then updated with changes from Dmitry Matveev fix for recreate_subrequest() that was called not on real_idx and failed failover (not accuracy).
Note: usual failover wasn't tested yet on spatial subgraphs, so might cause issues. - WIP
Switched from real failover to CPU subrequest to copy of CPU subrequest results back to NPU ones to avoid handling of all specifically allocated containers on NPU to work with CPU subrequests.
Refactored accuracy failover to present only failures in log_level=Error mode.
Fixed order of inputs in ilist.txt to be equal to order of model inputs in case of spatial subgraph.
Fixed dump of ilist.txt for different tiles & also added check to dump only valid ranges.
Added dumps for inaccurate subgraphs and their inputs.

Tickets:

ticket-id

src/plugins/intel_npu/src/plugin/npuw/accuracy/comparator.cpp

dmatveev · 2024-10-31T17:25:23Z

src/plugins/intel_npu/src/plugin/npuw/accuracy/comparator.cpp

-        ov::Tensor dst(ov::element::Type_t::f32, actual->get_shape());
-        ov::npuw::util::to_f32(ov::make_tensor(actual), dst);
+        ov::Tensor dst(ov::element::Type_t::f32, in_actual.get_shape());
+        ov::npuw::util::to_f32(in_actual, dst);


since you anyways convert to f32, it could be easier to teach this to_f32 function work with strided tensors - at least for inputs.
May be worth a todo and a follow-up ticket.

Added TODO, will create a ticket!

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp

src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp

dmatveev · 2024-10-31T17:42:12Z

src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp

+void ov::npuw::JustInferRequest::recreate_subrequests(std::size_t real_idx) {
+    auto& comp_model_desc = m_npuw_model->m_compiled_submodels[real_idx];
+    NPUW_ASSERT(comp_model_desc.replaced_by.value_or(real_idx) == real_idx);


Please refer to the minimal fix here: 149d1d6

Used it, thanks a lot!

so does it work now if runtime failure occurs in non-1st function block? Not talking about accuracy failover here.

dmatveev · 2024-10-31T17:44:12Z

src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp

@@ -1041,11 +1126,15 @@ void ov::npuw::JustInferRequest::unsafe_infer(std::size_t real_idx) {
    }
 }

-void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared) {
+void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared, bool& failover) {


note - this method is called unsafe_run_this_prep_next. It didn't have any failover functionality before so it shouldn't have it now.

I renamed failover to accuracy_failover so it will stay unsafe in terms of usual failover. I need accuracy_failover at this level as unsafe_infer() call r->infer() multiple times for spatial subgraph, so it is easer to wrap just r->infer() call with accuracy logic and each subrequest launch will be checked.

let's decouple those.

accuracy check must happen after the inference is complete.

I know it is more convenient for you to plugin right into the unsafe_infer() but this code shouldn't be there.

Instead, where normally the accuracy check is happens, you need to run your reference subrequest over the same range.

You can borrow the range-based execution logic and generalize it so both unsafe_infer and the accuracy check use the same flow - the accuracy check shouldn't be injected that deep into the normal infer.

…ed fix by Eugene Smirnov for device_it for funcall models. Added conditional log of launch

- Switched from real failover to CPU subrequest to copy of CPU subrequest results back to NPU ones to avoid handling of all specifically allocated containers on NPU to work with CPU subrequests. - Refactored accuracy failover to present only failures in log_level=Error mode. - Fixed order of inputs in ilist.txt to be equal to order of model inputs in case of spatial subgraph. - Fixed dump of ilist.txt for different tiles & also added check to dump only valid ranges. - Added dumps for inaccurate subgraphs and their inputs.

AsyaPronina · 2024-11-13T00:17:35Z

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp

@@ -377,8 +379,7 @@ ov::npuw::CompiledModel::CompiledModel(const std::shared_ptr<ov::Model>& model,
            }
        }

-        m_compiled_submodels[id].device_it =
-            id != real_id ? m_compiled_submodels[real_id].device_it : m_dev_list.cbegin();
+        m_compiled_submodels[id].device_it = m_dev_list.cbegin();


inside compile() lambda idx != real_idx won't be passed

dmatveev · 2024-11-13T12:08:56Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

+/**
+ * @brief
+ * Type: bool.
+ * Enable dumps of materials for model(s), failing accuracy check.


Please rephrase. What are the "materials"?
Also, as we operate at subgraph level, there's no "model(s)"

dmatveev · 2024-11-13T12:11:56Z

src/plugins/intel_npu/src/plugin/npuw/accuracy/comparator.cpp

 bool ov::npuw::metrics::NRMSE::operator()(const ov::SoPtr<ov::ITensor>& actual,
-                                          const ov::SoPtr<ov::ITensor>& reference) const {
-    NPUW_ASSERT(actual->is_continuous());
-    NPUW_ASSERT(reference->is_continuous());
+                                          const ov::SoPtr<ov::ITensor>& reference,
+                                          double* result) const {


This bool operator() (.., double *result); thing looks weird. Just make a proper return type like Result.

For those who were only checking true/false, that type could have operator bool().

dmatveev · 2024-11-13T12:13:01Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+namespace {
+    void set_inputs(const ov::SoPtr<ov::IAsyncInferRequest>& from, ov::SoPtr<ov::IAsyncInferRequest>& to) {


I believe you didn't ran clang-format-fix stuff here.

dmatveev · 2024-11-13T12:16:28Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+    std::stringstream create_launch_msg(std::size_t idx,  std::size_t real_idx) {
+        std::stringstream log_msg_stream;
+        log_msg_stream << "Launching subrequest[" << idx << "]" <<
+        ((real_idx == idx) ? std::string("...").c_str() :


you clearly don't need .c_str() here.

dmatveev · 2024-11-13T12:16:39Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+                             std::string(std::string(", which is actually subrequest[") +
+                                std::to_string(real_idx) + "]").c_str());


and here too. Why?

dmatveev · 2024-11-13T12:17:49Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+        }
+    }
+
+    std::stringstream create_launch_msg(std::size_t idx,  std::size_t real_idx) {


not sure if you need this at all. just write one more line in the log if you want to add smt.

dmatveev · 2024-11-13T12:19:58Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+    std::stringstream log_msg_stream = create_launch_msg(subidx, real_subidx);
+    if (m_npuw_model->m_compiled_submodels[real_subidx].spatial && len != 0) {
+        log_msg_stream << ", on range : [" << offset << ", " << offset + len << ")";
+    }
+    log_msg_stream << "...";


Just write multiple log lines without overdesign.

LOG_DEBUG("Your message here"); if (spatial) { LOG_BLOCK(); LOG_DEBUG("Your spatial info here"); }

Note - we only write runtime log in DEBUG. It should never be INFO here.

dmatveev · 2024-11-13T12:20:52Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+    LOG_BLOCK();
+
+    if (m_npuw_model->m_compiled_submodels[real_subidx].switched_to_ref) {
+        LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");


Suggested change

LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");

LOG_INFO("Subrequest was inaccurate somewhere before, switching it to the reference device " << device);

dmatveev · 2024-11-13T12:21:34Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

+void ov::npuw::IBaseInferRequest::try_accurate_subinfer(std::size_t subidx, std::size_t offset,
+                                                        std::size_t len, bool& accuracy_failover) {


are you passing offset/length here only for printing?

dmatveev · 2024-11-13T12:22:51Z

src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp

Stopped my review at this file. Please make accuracy check integration less invasive. It shouldn't be more than it was before.

github-actions · 2024-12-08T00:27:42Z

This PR will be closed in a week because of 2 weeks of no activity.

github-actions · 2024-12-18T00:24:29Z

This PR was closed because it has been stalled for 2 week with no activity.

AsyaPronina requested review from a team as code owners October 31, 2024 02:32

github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 31, 2024

AsyaPronina marked this pull request as draft October 31, 2024 02:34

dmatveev changed the title ~~Added conditional checks for accuracy for spatial subgraphs~~ NPUW: Added conditional checks for accuracy for spatial subgraphs Oct 31, 2024

dmatveev reviewed Oct 31, 2024

View reviewed changes

AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 9d014a2 to 2c4b3b6 Compare November 12, 2024 23:06

AsyaPronina and others added 2 commits November 12, 2024 23:06

Added conditional checks for accuracy for spatial subgraphs also. Add…

2b1d2e8

…ed fix by Eugene Smirnov for device_it for funcall models. Added conditional log of launch

NPUW: Fix idx != real_idx assert in failover

bb85745

AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 2c4b3b6 to 41cf373 Compare November 12, 2024 23:07

AsyaPronina marked this pull request as ready for review November 12, 2024 23:08

AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 41cf373 to 759a73a Compare November 12, 2024 23:24

AsyaPronina commented Nov 13, 2024

View reviewed changes

dmatveev requested changes Nov 13, 2024

View reviewed changes

github-actions bot added the Stale label Dec 8, 2024

github-actions bot closed this Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

AsyaPronina commented Oct 31, 2024 •

edited

Loading

dmatveev Oct 31, 2024

AsyaPronina Nov 12, 2024

dmatveev Oct 31, 2024

AsyaPronina Nov 12, 2024

dmatveev Nov 13, 2024

dmatveev Oct 31, 2024

AsyaPronina Nov 12, 2024 •

edited

Loading

dmatveev Nov 13, 2024 •

edited

Loading

AsyaPronina Nov 13, 2024 •

edited

Loading

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

dmatveev Nov 13, 2024

github-actions bot commented Dec 8, 2024

github-actions bot commented Dec 18, 2024

		namespace {
		void set_inputs(const ov::SoPtr<ov::IAsyncInferRequest>& from, ov::SoPtr<ov::IAsyncInferRequest>& to) {

		std::string(std::string(", which is actually subrequest[") +
		std::to_string(real_idx) + "]").c_str());

	LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");
	LOG_INFO("Subrequest was inaccurate somewhere before, switching it to the reference device " << device);

		void ov::npuw::IBaseInferRequest::try_accurate_subinfer(std::size_t subidx, std::size_t offset,
		std::size_t len, bool& accuracy_failover) {

NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

Conversation

AsyaPronina commented Oct 31, 2024 • edited Loading

Details:

Tickets:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AsyaPronina Nov 12, 2024 • edited Loading

Choose a reason for hiding this comment

dmatveev Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

AsyaPronina Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Dec 8, 2024

github-actions bot commented Dec 18, 2024

AsyaPronina commented Oct 31, 2024 •

edited

Loading

AsyaPronina Nov 12, 2024 •

edited

Loading

dmatveev Nov 13, 2024 •

edited

Loading

AsyaPronina Nov 13, 2024 •

edited

Loading