Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPUW: Added conditional checks for accuracy for spatial subgraphs #27348

Conversation

AsyaPronina
Copy link
Contributor

@AsyaPronina AsyaPronina commented Oct 31, 2024

Details:

  • Added fix for issue found by Eugene Smirnov for device_it for funcall models
  • Added and then updated with changes from Dmitry Matveev fix for recreate_subrequest() that was called not on real_idx and failed failover (not accuracy).
    Note: usual failover wasn't tested yet on spatial subgraphs, so might cause issues. - WIP
  • Switched from real failover to CPU subrequest to copy of CPU subrequest results back to NPU ones to avoid handling of all specifically allocated containers on NPU to work with CPU subrequests.
  • Refactored accuracy failover to present only failures in log_level=Error mode.
  • Fixed order of inputs in ilist.txt to be equal to order of model inputs in case of spatial subgraph.
  • Fixed dump of ilist.txt for different tiles & also added check to dump only valid ranges.
  • Added dumps for inaccurate subgraphs and their inputs.

Tickets:

  • ticket-id

@AsyaPronina AsyaPronina requested review from a team as code owners October 31, 2024 02:32
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 31, 2024
@AsyaPronina AsyaPronina marked this pull request as draft October 31, 2024 02:34
@dmatveev dmatveev changed the title Added conditional checks for accuracy for spatial subgraphs NPUW: Added conditional checks for accuracy for spatial subgraphs Oct 31, 2024
ov::Tensor dst(ov::element::Type_t::f32, actual->get_shape());
ov::npuw::util::to_f32(ov::make_tensor(actual), dst);
ov::Tensor dst(ov::element::Type_t::f32, in_actual.get_shape());
ov::npuw::util::to_f32(in_actual, dst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you anyways convert to f32, it could be easier to teach this to_f32 function work with strided tensors - at least for inputs.
May be worth a todo and a follow-up ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO, will create a ticket!

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp Outdated Show resolved Hide resolved
src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp Outdated Show resolved Hide resolved
Comment on lines 895 to 897
void ov::npuw::JustInferRequest::recreate_subrequests(std::size_t real_idx) {
auto& comp_model_desc = m_npuw_model->m_compiled_submodels[real_idx];
NPUW_ASSERT(comp_model_desc.replaced_by.value_or(real_idx) == real_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to the minimal fix here: 149d1d6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used it, thanks a lot!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so does it work now if runtime failure occurs in non-1st function block? Not talking about accuracy failover here.

@@ -1041,11 +1126,15 @@ void ov::npuw::JustInferRequest::unsafe_infer(std::size_t real_idx) {
}
}

void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared) {
void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared, bool& failover) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note - this method is called unsafe_run_this_prep_next. It didn't have any failover functionality before so it shouldn't have it now.

Copy link
Contributor Author

@AsyaPronina AsyaPronina Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed failover to accuracy_failover so it will stay unsafe in terms of usual failover. I need accuracy_failover at this level as unsafe_infer() call r->infer() multiple times for spatial subgraph, so it is easer to wrap just r->infer() call with accuracy logic and each subrequest launch will be checked.

Copy link
Contributor

@dmatveev dmatveev Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's decouple those.

accuracy check must happen after the inference is complete.

I know it is more convenient for you to plugin right into the unsafe_infer() but this code shouldn't be there.

Instead, where normally the accuracy check is happens, you need to run your reference subrequest over the same range.

You can borrow the range-based execution logic and generalize it so both unsafe_infer and the accuracy check use the same flow - the accuracy check shouldn't be injected that deep into the normal infer.

@AsyaPronina AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 9d014a2 to 2c4b3b6 Compare November 12, 2024 23:06
AsyaPronina and others added 2 commits November 12, 2024 23:06
…ed fix by Eugene Smirnov for device_it for funcall models. Added conditional log of launch
@AsyaPronina AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 2c4b3b6 to 41cf373 Compare November 12, 2024 23:07
@AsyaPronina AsyaPronina marked this pull request as ready for review November 12, 2024 23:08
- Switched from real failover to CPU subrequest to copy of CPU
  subrequest results back to NPU ones to avoid handling of all
  specifically allocated containers on NPU to work with CPU subrequests.
- Refactored accuracy failover to present only failures in log_level=Error
  mode.
- Fixed order of inputs in ilist.txt to be equal to order of model
  inputs in case of spatial subgraph.
- Fixed dump of ilist.txt for different tiles & also added check to dump
  only valid ranges.
- Added dumps for inaccurate subgraphs and their inputs.
@AsyaPronina AsyaPronina force-pushed the accuracy_mode_for_spatial_subgraphs branch from 41cf373 to 759a73a Compare November 12, 2024 23:24
@@ -377,8 +379,7 @@ ov::npuw::CompiledModel::CompiledModel(const std::shared_ptr<ov::Model>& model,
}
}

m_compiled_submodels[id].device_it =
id != real_id ? m_compiled_submodels[real_id].device_it : m_dev_list.cbegin();
m_compiled_submodels[id].device_it = m_dev_list.cbegin();
Copy link
Contributor Author

@AsyaPronina AsyaPronina Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inside compile() lambda idx != real_idx won't be passed

/**
* @brief
* Type: bool.
* Enable dumps of materials for model(s), failing accuracy check.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rephrase. What are the "materials"?
Also, as we operate at subgraph level, there's no "model(s)"

Comment on lines 15 to +17
bool ov::npuw::metrics::NRMSE::operator()(const ov::SoPtr<ov::ITensor>& actual,
const ov::SoPtr<ov::ITensor>& reference) const {
NPUW_ASSERT(actual->is_continuous());
NPUW_ASSERT(reference->is_continuous());
const ov::SoPtr<ov::ITensor>& reference,
double* result) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bool operator() (.., double *result); thing looks weird. Just make a proper return type like Result.

For those who were only checking true/false, that type could have operator bool().

Comment on lines +101 to +102
namespace {
void set_inputs(const ov::SoPtr<ov::IAsyncInferRequest>& from, ov::SoPtr<ov::IAsyncInferRequest>& to) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you didn't ran clang-format-fix stuff here.

std::stringstream create_launch_msg(std::size_t idx, std::size_t real_idx) {
std::stringstream log_msg_stream;
log_msg_stream << "Launching subrequest[" << idx << "]" <<
((real_idx == idx) ? std::string("...").c_str() :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you clearly don't need .c_str() here.

Comment on lines +125 to +126
std::string(std::string(", which is actually subrequest[") +
std::to_string(real_idx) + "]").c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here too. Why?

}
}

std::stringstream create_launch_msg(std::size_t idx, std::size_t real_idx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if you need this at all. just write one more line in the log if you want to add smt.

Comment on lines +140 to +144
std::stringstream log_msg_stream = create_launch_msg(subidx, real_subidx);
if (m_npuw_model->m_compiled_submodels[real_subidx].spatial && len != 0) {
log_msg_stream << ", on range : [" << offset << ", " << offset + len << ")";
}
log_msg_stream << "...";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just write multiple log lines without overdesign.

LOG_DEBUG("Your message here");
if (spatial) {
    LOG_BLOCK();
    LOG_DEBUG("Your spatial info here");
}

Note - we only write runtime log in DEBUG. It should never be INFO here.

LOG_BLOCK();

if (m_npuw_model->m_compiled_submodels[real_subidx].switched_to_ref) {
LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");
LOG_INFO("Subrequest was inaccurate somewhere before, switching it to the reference device " << device);

Comment on lines +131 to +132
void ov::npuw::IBaseInferRequest::try_accurate_subinfer(std::size_t subidx, std::size_t offset,
std::size_t len, bool& accuracy_failover) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you passing offset/length here only for printing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped my review at this file. Please make accuracy check integration less invasive. It shouldn't be more than it was before.

Copy link
Contributor

github-actions bot commented Dec 8, 2024

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Dec 8, 2024
Copy link
Contributor

This PR was closed because it has been stalled for 2 week with no activity.

@github-actions github-actions bot closed this Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants