NPUW: Added conditional checks for accuracy for spatial subgraphs #27348
-    ov::Tensor dst(ov::element::Type_t::f32, actual->get_shape());
-    ov::npuw::util::to_f32(ov::make_tensor(actual), dst);
+    ov::Tensor dst(ov::element::Type_t::f32, in_actual.get_shape());
+    ov::npuw::util::to_f32(in_actual, dst);
Since you convert to f32 anyway, it could be easier to teach this `to_f32`
function to work with strided tensors, at least for inputs.
It may be worth a TODO and a follow-up ticket.
Added TODO, will create a ticket!
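For illustration only, here is a minimal sketch of what a stride-aware conversion to f32 could look like. It is not the real `ov::npuw::util::to_f32`: the helper name `to_f32_strided`, the raw-pointer interface, and the int16 source type are all assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of OpenVINO): reads a 2D int16 view described
// by byte strides and writes a dense, row-major float buffer.
std::vector<float> to_f32_strided(const std::uint8_t* base,
                                  std::size_t rows,
                                  std::size_t cols,
                                  std::size_t row_stride_bytes,
                                  std::size_t col_stride_bytes) {
    assert(base != nullptr || rows * cols == 0);
    std::vector<float> dst;
    dst.reserve(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // memcpy avoids alignment assumptions about the strided source.
            std::int16_t v = 0;
            std::memcpy(&v, base + r * row_stride_bytes + c * col_stride_bytes, sizeof(v));
            dst.push_back(static_cast<float>(v));
        }
    }
    return dst;
}
```

With a helper of this shape, a strided input would not need to be copied into a contiguous tensor before the conversion; the destination is dense either way.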
src/plugins/intel_npu/src/plugin/npuw/base_sync_infer_request.cpp (outdated, resolved)
src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp (outdated, resolved)
src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp (outdated, resolved)
src/plugins/intel_npu/src/plugin/npuw/just_sync_infer_request.cpp (outdated, resolved)
void ov::npuw::JustInferRequest::recreate_subrequests(std::size_t real_idx) {
    auto& comp_model_desc = m_npuw_model->m_compiled_submodels[real_idx];
    NPUW_ASSERT(comp_model_desc.replaced_by.value_or(real_idx) == real_idx);
Please refer to the minimal fix here: 149d1d6
Used it, thanks a lot!
So does it work now if a runtime failure occurs in a function block other than the first one? I am not talking about accuracy failover here.
@@ -1041,11 +1126,15 @@ void ov::npuw::JustInferRequest::unsafe_infer(std::size_t real_idx) {
     }
 }

-void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared) {
+void ov::npuw::JustInferRequest::unsafe_run_this_prep_next(std::size_t idx, bool& next_prepared, bool& failover) {
Note: this method is called `unsafe_run_this_prep_next`. It didn't have any failover functionality before, so it shouldn't have it now.
I renamed `failover` to `accuracy_failover`, so the method stays `unsafe` in terms of the usual `failover`. I need `accuracy_failover` at this level because `unsafe_infer()` calls `r->infer()` multiple times for a spatial subgraph, so it is easier to wrap just the `r->infer()` call with the accuracy logic; that way each subrequest launch is checked.
Let's decouple those.
The accuracy check must happen after the inference is complete.
I know it is more convenient for you to plug right into `unsafe_infer()`, but this code shouldn't be there.
Instead, where the accuracy check normally happens, you need to run your reference subrequest over the same range.
You can borrow the range-based execution logic and generalize it so that both `unsafe_infer` and the accuracy check use the same flow. The accuracy check shouldn't be injected that deep into the normal infer.
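One way to read this suggestion, sketched under assumptions: the range iteration is pulled out into a single helper, and both the normal infer path and the later accuracy check call it with their own per-range action. The names `for_each_range`, `run_infer`, and `run_reference` are hypothetical, and the tiling rule (fixed tile size, shorter last tile) is an assumption rather than the actual NPUW spatial logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: walks [0, total) in tiles of at most `tile` elements
// and calls body(offset, len) once per tile.
void for_each_range(std::size_t total,
                    std::size_t tile,
                    const std::function<void(std::size_t, std::size_t)>& body) {
    assert(tile > 0);
    for (std::size_t offset = 0; offset < total; offset += tile) {
        body(offset, std::min(tile, total - offset));
    }
}

using Ranges = std::vector<std::pair<std::size_t, std::size_t>>;

// Stand-in for the normal infer path: the real code would call r->infer() per range.
Ranges run_infer(std::size_t total, std::size_t tile) {
    Ranges launched;
    for_each_range(total, tile, [&](std::size_t offset, std::size_t len) {
        launched.emplace_back(offset, len);
    });
    return launched;
}

// Stand-in for the accuracy check: runs after inference, over the same ranges,
// where the real code would launch the reference subrequest and compare outputs.
Ranges run_reference(std::size_t total, std::size_t tile) {
    Ranges checked;
    for_each_range(total, tile, [&](std::size_t offset, std::size_t len) {
        checked.emplace_back(offset, len);
    });
    return checked;
}
```

Because both paths share `for_each_range`, the accuracy check sees exactly the ranges the inference used without any accuracy logic living inside the infer loop.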
9d014a2 to 2c4b3b6
…ed fix by Eugene Smirnov for device_it for funcall models. Added conditional log of launch
2c4b3b6 to 41cf373
- Switched from real failover to CPU subrequest to copy of CPU subrequest results back to NPU ones to avoid handling of all specifically allocated containers on NPU to work with CPU subrequests.
- Refactored accuracy failover to present only failures in log_level=Error mode.
- Fixed order of inputs in ilist.txt to be equal to order of model inputs in case of spatial subgraph.
- Fixed dump of ilist.txt for different tiles & also added check to dump only valid ranges.
- Added dumps for inaccurate subgraphs and their inputs.
41cf373 to 759a73a
@@ -377,8 +379,7 @@ ov::npuw::CompiledModel::CompiledModel(const std::shared_ptr<ov::Model>& model,
         }
     }

-    m_compiled_submodels[id].device_it =
-        id != real_id ? m_compiled_submodels[real_id].device_it : m_dev_list.cbegin();
+    m_compiled_submodels[id].device_it = m_dev_list.cbegin();
Inside the `compile()` lambda, `idx != real_idx` won't be passed.
/**
 * @brief
 * Type: bool.
 * Enable dumps of materials for model(s), failing accuracy check.
Please rephrase. What are the "materials"?
Also, as we operate at subgraph level, there's no "model(s)"
 bool ov::npuw::metrics::NRMSE::operator()(const ov::SoPtr<ov::ITensor>& actual,
-                                          const ov::SoPtr<ov::ITensor>& reference) const {
-    NPUW_ASSERT(actual->is_continuous());
-    NPUW_ASSERT(reference->is_continuous());
+                                          const ov::SoPtr<ov::ITensor>& reference,
+                                          double* result) const {
This `bool operator() (.., double *result);` thing looks weird. Just make a proper return type like `Result`.
For those who were only checking true/false, that type could have `operator bool()`.
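A minimal sketch of the suggested shape, under stated assumptions: the `Result` layout, the vector-based signature, and the normalization by the reference value range are all choices made for this example; the real metric operates on `ov::ITensor` and may normalize differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: carries the verdict together with the measured value.
struct Result {
    bool passed;
    double value;
    // Lets existing callers keep writing `if (metric(a, b))`.
    explicit operator bool() const { return passed; }
};

// Simplified stand-in for the NRMSE metric, working on plain float vectors.
class NRMSE {
public:
    explicit NRMSE(double threshold) : m_threshold(threshold) {}

    Result operator()(const std::vector<float>& actual, const std::vector<float>& reference) const {
        assert(actual.size() == reference.size() && !actual.empty());
        double squares = 0.0;
        for (std::size_t i = 0; i < actual.size(); ++i) {
            const double diff = static_cast<double>(actual[i]) - reference[i];
            squares += diff * diff;
        }
        const double rmse = std::sqrt(squares / static_cast<double>(actual.size()));
        // Assumption: normalize by the reference range, floored to avoid division by zero.
        const auto mm = std::minmax_element(reference.begin(), reference.end());
        const double range = std::max(static_cast<double>(*mm.second) - *mm.first, 1e-9);
        const double nrmse = rmse / range;
        return Result{nrmse <= m_threshold, nrmse};
    }

private:
    double m_threshold;
};
```

Callers that only need the verdict can test the returned object in a condition, while callers that log or dump diagnostics read `value` from the same object, so no out-parameter is needed.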
namespace {
void set_inputs(const ov::SoPtr<ov::IAsyncInferRequest>& from, ov::SoPtr<ov::IAsyncInferRequest>& to) {
I believe you didn't run clang-format-fix here.
std::stringstream create_launch_msg(std::size_t idx, std::size_t real_idx) {
    std::stringstream log_msg_stream;
    log_msg_stream << "Launching subrequest[" << idx << "]" <<
        ((real_idx == idx) ? std::string("...").c_str() :
You clearly don't need `.c_str()` here.
        std::string(std::string(", which is actually subrequest[") +
        std::to_string(real_idx) + "]").c_str());
and here too. Why?
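To illustrate the point about `.c_str()`, here is a sketch of the same message built with `std::string` only. Returning a `std::string` instead of a `std::stringstream` is a choice of this example, not something the review asked for; the message text mirrors the hunk above.

```cpp
#include <cstddef>
#include <string>

// Sketch of the helper without .c_str(): both branches are already std::string,
// so they can be appended directly.
std::string create_launch_msg(std::size_t idx, std::size_t real_idx) {
    std::string msg = "Launching subrequest[" + std::to_string(idx) + "]";
    if (real_idx == idx) {
        msg += "...";
    } else {
        msg += ", which is actually subrequest[" + std::to_string(real_idx) + "]";
    }
    return msg;
}
```

The original ternary needed both operands to have one common type, which is already the case when both are `std::string`; converting each to `const char*` only adds temporaries whose lifetime then has to be reasoned about.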
    }
}

std::stringstream create_launch_msg(std::size_t idx, std::size_t real_idx) {
Not sure you need this at all. Just write one more line in the log if you want to add something.
std::stringstream log_msg_stream = create_launch_msg(subidx, real_subidx);
if (m_npuw_model->m_compiled_submodels[real_subidx].spatial && len != 0) {
    log_msg_stream << ", on range : [" << offset << ", " << offset + len << ")";
}
log_msg_stream << "...";
Just write multiple log lines without overdesign.
LOG_DEBUG("Your message here");
if (spatial) {
LOG_BLOCK();
LOG_DEBUG("Your spatial info here");
}
Note - we only write runtime log in DEBUG. It should never be INFO here.
LOG_BLOCK();

if (m_npuw_model->m_compiled_submodels[real_subidx].switched_to_ref) {
    LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");
-    LOG_INFO("Subrequest was inaccurate somewhere before, launching it on reference device.");
+    LOG_INFO("Subrequest was inaccurate somewhere before, switching it to the reference device " << device);
void ov::npuw::IBaseInferRequest::try_accurate_subinfer(std::size_t subidx, std::size_t offset,
                                                        std::size_t len, bool& accuracy_failover) {
are you passing offset/length here only for printing?
Stopped my review at this file. Please make accuracy check integration less invasive. It shouldn't be more than it was before.
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
Details:
- `device_it` for funcall models
- `recreate_subrequest()` that was called not on `real_idx` and failed failover (not accuracy).

Note: usual failover wasn't tested yet on spatial subgraphs, so might cause issues.
- WIP
Tickets: