-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix request re-use when cancelling a request #253
Conversation
src/infer_request.h
Outdated
bool IsCancelled() { return response_factory_->IsCancelled(); } | ||
Status IsCancelled(bool* is_cancelled) | ||
{ | ||
if (response_factory_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: in previous iteration - we would have had an issue if user called cancelled before setresponsecallback - is that correct?
Question: can user call inferasync without calling setresponse callback? does that error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in previous iteration - we would have had an issue if user called cancelled before setresponsecallback - is that correct?
that's correct.
Question: can user call inferasync without calling setresponse callback? does that error?
I think if it is not provided it would segfault when the backend tries to send responses: https://github.com/triton-inference-server/core/blob/a23786c25b36f07d2cfd510821682c117b50a99e/src/infer_response.cc#L219C11-L223
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if it is not provided it would segfault when the backend tries to send responses: https://github.com/triton-inference-server/core/blob/a23786c25b36f07d2cfd510821682c117b50a99e/src/infer_response.cc#L219C11-L223
Maybe low hanging fruit for the future, but we could catch that error and return it for some improved UX instead of segfaulting for C-API and bindings users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetResponseCallback
is part of the request initialization, the server assumes the request is properly initialized when InferAsync
is called.
I think we can draw the separation between before and after InferAsync
, only after the call, infer request is passed into Triton realm. Before that, the user / frontend has all the control of infer request and calling Cancel / IsCancelled
doesn't have real meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had a few questions but only for clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT, just need to address the other comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean 🚀
src/backend_model_instance.cc
Outdated
// For warmup requests we need to manually set ResponseFactory | ||
// since they modify the callback after PrepareForInference has | ||
// been called. | ||
request->SetResponseFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider moving above the requests.push_back() for consistency with other request setup calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved.
6196b75
to
2e2a617
Compare
* Fix state transitions for re-running requests (#251) * Add backend/server APIs * Implement the cancellation APIs * Only store the state in response factory * Add unit testing for request cancellation * Add test * Add cancellation status * Add testing for cancelling a request after release * Handle request re-use * Enable request reuse test * Add staged changes * Add temporary fix for the request state bug --------- Co-authored-by: Ryan McCormick <[email protected]>
d4ebc5a
to
0e6dc34
Compare
* Add inference request cancellation APIs (#249) * Fix state transitions for re-running requests (#251) * Add backend/server APIs * Implement the cancellation APIs * Only store the state in response factory * Add unit testing for request cancellation * Add test * Add cancellation status * Add testing for cancelling a request after release * Handle request re-use * Enable request reuse test * Add staged changes * Add temporary fix for the request state bug --------- Co-authored-by: Ryan McCormick <[email protected]> * Fix request re-use when cancelling a request (#253) * Add inference request cancellation APIs (#249) * Fix state transitions for re-running requests (#251) * Add backend/server APIs * Implement the cancellation APIs * Only store the state in response factory * Add unit testing for request cancellation * Add test * Add cancellation status * Add testing for cancelling a request after release * Handle request re-use * Enable request reuse test * Add staged changes * Add temporary fix for the request state bug --------- Co-authored-by: Ryan McCormick <[email protected]> * Fix request re-use when cancelling a request * Review edit * Fix warmup request * Fix null requests * Review edit --------- Co-authored-by: Ryan McCormick <[email protected]> * Dynamic batch scheduler request cancellation (#257) * Add adapter for IsCancelled() * Move cancel functions outside TRITON_ENABLE_STATS * Add request cancellation to dynamic batch scheduler * Refactor cancelled requests to use rejected requests routine * Remove shared pointer wrapper for rejected and cancelled requests * Ensemble scheduler request cancellation (#263) * Add adapter for IsCancelled() * Move cancel functions outside TRITON_ENABLE_STATS * Add request cancellation to ensemble scheduler * Sequence batch scheduler request cancellation (#260) * Add request cancellation to sequence batch scheduler * Add request cancellation to in-flight sequences * Refactor on how a request is cancelled * Fix issue when request is timeout dummy * Always mark request cancelled before cancelling * Mark immutable static status as const --------- Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: Jacky <[email protected]>
When cancelling a request that is being re-used, we need to make sure that a new response factory is created for each inference. Otherwise, in the old code, if the user didn't call
SetResponseReleaseCallback
, cancelling a new request could cancel the old response factory as well which is not desired.