Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix request re-use when cancelling a request #253

Merged
merged 6 commits into from
Sep 13, 2023

Conversation

Tabrizian
Copy link
Member

When cancelling a request that is being re-used, we need to make sure that a new response factory is created for each inference. Otherwise, in the old code, if the user didn't call SetResponseReleaseCallback, cancelling a new request could cancel the old response factory as well which is not desired.

src/backend_model.cc Outdated Show resolved Hide resolved
bool IsCancelled() { return response_factory_->IsCancelled(); }
Status IsCancelled(bool* is_cancelled)
{
if (response_factory_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: in previous iteration - we would have had an issue if user called cancelled before setresponsecallback - is that correct?

Question: can user call inferasync without calling setresponse callback? does that error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in previous iteration - we would have had an issue if user called cancelled before setresponsecallback - is that correct?

that's correct.

Question: can user call inferasync without calling setresponse callback? does that error?

I think if it is not provided it would segfault when the backend tries to send responses: https://github.com/triton-inference-server/core/blob/a23786c25b36f07d2cfd510821682c117b50a99e/src/infer_response.cc#L219C11-L223

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it is not provided it would segfault when the backend tries to send responses: https://github.com/triton-inference-server/core/blob/a23786c25b36f07d2cfd510821682c117b50a99e/src/infer_response.cc#L219C11-L223

Maybe low hanging fruit for the future, but we could catch that error and return it for some improved UX instead of segfaulting for C-API and bindings users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetResponseCallback is part of the request initialization, the server assumes the request is properly initialized when InferAsync is called.

I think we can draw the separation between before and after InferAsync, only after the call, infer request is passed into Triton realm. Before that, the user / frontend has all the control of infer request and calling Cancel / IsCancelled doesn't have real meaning.

Copy link
Contributor

@nnshah1 nnshah1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had a few questions but only for clarification.

Copy link
Contributor

@GuanLuo GuanLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT, just need to address the other comments

Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean 🚀

// For warmup requests we need to manually set ResponseFactory
// since they modify the callback after PrepareForInference has
// been called.
request->SetResponseFactory();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving above the requests.push_back() for consistency with other request setup calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved.

Tabrizian and others added 6 commits September 13, 2023 10:15
* Fix state transitions for re-running requests (#251)

* Add backend/server APIs

* Implement the cancellation APIs

* Only store the state in response factory

* Add unit testing for request cancellation

* Add test

* Add cancellation status

* Add testing for cancelling a request after release

* Handle request re-use

* Enable request reuse test

* Add staged changes

* Add temporary fix for the request state bug

---------

Co-authored-by: Ryan McCormick <[email protected]>
@Tabrizian Tabrizian merged commit 7d2ab56 into request-cancellation Sep 13, 2023
1 check passed
@Tabrizian Tabrizian deleted the imant-fix-reuse branch September 13, 2023 18:58
tanmayv25 added a commit that referenced this pull request Oct 4, 2023
* Add inference request cancellation APIs (#249)

* Fix state transitions for re-running requests (#251)

* Add backend/server APIs

* Implement the cancellation APIs

* Only store the state in response factory

* Add unit testing for request cancellation

* Add test

* Add cancellation status

* Add testing for cancelling a request after release

* Handle request re-use

* Enable request reuse test

* Add staged changes

* Add temporary fix for the request state bug

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Fix request re-use when cancelling a request (#253)

* Add inference request cancellation APIs (#249)

* Fix state transitions for re-running requests (#251)

* Add backend/server APIs

* Implement the cancellation APIs

* Only store the state in response factory

* Add unit testing for request cancellation

* Add test

* Add cancellation status

* Add testing for cancelling a request after release

* Handle request re-use

* Enable request reuse test

* Add staged changes

* Add temporary fix for the request state bug

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Fix request re-use when cancelling a request

* Review edit

* Fix warmup request

* Fix null requests

* Review edit

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Dynamic batch scheduler request cancellation (#257)

* Add adapter for IsCancelled()

* Move cancel functions outside TRITON_ENABLE_STATS

* Add request cancellation to dynamic batch scheduler

* Refactor cancelled requests to use rejected requests routine

* Remove shared pointer wrapper for rejected and cancelled requests

* Ensemble scheduler request cancellation (#263)

* Add adapter for IsCancelled()

* Move cancel functions outside TRITON_ENABLE_STATS

* Add request cancellation to ensemble scheduler

* Sequence batch scheduler request cancellation (#260)

* Add request cancellation to sequence batch scheduler

* Add request cancellation to in-flight sequences

* Refactor on how a request is cancelled

* Fix issue when request is timeout dummy

* Always mark request cancelled before cancelling

* Mark immutable static status as const

---------

Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants