Progress Info in Asyncworker #34

vinayada1 · 2023-12-08T19:26:47Z

Reporting operation progress in AsyncWorker

github-actions · 2023-12-08T19:27:15Z

❌ Spellcheck Failed

There are spelling errors in your PR. Visit the workflow output to see what words are failing.

Adding new words

You can add new custom words to .github/config/en-custom.txt.

rynowak · 2023-12-12T01:26:48Z

architecture/2023-12-asyncworker-progressInfo.md

+
+Currently the async worker has no knowledge of an operation's progress until the operation completes. If the operation times out, the current design has no mechanism for communicating the progress made so far to the user. As a result, the user only sees an error message saying that the operation timed out.
+
+The proposal is to add a mechanism that lets the running operation send progress information to the async worker. Then, when a deployment operation eventually times out, the error message will include the progress made so far alongside the timeout message, which could help the user troubleshoot the deployment failure.


I think this idea is pretty cool overall.

I'm curious what happened to the original idea? https://github.com/radius-project/design-notes/pull/22/files

I was trying to implement one of the ideas in that design doc (detecting readiness failures by looking at pod events), and it looks like we need this in place to actually communicate those events to the user. Without this, even though the code can detect the failures, the user still sees - operation timed out.

> Without this, even though the code can detect the failures, the user still sees - operation timed out.

Why is that the case? If the code inside the container operation implemented the timeout itself, it could control how errors are reported.

youngbupark · 2023-12-18T23:18:27Z

architecture/2023-12-asyncworker-progressInfo.md

+  // ProgressEvents represents the progress of the async operation.
+  // The events are represented as a map of key-value pairs where the key is 
+  // the event type such as Info, Error and the value is the event message.
+  ProgressEvents map[string]string `json:"progressEvents,omitempty"`


How does the client get these ProgressEvents? From the client's perspective, what is the protocol for retrieving them? In ARM, the operationstatuses API is the interface for getting the status of an async operation.

Therefore, I do not think we need to add a new property for progress events. The existing error.details array property is enough to capture all in-progress error events. See this error format:

// Error represents the error occurred during provisioning.
Error *ErrorDetails `json:"error,omitempty"`

In terms of implementation, we need to pass an additional error channel to the async worker controller so that the controller can update the live error. The consumer side can then keep appending errors to the error.Details array.

Does that work in the case of success? My concern is that clients will be confused by the presence of the error field for a success case.

Ah, makes sense. Then we can consider using the operations array property. What do you think?

Where are we going to get the ProgressEvents from? Do we need a formatter for this so that the end user can see it properly?

I wasn't aware there was an existing API for this. If there is an existing API, we should do another iteration with the proposed usage.

rynowak · 2023-12-19T19:10:56Z

architecture/2023-12-asyncworker-progressInfo.md

+
+## Overview
+
+Currently the async worker has no knowledge of an operation's progress until the operation completes. If the operation times out, the current design has no mechanism for communicating the progress made so far to the user. As a result, the user only sees an error message saying that the operation timed out.


There's the ability to indicate progress as a percentage, but I agree there's no real way to provide user-facing messages.

I think the ability to indicate progress is separate from the question of: "What error message do users see when an operation fails?"

This sounds like a useful feature, but we should have the debate about whether to use it for this.

rynowak · 2023-12-19T19:13:18Z

architecture/2023-12-asyncworker-progressInfo.md

+- Enable reporting of operation progress in async worker
+
+### Non-Goals
+- Modify reporting of operation progress in CLI


I think we can say that this is out of scope of this design doc, but we haven't delivered any value to users until you can see progress in the CLI or other UX.

rynowak · 2023-12-19T19:14:27Z

architecture/2023-12-asyncworker-progressInfo.md

+
+The readiness probe for the application pod fails. This is a non-terminal failure condition. The current code keeps waiting for the deployment to complete and eventually the deployment times out. The error message to the user is "Deployment timed out after xxx seconds".
+
+If we implement https://github.com/radius-project/radius/issues/6284 and look at pod events, we can detect the readiness probe failure. However, we still need a way to report this information to the user.


Do you have an example of what this would look like? eg: if we deployed a container, what set of progress messages would we create?

Would this become the default behavior for all of our async operations? eg: when an operation times out we'd build an error message from the progress updates?

rynowak · 2023-12-19T19:16:31Z

architecture/2023-12-asyncworker-progressInfo.md

+When the operation times out, with this change, the user should now be able to see an error message similar to:
+"Operation (APPLICATIONS.CORE/CONTAINERS|PUT) has timed out because it was processing longer than xx s. Progress events: 
+Info - Container ctnr-bad-readiness is running but not ready
+Error - Container failed readiness probe. Reason: Unhealthy, Message: Readiness probe failed: Get \"http://10.244.0.10:5000/bad\": dial tcp 10.244.0.10:5000: connect: connection refused"


Does order matter?

The way this is defined, only a single message is allowed for each severity level. That's not straightforward to understand. A design that's more like a log (a sequence of events) is what I would expect.
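The log-style design suggested above can be sketched as an append-only slice of events instead of `map[string]string`. This assumes hypothetical `ProgressEvent` and `ProgressLog` types; `main` contrasts the two shapes, showing that a map keyed by level silently overwrites a second Info message while the slice keeps every event in the order it was reported.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ProgressEvent is a hypothetical timestamped progress record.
type ProgressEvent struct {
	Time    time.Time
	Level   string
	Message string
}

// ProgressLog is an append-only, goroutine-safe sequence of events, so an
// operation can report from its own goroutine while the worker reads.
type ProgressLog struct {
	mu     sync.Mutex
	events []ProgressEvent
}

// Add appends one event; multiple events per level are allowed.
func (l *ProgressLog) Add(level, message string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, ProgressEvent{Time: time.Now(), Level: level, Message: message})
}

// Events returns a copy so callers cannot mutate the log.
func (l *ProgressLog) Events() []ProgressEvent {
	l.mu.Lock()
	defer l.mu.Unlock()
	out := make([]ProgressEvent, len(l.events))
	copy(out, l.events)
	return out
}

func main() {
	// Keyed by level, the second Info message overwrites the first.
	byLevel := map[string]string{}
	byLevel["Info"] = "Container is running but not ready"
	byLevel["Info"] = "Container restarted"
	byLevel["Error"] = "Container failed readiness probe"
	fmt.Println(len(byLevel)) // 2

	// As a log, all three events survive, in order.
	var plog ProgressLog
	plog.Add("Info", "Container is running but not ready")
	plog.Add("Info", "Container restarted")
	plog.Add("Error", "Container failed readiness probe")
	for _, e := range plog.Events() {
		fmt.Printf("%s - %s\n", e.Level, e.Message)
	}
}
```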

rynowak · 2023-12-19T19:16:45Z

architecture/2023-12-asyncworker-progressInfo.md

+
+### API design (if applicable)
+
+NA


Definitely applicable :) this whole item is API design.

github-actions · 2024-05-06T21:13:35Z

This pull request is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

rynowak · 2024-05-06T21:36:12Z

Closing this based on staleness. Please reopen if this project gets started again.

Progress Info in Asyncworker

439e5c2

vinayada1 requested review from a team as code owners December 8, 2023 19:26

rynowak reviewed Dec 12, 2023

youngbupark reviewed Dec 18, 2023

rynowak reviewed Dec 19, 2023

github-actions bot added the Stale label May 6, 2024

rynowak closed this May 6, 2024

vinayada1 commented Dec 8, 2023

github-actions bot commented Dec 8, 2023

rynowak Dec 12, 2023

vinayada1 Dec 18, 2023

rynowak Dec 18, 2023

youngbupark Dec 18, 2023

rynowak Dec 18, 2023

youngbupark Dec 18, 2023 (edited)

ytimocin Dec 19, 2023

rynowak Dec 19, 2023

rynowak Dec 19, 2023 (edited)

rynowak Dec 19, 2023

rynowak Dec 19, 2023

rynowak Dec 19, 2023

rynowak Dec 19, 2023

github-actions bot commented May 6, 2024

rynowak commented May 6, 2024

