
Log Errors for Failed Requests #7297

Closed
kachawla opened this issue Mar 9, 2024 · 7 comments
Labels: bug (Something is broken or not working as expected), triaged (This issue has been reviewed and triaged)

@kachawla
Contributor

kachawla commented Mar 9, 2024

Problem

We currently lack error logging on failure for many operations, which hinders our ability to debug failures. This gap exists because errors that aren't propagated back up the call stack (as with async operations) aren't logged automatically. The current state of the code is inconsistent: some operations log errors while others do not.

Here is an example where an error was not logged: #7264

Proposed Solution

To address this, the proposed change is to move error logging into the setFailed method. That way, error logging does not rely on every individual operation remembering to log its own errors.

AB#11425
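
For illustration, a minimal sketch of what centralizing error logging in a setFailed-style helper could look like. The names and signatures here (OperationStatus, the update callback, the logr wiring) are assumptions made for the sketch, not the actual Radius types:

    // A minimal sketch of the proposed direction: log the error inside a
    // setFailed-style helper so logging no longer depends on each operation
    // controller remembering to do it. All names below are illustrative.
    package worker

    import (
        "context"

        "github.com/go-logr/logr"
    )

    // OperationStatus is a stand-in for the terminal state recorded for an
    // async operation.
    type OperationStatus struct {
        State string
        Error string
    }

    // setFailed marks the operation as failed and logs the error in one place.
    func setFailed(ctx context.Context, operationID string, opErr error,
        update func(context.Context, string, OperationStatus) error) error {
        logger := logr.FromContextOrDiscard(ctx)

        // Centralized logging: every failed request passes through here,
        // whether or not the operation controller logged the error itself.
        logger.Error(opErr, "async operation failed", "operationID", operationID)

        status := OperationStatus{State: "Failed", Error: opErr.Error()}
        return update(ctx, operationID, status)
    }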

@kachawla kachawla added the bug Something is broken or not working as expected label Mar 9, 2024
@radius-triage-bot

👋 @kachawla Thanks for filing this bug report.

A project maintainer will review this report and get back to you soon. If you'd like immediate help troubleshooting, please visit our Discord server.

For more information on our triage process please visit our triage overview

@nithyatsu
Contributor

nithyatsu commented Mar 11, 2024

@kachawla, is the issue easy to reproduce? If so, could you please share the steps with me and I will look into this. I went through the PR's approach of propagating the error; I am trying to understand why the logs get dropped.

@kachawla
Contributor Author

@kachawla, is the issue easy to reproduce? If so, could you please share the steps with me and I will look into this. I went through the PR's approach of propagating the error; I am trying to understand why the logs get dropped.

@nithyatsu thanks for looking into this. Here is an example of where it happened: #7264. If we hit an error on the delete path for portable resources, we will run into this. The gap exists for core RP resource creation/deletion as well, so it should also be reproducible by injecting an error on any of those paths. It is basically the same kind of issue as what you fixed for create/update requests for portable resources here: #6276
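
For illustration, one hypothetical way to inject such an error on a delete path and check whether it gets logged. The controller name, Result type, and Run signature below are assumptions, not the actual Radius controller interfaces:

    // A hypothetical repro: make an async delete controller always fail, then
    // check whether anything in the async worker logs the error before the
    // operation is marked as failed. Names and signatures are placeholders.
    package repro

    import (
        "context"
        "errors"
    )

    // Result is a placeholder for whatever an async controller returns.
    type Result struct{}

    // DeletePortableResource stands in for a portable-resource delete controller.
    type DeletePortableResource struct{}

    // Run returns an injected error so the missing-log behavior can be observed.
    func (c *DeletePortableResource) Run(ctx context.Context, request string) (Result, error) {
        return Result{}, errors.New("injected failure: checking whether the error is logged on the delete path")
    }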

@sylvainsf sylvainsf added the triaged This issue has been reviewed and triaged label Mar 11, 2024
@radius-triage-bot

👍 We've reviewed this issue and have agreed to add it to our backlog. Please subscribe to this issue for notifications; we'll provide updates when we pick it up.

We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue.

For more information on our triage process please visit our triage overview

@kachawla kachawla self-assigned this Mar 11, 2024
@youngbupark

This is the error log from the functional tests. It is coming from the k8s API server and SDK. Unless we can get all logs from the k8s-related pods, it is hard to figure out why. It is also difficult to repro. I am sure that Radius was working as expected, but the k8s API server was not. If we could see a similar issue in the long-running test, we would be able to get more details from Log Analytics, because the long-running test logs all k8s-related pods.

2024/03/05 04:16:27 Start streaming Kubernetes logs - Pod dapr-frontend-5448d95787-8dhhg is in state: Failed
    cli.go:418: [rad] Error: {
    cli.go:418: [rad]   "code": "Internal",
    cli.go:418: [rad]   "message": "could not find API version for type \"core/Secret\": unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"

@kachawla
Contributor Author

kachawla commented Mar 18, 2024

This is the error log from the functional tests. It is coming from the k8s API server and SDK. Unless we can get all logs from the k8s-related pods, it is hard to figure out why. It is also difficult to repro. I am sure that Radius was working as expected, but the k8s API server was not. If we could see a similar issue in the long-running test, we would be able to get more details from Log Analytics, because the long-running test logs all k8s-related pods.

2024/03/05 04:16:27 Start streaming Kubernetes logs - Pod dapr-frontend-5448d95787-8dhhg is in state: Failed
    cli.go:418: [rad] Error: {
    cli.go:418: [rad]   "code": "Internal",
    cli.go:418: [rad]   "message": "could not find API version for type \"core/Secret\": unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"

I had already seen this log :). I added it here since the logging done by the functional tests was available: #7264 (comment). But if you search through the RP logs (attached in this comment: #7264 (comment)), this error isn't logged, which is a gap that I have seen in other places as well and am trying to figure out. I'm sure we can reproduce the logging issue by injecting any client-side error and confirming whether logging is really being skipped. I agree that the particular error we saw in the functional test failure isn't easily reproducible.

@kachawla
Contributor Author

kachawla commented Mar 27, 2024

Fixed this in #7370 and created #7369 with suggestions on further improvements in the async worker.
