Distinguish permanent API errors from transient ones #1640

hiddeco · 2024-03-15T23:18:20Z

We do not currently distinguish "not found" errors (permanent) from errors such as "the Kubernetes API server temporarily cannot be reached" (transient). Because of this, a Stage's verification process may fail prematurely even though the controller could, in theory, recover automatically if given time.

Manually recovering from this is both cumbersome for the user and potentially a waste of the computing power used by the AnalysisRun. I think we can do a better job of distinguishing these types of errors and avoid giving up on transient ones, e.g. by requeueing and not erasing AnalysisRun references.

xref: #1611 (comment)

Note: While I have only observed this to happen for a Stage's verification process, this may actually apply to more areas of Kargo.

krancour · 2024-08-19T16:41:40Z

I think we've made progress on this and there is more to be made, but, like #1479, this is an ongoing effort that we can carry from release to release until we feel satisfied.

krancour · 2024-10-12T00:21:40Z

Deferring as this is something we're just tracking until satisfied.

krancour · 2024-12-12T17:04:10Z

Noting some incremental progress on this in #3119

hiddeco added kind/enhancement area/controller labels Mar 15, 2024

github-actions bot added the needs/priority label Mar 15, 2024

hiddeco self-assigned this Mar 18, 2024

hiddeco mentioned this issue Mar 19, 2024

feat(verification): improve transient error handling #1650

Merged

krancour added priority/normal and removed needs/priority labels Apr 2, 2024

krancour added this to the v0.6.0 milestone Apr 2, 2024

krancour removed this from the v0.6.0 milestone Apr 30, 2024

krancour added this to the v0.9.0 milestone Aug 19, 2024

krancour modified the milestones: v0.9.0, v0.10.0 Sep 30, 2024

krancour modified the milestones: v1.0.0, v1.1.0 Oct 12, 2024

krancour modified the milestones: v1.1.0, v1.2.0 Nov 15, 2024

krancour modified the milestones: v1.2.0, v1.3.0 Dec 12, 2024
