Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main: return different exit code when an input is not found #2236

Merged
merged 1 commit into from
Mar 25, 2024

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Feb 26, 2024

This is a proof of concept that shows how we could do a different exit code in skopeo when an input is not found. The use case is osbuild which uses skopeo to inspect images and it would be nice to differenciate between an image that is not found and general skopeo errors (or errors like network issues etc).

Feel free to close if it's not desired, I can also open an issue if that is preferable, it just felt easier this way so that we have something more concrete to discuss :)

Note that this POC is not ideal:

  1. lacks tests (happy to do them if there is a chance this is considered)
  2. The output breaks, it used to start with FATA[000] and now starts with ERRO[0000]

It's likely that (2) is a show stopper, it seems there is no way to override the exit code of logrus for logrus.Fatal() so it would require a bit more work if the output must stay identical.

P.S. I also found that for local images that are not found ErrNotAnImage is returned so the docstring of storage.ErrNotAnImage may need updating (I'm happy to do this separtely if desired).

To reproduce:

$ skopeo inspect containers-storage:010101010101
FATA[0000] Error parsing image name "containers-storage:010101010101": reference "[overlay@/media/space/home/egon/.local/share/containers/storage+/run/user/1000/containers]docker.io/library/010101010101:latest" does not resolve to an image ID: identifier is not an image

For references (from storage/errors.go):

ErrNotAnImage = errors.New("identifier is not an image")
ErrImageUnknown = errors.New("image not known")

@mvo5 mvo5 changed the title main: return different exit code when an input is not found [RFC] main: return different exit code when an input is not found Feb 26, 2024
@cgwalters
Copy link
Contributor

This is very related to #1757

(with followups in #2114 etc.)

I suspect your use case (trying to do container inspect/fetching stuff from Python) may also benefit from using the proxy, which is effectively a language-agnostic (though ad-hoc) IPC mechanism to talk to containers/image. That said, if you use it you would have to implement writing to e.g. dir transports on your own.

But, one could racily use the proxy just for OpenImageOptional, and then just fork off the skopeo binary.

This all said, the exit code from the binary thing seems OK to me personally (it just doesn't scale very well).

The output breaks, it used to start with FATA[000] and now starts with ERRO[0000]

Does this matter?

@mtrmac
Copy link
Contributor

mtrmac commented Feb 26, 2024

  1. The output breaks, it used to start with FATA[000] and now starts with ERRO[0000]

I’m not worried about this, the specific text of errors is certainly not an ABI.

Really ERRO[0000] is inconvenient nonsense that we should eventually replace, it’s just never been a priority.


P.S. I also found that for local images that are not found ErrNotAnImage is returned so the docstring of storage.ErrNotAnImage may need updating (I'm happy to do this separtely if desired).

I suspect what you are seeing is c/image/storage.ErrNoSuchImage; the way that is defined is long-standing, probably a mistake, but also not something we can change without risk.


The thing I am worried about is the exact semantics of the exit code. The new exit code would be an ABI (and we “only” have ~127 / ~255 of them). Consider the discussion in #2114 about “archive does not exist” vs. “image within an archive does not exist”.

Would this new exit code be defined for all transports? If so, we would need some kind of ~universally-applicable definition; maybe directly in c/image, or at the very least share it with the proxy’s OpenImageOptional.

Alternatively, skopeo inspect --detect-storage-missing-image would be a horribly ugly but conceptually acceptable thing, where the caller explicitly opts into a specific behavior (and we aren’t going to run out of horribly ugly long option names). Ugh.


Maybe one way to handle this short-term would be to document the ambition for OpenImageOptional, and this exit code, to handle all cases, but to only add transports as need arises. Not at all elegant, but also not much worse than --detect-… and it would avoid the risk of us getting stuck on defining semantics for a transport we don’t immediately care about.

@mvo5
Copy link
Contributor Author

mvo5 commented Feb 26, 2024

Thanks for the quick and thoughtful feedback!

  1. The output breaks, it used to start with FATA[000] and now starts with ERRO[0000]
    I’m not worried about this, the specific text of errors is certainly not an ABI.
    [..]

Wonderful! Thank you! (FTR I agree but wanted to not oversell my change as transparent when it's not).

[..]

The thing I am worried about is the exact semantics of the exit code. The new exit code would be an ABI (and we “only” have ~127 / ~255 of them). Consider the discussion in #2114 about “archive does not exist” vs. “image within an archive does not exist”.

I'm not too worried about this (famous last words) - thinking about the common ones I can think of:

  • not found
  • network error
  • auth error
  • invalid options
  • generic error

And just to get an upper bound counting all error in storage types/errors.go I see 40 (and just to get some more data libpod libpod/define/errors.go is 61) - which of course we would never do.

Or am I overlooking something here?

Would this new exit code be defined for all transports? If so, we would need some kind of ~universally-applicable definition; maybe directly in c/image, or at the very least share it with the proxy’s OpenImageOptional.
[..]

I think it would be good to have a shared helper, I have not read the full backlog (yet) so I'm not familiar with the subtleties but my gut-feeling is that users can always use (real) programming languages and specific error checking when they need fine grained control and the exit codes would just be a (useful) "hint". But maybe I'm missing something, I will read the backlog.

I'm happy to work on this (consolidate with the pre-exiting work, add tests) but equally happy to just wait for the discussion. I can also look into the proxy suggestion from Colin (thanks!), I wasn't aware of this feature.

@mtrmac
Copy link
Contributor

mtrmac commented Feb 26, 2024

And just to get an upper bound counting all error in storage types/errors.go I see 40

It’s not just the count — it’s also the exclusivity. We can only return one exit status value.

At least in the past, there have been registries which respond to any request to an inaccessible repo with “no such image”, even when the image exists but the caller does not have privileges. Suppose we define an exit code for “image does not exist”, and for “insufficient access”; we can’t return both values.

Dealing with this in some kind of forward-compatible way seems fairly difficult; we need to, in advance, define the semantics in a specific enough, but also general enough, way. (Admittedly, many UNIX CLI programs do document specific exit codes, and this isn’t widely considered that much of a problem.)

I’m almost tempted to, instead, do skopeo inspect --json-on-error with

{
   "status": "failed",
   "storage-image-not-found": "true",
   "docker-credentials-rejected": "false",
   // …
}

(modeled after “conditions” in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties ).

Almost; that does look like massive overkill.

@cgwalters
Copy link
Contributor

At least in the past, there have been registries which respond to any request to an inaccessible repo with “no such image”, even when the image exists but the caller does not have privileges.

I think you have it backwards, it's common to respond with a (using HTTP status codes here for clarity) with a 403 if the caller doesn't have permission to see images in the namespace, even if it's really a 404.

But for the use cases the people desiring this want, that's understood and not a real problem.

@mtrmac
Copy link
Contributor

mtrmac commented Feb 26, 2024

I think you have it backwards

Yes, my mistake.

@mvo5
Copy link
Contributor Author

mvo5 commented Feb 26, 2024

Thanks for the detailed explanation! I agree with the general notion and the general issue and I am fine to not use status codes having a --jsonon-error is also fine for my use-case.

Having said all of this, the specific example about not-found vs not-permission-to-even-see-if-its-there (404 vs 403) seems to be map-able in a relatively straightforward way: no-permission would win here just like with http? But again, sorr,y I understand that there are surely more general cases where it's much more difficult to decide and returning two statues would be desirable. I will look at the proxy code tomorrow to see if that is an option for us. Thanks again for all your help and input!

@mtrmac
Copy link
Contributor

mtrmac commented Feb 26, 2024

IIRC there was a better example of the slippery nature of trying to categorize failure kinds sometime in the past, but I can’t quickly find it.

I do think that a designated exit code is a viable approach, it would primarily require careful thought in the documentation of the semantics.

@mvo5 mvo5 force-pushed the errcode-image-not-found branch from 2dfafe0 to 4d71e73 Compare March 22, 2024 14:32
@mvo5
Copy link
Contributor Author

mvo5 commented Mar 22, 2024

IIRC there was a better example of the slippery nature of trying to categorize failure kinds sometime in the past, but I can’t quickly find it.

I do think that a designated exit code is a viable approach, it would primarily require careful thought in the documentation of the semantics.

I just managed to find a bit of time to look at this again and it seems:

$ skopeo inspect docker://does-not-exist
FATA[0001] Error parsing image name "docker://does-not-exist": reading manifest latest in docker.io/library/does-not-exist: requested access to the resource is denied

is one such example.

I updated the commit a bit based on the discussion. There is no clear consensus so I did this best effort :)

Feel free to close if you disagree but I would love to have something like this!

@mvo5 mvo5 marked this pull request as ready for review March 22, 2024 14:39
@mvo5 mvo5 changed the title [RFC] main: return different exit code when an input is not found main: return different exit code when an input is not found Mar 22, 2024
@mvo5 mvo5 force-pushed the errcode-image-not-found branch from 4d71e73 to fc2887a Compare March 25, 2024 14:32
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@mvo5 mvo5 force-pushed the errcode-image-not-found branch 2 times, most recently from fa30e9d to 3c8bdea Compare March 25, 2024 15:32
Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

All right, let’s do this.

cmd/skopeo/utils.go Outdated Show resolved Hide resolved
cmd/skopeo/main.go Outdated Show resolved Hide resolved
docs/skopeo.1.md Outdated Show resolved Hide resolved
docs/skopeo.1.md Outdated Show resolved Hide resolved
docs/skopeo.1.md Outdated
@@ -123,6 +123,13 @@ Print the version number
Default directory containing registry configuration, if **--registries.d** is not specified.
The contents of this directory are documented in [containers-registries.d(5)](https://github.com/containers/image/blob/main/docs/containers-registries.d.5.md).

## EXIT STATUS
`skopeo` exists with status 0 if the operation was successfully. Error result
in a non-zero exit code. If the input image cannot be found an exit status of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wonder about being more precise here about the “image does not exist” vs. “image container does not exist” distinction we ran into with oci*, e.g. containers/image#1675 .

I’m leaning towards not discussing it here, because the text is correct as is, and most users don’t care most of the time, but don’t feel strongly about it at all.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion on this (maybe someone else has?). I'm happy to work on this more but personally I think the (somewhat vague) description that is there now is okay.

@mvo5 mvo5 force-pushed the errcode-image-not-found branch from 3c8bdea to b829b42 Compare March 25, 2024 16:48
docs/skopeo.1.md Outdated Show resolved Hide resolved
This commit makes skopeo return a different exit code when an
input is not found. The use case is `osbuild` which uses skopeo
to inspect images and it would be nice to differenciate between
an image that is not found and general skopeo errors (or errors
like network issues etc).

I picked exit code `2` for `not found` because it is also the value
of `ENOENT`.

Man page and a test are added.

Signed-off-by: Michael Vogt <[email protected]>
@mvo5 mvo5 force-pushed the errcode-image-not-found branch from b829b42 to 16b6f0a Compare March 25, 2024 17:07
@mtrmac mtrmac merged commit d041172 into containers:main Mar 25, 2024
24 checks passed
@mtrmac
Copy link
Contributor

mtrmac commented Mar 25, 2024

Thanks again!

@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants