Skip to content
This repository has been archived by the owner on Aug 25, 2022. It is now read-only.

use different exit codes in argv parsing #179

Merged
merged 5 commits into from
Oct 14, 2019
Merged

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Oct 11, 2019

Functoria_runtime.with_argv is parsing the argument vector. This is used by all MirageOS unikernels. It has a three-valued return: success (returns unit), `Help | `Version - used to be 0), Error (exception, OCaml runtime returns 2).

As it looks now when we start hello with an argument it cannot handle (--fpoo=bar):

console data 2019-10-11T23:18:42-00:00: hello: unknown option `--fpoo'.
console data 2019-10-11T23:18:42-00:00: Usage: hello [OPTION]... 
console data 2019-10-11T23:18:42-00:00: Try `hello --help' for more information.
console data 2019-10-11T23:18:42-00:00: Fatal error: exception Failure("Key initialization failed")
console data 2019-10-11T23:18:42-00:00: Raised at file "stdlib.ml", line 29, characters 17-33
console data 2019-10-11T23:18:42-00:00: Called from file "main.ml", line 29, characters 9-95
console data 2019-10-11T23:18:42-00:00: Called from file "camlinternalLazy.ml", line 31, characters 17-27
console data 2019-10-11T23:18:42-00:00: Re-raised at file "camlinternalLazy.ml", line 36, characters 4-11
console data 2019-10-11T23:18:42-00:00: Solo5: solo5_exit(2) called

Now, I intended to implement a restart-on-failure feature in albatross. The only information we have at exit is the exit code, and this can be divided into "those which should trigger a restart" (i.e. out of memory in most unikernels, unhandled exception, ..) and "those which should not trigger a restart" (i.e. solo5 tender can't process spt/hvt image, manifest mismatch, boot argument parser error).

FWIW, I figured the following exit codes are used:

  • solo5 returns whatever exit code the guest passed to void exit(int status).
  • 0 normal termination
  • 1 solo5 internal failure (i.e. bad image / mismatch between manifest and command line arguments)
  • 2 unhandled OCaml exception (out of memory, ...) -- to be changed to abort() in 4.10.0, see Use abort instead of exit(2) in caml_fatal_error. ocaml/ocaml#8630
  • 255 solo5_abort

Now, while restart-on-failure can be configured by a user with a whitelist of exit code which should trigger a restart, there is need for a sensible default, which is everything besides 1 and the (arbitrary) range 64..70.

(this is a PR against 2.2 which I tested locally, if this is accepted I'll of course PR as well against master)

@hannesm
Copy link
Member Author

hannesm commented Oct 13, 2019

what I forgot to include in the comment above is the output with these changes in:

with --foop=bar

console data 2019-10-13T10:40:42-00:00: hello: unknown option `--foop'.
console data 2019-10-13T10:40:42-00:00: Usage: hello [OPTION]... 
console data 2019-10-13T10:40:42-00:00: Try `hello --help' for more information.
console data 2019-10-13T10:40:42-00:00: Solo5: solo5_exit(64) called

with --help

console data 2019-10-13T10:41:56-00:00: NAME
console data 2019-10-13T10:41:56-00:00:        hello
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: SYNOPSIS
console data 2019-10-13T10:41:56-00:00:        hello [OPTION]... 
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: UNIKERNEL PARAMETERS
console data 2019-10-13T10:41:56-00:00:        -l LEVEL, --logs=LEVEL (absent MIRAGE_LOGS env)
console data 2019-10-13T10:41:56-00:00:            Be more or less verbose. LEVEL must be of the form
console data 2019-10-13T10:41:56-00:00:            *:info,foo:debug means that that the log threshold is set to info
console data 2019-10-13T10:41:56-00:00:            for every log sources but the foo which is set to debug. 
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: OPTIONS
console data 2019-10-13T10:41:56-00:00:        --help[=FMT] (default=auto)
console data 2019-10-13T10:41:56-00:00:            Show this help in format FMT. The value FMT must be one of `auto',
console data 2019-10-13T10:41:56-00:00:            `pager', `groff' or `plain'. With `auto', the format is `pager` or
console data 2019-10-13T10:41:56-00:00:            `plain' whenever the TERM env var is `dumb' or undefined.
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: ENVIRONMENT
console data 2019-10-13T10:41:56-00:00:        These environment variables affect the execution of hello:
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00:        MIRAGE_LOGS
console data 2019-10-13T10:41:56-00:00:            See option --logs.
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: Solo5: solo5_exit(65) called

anyone (@mirage/core) has an opinion on this? I'd like to merge and release this monday (oct 14th) evening.

@avsm
Copy link
Member

avsm commented Oct 13, 2019

I'm in favour of this. Although it's a bigger change, it may be worth us standardising on sysexits.h for some of the returns, and not overlapping with them: https://www.freebsd.org/cgi/man.cgi?query=sysexits&apropos=0&sektion=0&manpath=FreeBSD+4.3-RELEASE&format=html

@hannesm
Copy link
Member Author

hannesm commented Oct 13, 2019

NB: yes, I'm aware of sysexits.
that's why I chose 64 / EX_USAGE for argument parsing error
I re-read sysexits man page, and would appreciate any insight which other error codes a MirageOS unikernel could exit with? EX_DATAERR, EX_UNAVAILABLE, EX_SOFTWARE (which is rather vague, and in our case likely 2 -- an unhandled exception), EX_IOERR (once someone does block device storage seriously), EX_PROTOCOL (though that's pretty vague as well), EX_CONFIG (which is pretty close to EX_DATAERR).

the help / version is debatable (it seems as well valid to have them return 0 -- i.e. successful program termination -- but imho it is more useful to exit with a specific code (out of my own use case -- restart-on-failure, and restarting a --help unikernel is not that satisfying ;), happy to use 63 for help&version, which won't collide with sysexits).

~> maybe we should have a small package "exit-codes" that define a mapping between constructor and numeric value? this could then serve as a global registry, to-be-used in the various libraries that should deal with errors?

@hannesm
Copy link
Member Author

hannesm commented Oct 13, 2019

I found http://www.tldp.org/LDP/abs/html/exitcodes.html which covers some more exit codes (all non-overlapping with either sysexits nor the ones proposed here), mainly for shell programming.

@mato
Copy link

mato commented Oct 14, 2019

FWIW, I figured the following exit codes are used:

solo5 returns whatever exit code the guest passed to void exit(int status).

If we're going to interpret exit codes, I should go through the Solo5 code and at least distinguish between:

  1. Start-up failure, e.g. target mismatch, bad command line, i.e. anything where the guest did not even start.
  2. Run-time failure, e.g. unhandled hypercall, VCPU exception for hvt, seccomp failure for spt i.e. cases where the guest did start but then failed while running.

Both of these cases currently return 1 for hvt. Case (2) returns 1 for hvt and 159 (returned by the shell due to process exiting with SIGSYS) for spt. There's nothing I can do about spt for (2), but at least for hvt, what codes would you like the tender to return? I'm guessing something high rather than low.

(related: Solo5/solo5#424)

@hannesm
Copy link
Member Author

hannesm commented Oct 14, 2019

@mato thanks for your explanation. it would be nice to document (and maybe adjust/unify) exit codes, but this is really not urgent.

…line

processor):
- on parse failure, 64
- if help or version was requested, 63

reasoning is to design reasonable restart-on-failure semantics.
require dune <= 1.5.1 for testing
@hannesm
Copy link
Member Author

hannesm commented Oct 14, 2019

the list from sysexits and bash tutorial is atm as a comment at https://github.com/hannesm/albatross/blob/solo5-6/src/vmm_core.ml#L333-L358, @mato yes, something high would be good!

CI is green (after using dune.1.5.3 as test-dependency), merging and releasing

@hannesm hannesm merged commit 45a8e12 into mirage:2.2 Oct 14, 2019
hannesm added a commit to hannesm/opam-repository that referenced this pull request Oct 14, 2019
CHANGES:

* Functoria_runtime.with_argv now uses (mirage/functoria#179, by @hannesm)
  - exit 63 when `Help or `Version is requested (used to exit with 0)
  - exit 64 when Term.eval returns with an error (used to raise an exception)
@hannesm hannesm deleted the command-exit branch October 14, 2019 19:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants