-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
teach propolis-server to understand configurable boot order #756
Conversation
d0d140a
to
f5437f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iximeow I've taken a stab at answering your other questions here; let me know if I can clarify any of those answers or get you any more info about how these parts of Propolis are (meant to be) arranged.
Thanks for putting this together and for asking thoughtful questions about the existing design! There are definitely some pre-existing things we can go improve here.
this could have been made to work, but i'm walking this back as it would land fairly untested at first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhat closer to landing, almost all the changes here are to the new boot_order.rs
tests as i've been checking that guests behave in a useful way. i've one last test i want to write on that front: what does a guest see if we configure a boot order via EFI variable, stop the VM, configure a different boot order via fw_cfg, then restart a VM?
phd-tests/tests/src/boot_order.rs
Outdated
// The entry we booted from is clearly valid, so we should be able to insert a few duplicate | ||
// entries. We won't boot into them, but if something bogus happens and we did boot one of | ||
// these, at least it'll work and we can detect the misbehavior. | ||
// | ||
// But here's a weird one: if we just append these to the end, on reboot they'll be moved | ||
// somewhat up the boot order. This occurrs both if setting variables through `efibootmgr` or | ||
// by writing to /sys/firmware/efi/efivars/BootOrder-* directly. As an example, say we had a | ||
// boot order of "0004,0001,0003,0000" where boot options were as follows: | ||
// * 0000: UiApp | ||
// * 0001: PCI device 4, function 0 | ||
// * 0003: EFI shell (Firmware volume+file) | ||
// * 0004: Ubuntu (HD partition 15, GPT formatted) | ||
// | ||
// If we duplicate entry 4 to new options FFF0 and FFFF, reset the boot order to | ||
// "0004,0001,0003,0000,FFF0,FFFF", then reboot the VM, the boot order when it comes back | ||
// up will be "0004,0001,FFF0,FFFF,0003,0000". | ||
// | ||
// This almost makes sense, but with other devices in the mix I've seen reorderings like | ||
// `0004,0001,<PCI 16.0>,0003,0000,FFF0,FFFF` turning into | ||
// `0004,0001,FFF0,FFFF,<PCI 16.0>,0003,0000`. This is particularly strange in that the new | ||
// options were reordered around some other PCI device. It's not the boot order we set! | ||
// | ||
// So, to at least confirm we *can* modify the boot order in a stable way, make a somewhat less | ||
// ambitious change: insert the duplicate boot options in the order directly after the option | ||
// they are duplicates of. This seems to not get reordered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i cannot make sense of this, but it doesn't seem to be a bug resulting from anything we're doing. i'd describe the overall behavior as "kind of works, and probably not something to debug in Propolis, maybe something to better understand in OVMF"..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because of ovmf trying to reconcile the EFI boot order with what's in the qemu fwcfg - https://github.com/oxidecomputer/edk2/blob/edd72391086a9a297bf9aa011fe53b449165e556/OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c#L1911
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, basically what Greg had called out on the RFD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah exactly. the specifically confusing thing about this, to me, is that the duplicate FFF0 and FFFF entries remain in the boot order at all. i would expect them to get dropped late in SetBootOrderFromQemu
, when it calls into BootOrderComplete
and gets here.
between this and fwcfg overriding guest-configured boot order generally, i'm leaning towards actual configurations Propolis receives only naming a single boot device. if any of this changes because of a change in OVMF at some point, i want the test to fail as an indicator that Something Changed, but it feels like we shouldn't be relying on specific semantics here
phd-tests/tests/src/boot_order.rs
Outdated
/// | ||
/// For Linux guests: variables automatically have their prior attributes prepended. Provide only | ||
/// the variable's data. | ||
async fn write_efivar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boring but delayed realization: writing to NvVars only actually matters if the guest OS has an EFI partition to persist to. Alpine virt images, for example, do not, so these changes are always lost on reboot. The "guest can reconfigure boot order" test will fail because of this, and rightfully so because the guest image simply does not support reconfiguring the boot order via EFI.
i think that's all basically fine, since we'd expect Alpine to basically be fine for other functionality, but now i'm curious what CI thinks of this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW there's a guest_os_has_read_only_fs
helper method on TestVm
that's meant to help with cases like this--see crucible::smoke::shutdown_persistence_test
for an example. (Of course, in that case you've just ended up not running the test, which makes CI pass but doesn't add much else of value.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'd need a related fs_is_read_readonly()
helper or something, since EFI variables are on a different filesystem, which would be fine.. but the Alpine image that doesn't seem to persist NvVars also has /sys/firmware/efi/efivars
mounted rw. i'm not confident enough to assert this yet, but i suspect somewhere in efivarfs
the actual persistence to disk is silently lost..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol: bcb6824#diff-3a20a9ed692f02614280c3438c2e7368597cb587487dcc28be5c6bac8affc549R321-R340
the Alpine ISO 9660-backed virt image, for reference, looks like this:
localhost:~# mount | grep efivarfs
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
note it even says rw
!! i think this is because technically the filesystem could be reasonably written to even without a writable ESP - it could be configuring real non-volatile storage. just, it's not here. it's unfortunate the write gets lost as a result, but it is what it is.
these still depend on having a guest artifact with an EFI partition to persist NvVars into, but with such a partition they actually show useful behavior!
functionally the guest adaptations for 22.04 and 24.04 are the same. these "24.04" definitions are actually in support of my weird 24.04 base image that has a user "root" instead of ubuntu-with-a-password
this makes the tests kind of legible..
don't *really* need the file present for any test purpose
fn get_shell_prompt(&self) -> &'static str { | ||
"root@ubuntu:~#" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was useful for my inconsistently-configured 24.04 test image, but probably not how any other 24.04 test image would be set up. or at least, it's different from the existing 22.04 personality (which does not expect to be root). so i'll drop this from the PR and make sure we've got something like "how to get or assemble test images for PHD" written up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 24.04 actually necessary for any particular reason? IIRC on CI we run these tests against an Alpine image, but if we need to start running certain tests against Ubuntu 24.04, presumably we can start doing that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not at this point, this is partially an artifact of at one point using efibootmgr
(which is not in the base Alpine image) for configuration.
i did notice that guest_can_adjust_boot_order
does not pass with the Alpine CI image, but does with my 24.04 image. i'm not exactly sure why yet, but i don't think it works out in the direction of "need 24.04 in CI".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure we've got something like "how to get or assemble test images for PHD" written up
I'd almost forgotten about it, but a ways back I created the guest-os-image-configs repo as a place to put stuff like this.
fn get_shell_prompt(&self) -> &'static str { | ||
"root@ubuntu:~#" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 24.04 actually necessary for any particular reason? IIRC on CI we run these tests against an Alpine image, but if we need to start running certain tests against Ubuntu 24.04, presumably we can start doing that as well?
// parsing here brought to you by rereading | ||
// * https://uefi.org/specs/UEFI/2.10/10_Protocols_Device_Path_Protocol.html | ||
// * https://uefi.org/specs/UEFI/2.10/03_Boot_Manager.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3 thanks for including the references :)
from bmat:
this is somewhat surprising. i expected edit: finally figured it out. didn't pull |
this drops the untested support from booting from NICs as well
this test does not rely on an EFI partition existing at all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic. Thanks for putting in all the work to make it happen (especially all the digging into the UEFI spec and EDK2)! I found a few small bolts to tighten before merging but overall this LGTM.
Co-authored-by: Greg Colombo <[email protected]>
@@ -57,6 +57,7 @@ pub(crate) struct Spec { | |||
pub board: Board, | |||
pub disks: HashMap<String, Disk>, | |||
pub nics: HashMap<String, Nic>, | |||
pub boot_order: Option<Vec<BootOrderEntry>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicky: maybe worth a comment explaining the semantic difference between None
and Some([])
?
self.boot_disk = DiskRequest { | ||
let boot_order = self.boot_order.get_or_insert(Vec::new()); | ||
if let Some(prev_boot_item) = | ||
boot_order.iter().position(|d| *d == "boot-disk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super goofy, unimportant nitpick that nobody besides me will ever care about: the string "boot-disk"
shows up enough here that i might const-ify it...
// The first disk in the boot list might not be the disk a test *actually* expects to boot. | ||
// | ||
// If there are multiple bootable disks in the boot order, we'll assume they're all | ||
// the same guest OS kind. So look for `boot-disk` - if there isn't a disk named | ||
// `boot-disk` then fall back to hoping the first disk in the boot order is a bootable | ||
// disk, and if *that* isn't a bootable disk, maybe the first disk is. | ||
// | ||
// TODO: theoretically we might want to accept configuration of a specific guest OS adapter | ||
// and avoid the guessing games. So far the above supports existing tests and makes them | ||
// "Just Work", but a more complicated test may want more control here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the magic name "boot-disk"
and its behavior might be worth documenting in the API docs for...one of the methods in this module?
let boot_num_bytes = read_efivar(&vm, BOOT_CURRENT_VAR).await?; | ||
|
||
let boot_num: u16 = u16::from_le_bytes(boot_num_bytes.try_into().unwrap()); | ||
|
||
let boot_option_bytes = read_efivar(&vm, &bootvar(boot_num)).await?; | ||
|
||
let mut cursor = Cursor::new(boot_option_bytes.as_slice()); | ||
|
||
let load_option = EfiLoadOption::parse_from(&mut cursor).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if this ought to be a function, 'cause it also shows up twice in the configurable_boot_order
test?
propolis-server was correct with respect to the scheme PHD uses to name disks and their backends (user-provided name is the name of the backend, derived name is the name of the disk), but disk requests from `parse_disk_from_request` use the user-provided name as the device name, with the derived name as the backend name. this means that propolis-server is wrong when used with Nexus, and real boot disk names never match against any of the `<name>-backend` backends they were being compared against. this probably makes the whole thing work though! yay!
* a few typos * a few 80col violations * actually enforce that boot devices are on PCI bus 0 * emphasize that propolis-server should never see a Spec with invalid boot disk * vestigial line * phd_skip!() instead of warn!+return Ok
these newtypes are useful for other kinds of devices, and in propolis-api-types generally, but threading that through is more time than i can spend on this right now
59b6bfc
to
3a3e753
Compare
you love to see it 🎉 |
) the Omicron side of adding explicit boot order selection to instances (counterpart to [propolis#756](oxidecomputer/propolis#756)). first, this extends `params::InstanceCreate` to take a new `boot_disk: Option<params::InstanceDiskAttachment>`. additionally, this adds a `PUT /v1/instances/{instance}` to update instances. the only property that can be updated at the moment is `boot_disk`, pick a new boot disk or unset it entirely. this also partially subsumes #6321. finally, this updates Omicron to reference a recent enough Propolis that #756 is included. a surprising discovery along the way: you can specify a disk to be attached multiple times in `disks` today, when creating an instance, and we're fine with it! this carries through with the new `boot_disk` field: if you specify the disk as `boot_disk` and in `disks`, it is technically listing the disk for attachment twice but this will succeed.
brings the existing fw_cfg-based boot order configuration that we already have in
propolis-standalone
over topropolis-server
as well.pairs with oxidecomputer/omicron#6585 to actually configure instances' boot settings and get them to Propolis.
while we accept a list of boot options here, note that Nexus intentionally has a more reductive view, maintaining only a single
boot_disk
. this is heavily based on what i've learned writing tests for how OVMF+guests behave with different mixes of boot device orders and sources of boot configuration. this conversation on RFD 470 gets at the crux of it - we don't have like a platform NVRAM device for guests to persist EFI variables in, so boot configuration is persisted in the EFI system partition (so, associated with a disk, not an instance). because we're not going to go modify the EFI system partition in user disks to effect boot order preferences, we're using the QEMU-stylefw_cfg
mechanism to indicate boot order. this, in turn, means that guest settings are blown away and replaced with what OVMF determines at boot time.taken together, this isn't an ideal spot to be in if we were to support more complex boot behavior logic, and it's probably not how we want to work multi-device orderings into
propolis-server
. it also means that guest-managed boot ordering just don't reliably persist if the instance is configured with a specific boot device!in the future with managed nonvolatile storage for guest firmware to persist settings in, we'll be in a somewhat better position to extend boot logic, and so this PR should leave us able to do so without much API hassle. that will probably be more interesting on the control plane side, which is why i'm permissive on the Propolis side, but restrictive on the Nexus side.