coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS

Home Page: https://fedoraproject.org/coreos/


metal: Support redundant bootable disks

cgwalters opened this issue · comments

We're about to gain support for many types of complex storage for the root filesystem, but this is distinct from the "root disk", which includes /boot and the ESP, for example.

We had a subthread around this starting around here: #94 (comment)

Basically my proposal is that we support "equivalent RAID 1" by teaching coreos-installer how to replicate the /boot partition and ESP to multiple block devices. Making the root filesystem RAID is an orthogonal thing - for example, one might choose a higher RAID level for the OS and data, or use Stratis/btrfs/LVM for /. Another important variation here is LUKS-on-RAID (or equivalent) for the root device. To restate: the root filesystem can be set up quite differently from how /boot and the ESP work.

I think it would work in this scenario to make /boot be MD-RAID - the bootloaders support that, and it would mean ostree kernel updates are fully transparent.
We can't make the ESP be RAID - that would need to be manually synced.
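One way to do that manual syncing would be a one-shot systemd unit that mirrors the primary ESP onto the backup copy. This is a minimal sketch, not anything shipped: the unit name, filesystem label, and mount point are all assumptions.

```ini
# Hypothetical unit; label and paths are illustrative.
[Unit]
Description=Sync primary ESP to backup ESP
ConditionPathExists=/boot/efi

[Service]
Type=oneshot
ExecStartPre=/usr/bin/mkdir -p /mnt/esp-backup
# Mount the backup ESP by an (assumed) filesystem label, mirror the
# primary ESP's contents onto it, then unmount.
ExecStart=/usr/bin/mount /dev/disk/by-label/ESP-BACKUP /mnt/esp-backup
ExecStart=/usr/bin/rsync -a --delete /boot/efi/ /mnt/esp-backup/
ExecStart=/usr/bin/umount /mnt/esp-backup

[Install]
WantedBy=multi-user.target
```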

Basically my proposal is that we support "equivalent RAID 1" by teaching coreos-installer how to replicate the /boot partition and ESP to multiple block devices.

The initramfs is already handling copyout/copyin for the root filesystem; why not for /boot and the ESP? That way this functionality wouldn't be limited to the coreos-installer flow, and would hew closer to the "do everything via Ignition" principle.

I'm OK with implementing the work in the initramfs; though I think this case is basically never relevant for cloud platforms.

Are you also arguing (implicitly) that the user interface to this is via Ignition? I guess I'd thought of this as coreos-installer --redundant /dev/vdb /dev/vda or so. What would the Ignition look like, if so? I don't think we can allow/support admins doing arbitrary things with /boot in the same way as root. Or would this have some high-level sugar in fcct that compiles down into the Ignition to support setting up our main supported case of RAID 1 for /boot and then...hmm, generate a second ESP and a systemd unit to copy it?

I'm OK with implementing the work in the initramfs; though I think this case is basically never relevant for cloud platforms.

It may be relevant for image-based bare metal environments, like Packet or Ironic.

Are you also arguing (implicitly) that the user interface to this is via Ignition?

Sure. Ignition-disks has always been a bit tricky to write configs for; it'll easily let you create disk layouts that won't boot. Writing a config requires knowing which operations are supported (erasing or moving the root filesystem, soon) and which ones aren't (moving the ESP, currently). This use case is just another example of that. To address it, we have happy-path documentation with examples, and potentially we also have FCCT.

The Ignition config would look like: create an ESP, BIOS boot partition, and /boot partition on a second disk, and an MD-RAID1 on /boot. We can recognize the first two by their type GUIDs, and for /boot, we could try using the systemd Extended Boot Loader Partition GUID or we could define our own GUID. We'd then handle the copy operation in initramfs glue, same as we do for root. That code can be special-cased to support only the things we want to support, since everything else will just fail and the user will fix their config. FCCT sugar makes sense here, since otherwise we're asking users to paste type GUIDs and partition sizes from docs into their Ignition config.
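Under those assumptions, the second-disk layout might look roughly like the following, shown as a Fedora CoreOS Config that compiles to Ignition. Sizes, labels, and the choice of the systemd Extended Boot Loader (XBOOTLDR) GUID for /boot are illustrative, not a settled design.

```yaml
variant: fcos
version: 1.3.0
storage:
  disks:
    - device: /dev/vdb
      wipe_table: true
      partitions:
        # BIOS boot partition, recognized by its type GUID
        - label: bios-2
          size_mib: 1
          type_guid: 21686148-6449-6E6F-744E-656564454649
        # ESP, recognized by its type GUID
        - label: esp-2
          size_mib: 127
          type_guid: C12A7328-F81F-11D2-BA4B-00A0C93EC93B
        # /boot, tagged with the systemd XBOOTLDR type GUID
        - label: boot-2
          size_mib: 384
          type_guid: BC13C2FF-59E6-4262-A352-B275FD6F7172
```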

As to the on-disk layout: /boot on MD-RAID1 should work even without bootloader support. We can just use a 1.0 RAID superblock (which goes at the end of the partition). That also requires the bootloader never to write to the disk (so no grubenv), and currently also requires using the raid.options escape hatch in the Ignition spec. Bootloader support would be better, of course. The same trick might work for the ESP but that's problematic, since the firmware might decide to write to that filesystem. (For example, when applying a firmware update capsule.)
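The raid.options escape hatch mentioned above would look something like this in FCC form, passing --metadata=1.0 through to mdadm so the superblock lands at the end of each member. Device names and the array name are illustrative.

```yaml
storage:
  raid:
    - name: md-boot
      level: raid1
      devices:
        - /dev/disk/by-partlabel/boot-1
        - /dev/disk/by-partlabel/boot-2
      options:
        - --metadata=1.0
```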

It may be relevant for image-based bare metal environments, like Packet or Ironic.

Hmm, right. Even though coreos-installer is "glorified dd", a lot of the stuff we've built up there is really useful (like auto-detecting 4Kn disks, validating signatures, etc.). Replicating even that small stuff in Packet/Ironic carries a nontrivial cost.

I've been (implicitly) arguing that Ironic should basically learn to delegate to coreos-installer rather than replicate it. But that also carries a cost, because Ironic obviously wants to support installing non-CoreOS systems too.

Anyways, I'm fine with doing it in the initramfs.

I was thinking recently that https://github.com/coreos/bootupd/ could own the post-install synchronization aspect of this. I think we'd need to define some equivalent of a "RAID config" that'd be like a JSON file we write to each copy of the ESP that would contain a uuid for itself plus the other members of the set, then bootupd takes care of mounting and synchronizing later updates.
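Such a "RAID config" could be as simple as a small JSON document written to each ESP copy. Everything here, field names and UUIDs included, is hypothetical - the issue only proposes that something like this exist.

```json
{
  "version": 1,
  "self": "b33c31b4-5bf9-4e02-8f4c-0a2071154f31",
  "members": [
    "b33c31b4-5bf9-4e02-8f4c-0a2071154f31",
    "f4c0a0d8-2c5e-4a87-9b0e-6d1d3a9a5c22"
  ]
}
```

With a file like this on each copy, bootupd could identify which member it mounted (via "self") and enumerate the peers it needs to keep in sync.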

The code landed in coreos/fedora-coreos-config#718 and the sugar in coreos/butane#162. Closing this out.
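The boot_device sugar from coreos/butane#162 compiles the whole mirrored layout down from a few lines. This is roughly its shape - check the Butane docs for the exact spec version and fields, since (as noted below) the syntax changed after the initial release:

```yaml
variant: fcos
version: 1.3.0
boot_device:
  mirror:
    devices:
      - /dev/vda
      - /dev/vdb
```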

The fix for this went into testing stream release 33.20201214.2.0. Please try out the new release and report issues.

We're planning to make some changes to the RAID functionality in coreos/fedora-coreos-config#794, coreos/butane#178, and coreos/coreos-assembler#1979.

Machines that were configured with a mirrored boot disk on 33.20201214.2.0 should continue to function on upgrade, but will work differently from mirrored boot disks deployed on newer releases. Ignition configs that specify boot device mirroring and are built with fcct 0.8.0 will not be compatible with future OS releases. Folks should feel free to test out the new release, but be aware that changes are coming.

@dustymabe, I'm going to reset this issue's labels accordingly.

@dustymabe, I'm going to reset this issue's labels accordingly.

Sounds good. Thanks for the context!

The fix for this went into testing stream release 33.20210104.2.0. Please try out the new release and report issues.