Re: [Xen-devel] Design session report: Live-Updating Xen
On 15/07/2019 19:57, Foerster, Leonard wrote:
> Here is the summary/notes from the Xen Live-Update Design session last week.
> I tried to tie together the different topics we talked about into some
> sections.
>
> https://cryptpad.fr/pad/#/2/pad/edit/fCwXg1GmSXXG8bc4ridHAsnR/
>
> --
> Leonard
>
> LIVE UPDATING XEN - DESIGN SESSION
>
> Brief project overview:
>       -> We want to build Xen Live-update
>       -> early prototyping phase
>       IDEA: change the running hypervisor to a new one without guest
>             disruptions
>       -> Reasons:
>               * Security - we might need an updated version for
>                 vulnerability mitigation

I know I'm going to regret saying this, but livepatches are probably a
better bet in most cases for targeted security fixes.

>               * Development cycle acceleration - fast switch to hypervisor
>                 during development
>               * Maintainability - reduce version diversity in the fleet

:) I don't expect you to admit anything concrete on xen-devel, but I do
hope the divergence is at least a little better under control than the
last time I got given an answer to this question.

>       -> We are currently eyeing a combination of guest transparent live
>          migration and kexec into a new xen build
>       -> For more details:
>          https://xensummit19.sched.com/event/PFVQ/live-updating-xen-amit-shah-david-woodhouse-amazon
>
> Terminology:
>       Running Xen -> The xen running on the host before update (Source)
>       Target Xen -> The xen we are updating *to*
>
> Design discussions:
>
> Live-update ties into multiple other projects currently done in the
> Xen-project:
>
>       * Secret free Xen: reduce the footprint of guest-relevant data in Xen
>               -> less state we might have to handle in the live update case

I don't immediately see how this is related.  Secret-free Xen is to do
with having fewer things mapped by default.  It doesn't fundamentally
change the data that Xen needs to hold about guests, nor how this gets
arranged in memory.

>       * dom0less: bootstrap domains without the involvement of dom0
>               -> this might come in handy to at least set up and continue
>                  dom0 on the target xen
>               -> If we have this, it might also enable us to de-serialize
>                  the state for other guest domains in xen and not have to
>                  wait for dom0 to do this

Reconstruction of dom0 is something which Xen will definitely need to
do.  With the memory still in place, it's just a fairly small amount of
register state which needs restoring.

That said, reconstruction of the typerefs will be an issue.  Walking
over a fully populated L4 tree can (in theory) take minutes, and it's
not safe to just start executing without reconstruction.

Depending on how bad it is in practice, one option might be to do a
demand validate of %rip and %rsp, along with a hybrid shadow mode which
turns faults into typerefs, which would allow the gross cost of
revalidation to be amortised while the vcpus were executing.  We would
definitely want some kind of logic to aggressively typeref outstanding
pagetables so the shadow mode could be turned off.

> We want to just keep domain and hardware state
>       -> Xen itself is supposed to be exchanged completely
>       -> We have to keep the IOMMU page tables around and not touch them
>               -> this might also come in handy for some newer UEFI boot
>                  related issues?

This is for Pre-DXE DMA protection, which IIRC is part of the UEFI 2.7
spec.  It basically means that the IOMMU is set up and inhibiting DMA
before any firmware starts using RAM.

In both cases, it involves Xen's IOMMU driver being capable of
initialising with the IOMMU already active, and in a way which keeps
DMA and interrupt remapping safe.  This is a chunk of work which should
probably be split out into an independent prerequisite.
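As a very rough illustration of what "initialise with the IOMMU already
active" means (all helper names here are hypothetical, not the real
VT-d/AMD-Vi driver code), the shape of it would be something like:

    /*
     * Sketch only: bring up one IOMMU when translation may already be
     * enabled (live update, or firmware Pre-DXE DMA protection).
     */
    static int __init iommu_init_one(struct iommu *iommu)
    {
        if ( iommu_translation_enabled(iommu) )
        {
            /*
             * Remapping is already live.  Disabling it or zapping the
             * root table would expose in-flight DMA, so inherit the
             * existing root table and interrupt remapping table, and
             * only modify entries incrementally from here on.
             */
            iommu->root_maddr = iommu_read_root_table_ptr(iommu);
            iommu->irt_maddr  = iommu_read_irt_ptr(iommu);
            return 0;
        }

        /* Cold boot: allocate fresh tables and enable translation. */
        iommu->root_maddr = iommu_alloc_root_table(iommu);
        iommu->irt_maddr  = iommu_alloc_irt(iommu);
        iommu_enable_translation(iommu);
        return 0;
    }

Either way, the driver must never pass through a window where
translation is off while devices can still DMA.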
>       -> We might have to go and re-inject certain interrupts

What hardware are you targeting here?  IvyBridge and later has a posted
interrupt descriptor which can accumulate pending interrupts (at least
manually), and newer versions (Broadwell?) can accumulate interrupts
directly from hardware.

>       -> do we need to dis-aggregate xenheap and domheap here?
>               -> We are currently trying to avoid this

I don't think this will be necessary, or indeed a useful thing to try
considering.  There should be an absolute minimal amount of dependency
between the two versions of Xen, to allow for the maximum flexibility
in upgrade scenarios.

>
> A key cornerstone for Live-update is guest transparent live migration
>       -> This means we are using a well defined ABI for saving/restoring
>          domain state
>       -> We only rely on domain state and not on internal xen state

Absolutely.  One issue I discussed with David a while ago is that, even
across an upgrade of Xen, the format of the EPT/NPT pagetables might
change, at least in terms of the layout of software bits.  (Especially
for EPT, where we slowly lose software bits to new hardware features we
wish to use.)

>       -> The idea is to migrate the guest not from one machine to another
>          (in space) but on the same machine from one hypervisor to another
>          (in time)
>       -> In addition we want to keep as much as possible in memory
>          unchanged and feed this back to the target domain in order to
>          save time
>               -> This means we will need additional info on those memory
>                  areas and have to be super careful not to stomp over
>                  them while starting the target xen
>       -> for live migration: domid is a problem in this case
>               -> randomize and pray does not work on smaller fleets
>               -> this is not a problem for live-update
>               -> BUT: as a community we should make this restriction go away
>
> Exchanging the Hypervisor using kexec
>       -> We have patches merged in upstream kexec-tools that enable
>          multiboot2 for Xen
>       -> We can now load the target xen binary into the crashdump region
>          so as not to stomp over any valuable data we might need later
>       -> But using the crashdump region for this has drawbacks when it
>          comes to debugging, and we might want to think about this later
>               -> What happens when live-update goes wrong?
>       -> Option: Increase the crashdump region size and partition it, or
>          have a separate reserved live-update region to load the target
>          xen into
>       -> A separate or partitioned region is not a priority for V1 but
>          should be on the road map for future versions

In terms of things needing physical contiguity, there is the Xen image
itself (a few MB), and various driver data structures (the IOMMU
interrupt remapping tables in particular, but I think we can probably
scale the size by the number of vectors behind them in practice, rather
than always making an order 7 (or 8?) allocation to cover all 64k
possible handles).  I think some of the directmap setup also expects to
be able to find free 2M superpages.
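For the interrupt remapping table specifically, a back-of-the-envelope
calculation (my numbers, assuming the usual 128-bit IRTE format) says
the worst case is indeed order 8, and scaling by the vectors actually
in use makes it trivial:

    65536 possible handles * 16 bytes/IRTE = 1 MiB = 256 pages = order 8
      256 vectors in use   * 16 bytes/IRTE = 4 KiB =   1 page  = order 0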
> Who serializes and deserializes domain state?
>       -> dom0: This should work fine, but who does this for dom0 itself?
>       -> Xen: This will need some more work, but might be covered mostly
>          by the dom0less effort on the arm side
>               -> this will need some work for x86, but Stefano does not
>                  consider this a lot of work
>       -> This would mean: serialize domain state into a multiboot module
>          and set domains up after kexecing xen in the dom0less manner
>               -> make the multiboot module general enough so we can tag
>                  it as boot/resume/create/etc.
>               -> this will also enable us to do per-guest feature
>                  enablement

What is the intent here?

>                       -> finer granularity than specifying on the cmdline
>                               -> cmdline stuff is mostly broken, needs to
>                                  be fixed for nested either way
>                               -> domain create flags is a mess

There is going to have to be some kind of translation from old state to
new settings.  In the past, lots of Xen was based on global settings,
and this is slowly being fixed into concrete per-domain settings.

>
> Live update instead of crashdump?
>       -> Can we use such capabilities to recover from a crash by
>          "restarting" xen on a crash?
>               -> live updating into (the same) xen on crash
>       -> crashing is a good mechanism because it happens if something is
>          really broken and most likely not recoverable
>       -> Live update should be a conscious process and not something you
>          do as a reaction to a crash
>               -> something is really broken if we crash
>               -> we should not proactively restart xen on crash
>                       -> we might run into crash loops
>       -> maybe this can be done in the future, but it is not changing
>          anything for the design
>               -> if anybody wants to wire this up once live update is
>                  there, that should not be too hard
>               -> then you want to think about: scattering the domains to
>                  multiple other hosts to not keep them on broken machines
>
> We should use this opportunity to clean up certain parts of the code base:
>       -> the interface for domain information is a mess
>               -> HVM and PV have some shared data but completely
>                  different ways of accessing it
>
> Volume of patches:
>       -> Live update: still developing, we do not know yet
>       -> guest transparent live migration:
>               -> We have roughly 100 patches over time
>               -> we believe most of this just has to be cleaned
>                  up/squashed, which will land us at a much lower number
>               -> this also needs 2-3 dom0 kernel patches
>
> Summary of action items:
>       -> coordinate with the dom0less effort on what we can use and
>          contribute there
>       -> fix the domid clash problem
>       -> decide on usage of the crash kernel area
>       -> fix the live migration patch set to include as-yet unsupported
>          backends
>       -> clean up the patch set
>       -> upstream it
>
> Longer term vision:
>
> * Have a tiny hypervisor between Guest and Xen that handles the common cases
>       -> this enables (almost) zero downtime for the guest
>       -> the tiny hypervisor will maintain the guest while the underlying
>          xen is kexecing into the new build
>
> * Somebody someday will want to get rid of the long tail of old xen
>   versions in a fleet
>       -> live patch old running versions with live update capability?
>       -> crashdumping into a new hypervisor?
>       -> "crazy idea" but this will likely come up at some point

How much do you need to patch an old Xen to have kexec take over
cleanly?  Almost all of the complexity is on the destination side
AFAICT, which is good from a development point of view.
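My guess is "very little": keep guest memory where it is, and leave a
small, versioned breadcrumb for the target Xen to find.  Something of
this shape, say (an entirely hypothetical layout, purely to illustrate
the scale of the source-side change):

    /*
     * Sketch of a live-update handover record, written by the source Xen
     * at a well-known physical address (or passed as a multiboot2 module)
     * before kexec'ing into the target Xen.
     */
    struct lu_handover {
        uint64_t magic;        /* identifies a live-update handover       */
        uint32_t abi_version;  /* bumped on incompatible layout changes   */
        uint32_t nr_domains;   /* number of serialised domain records     */
        uint64_t state_maddr;  /* physical address of serialised state    */
        uint64_t state_size;   /* size of the serialised state blob       */
    };

That way the serialised stream format, rather than the old Xen's
internals, carries the compatibility burden, which keeps any backport
to old versions small.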
~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel