[Xen-devel] Design session report: Live-Updating Xen
Here is the summary/notes from the Xen Live-Update design session last week.
I tried to tie together the different topics we talked about into some
sections. https://cryptpad.fr/pad/#/2/pad/edit/fCwXg1GmSXXG8bc4ridHAsnR/

-- Leonard

LIVE UPDATING XEN - DESIGN SESSION

Brief project overview:
-> We want to build Xen Live-update
-> early prototyping phase
IDEA: change the running hypervisor to a new one without guest disruptions
-> Reasons:
   * Security - we might need an updated version for vulnerability mitigation
   * Development cycle acceleration - fast switch to a new hypervisor during
     development
   * Maintainability - reduce version diversity in the fleet
-> We are currently eyeing a combination of guest-transparent live migration
   and kexec into a new Xen build
-> For more details:
   https://xensummit19.sched.com/event/PFVQ/live-updating-xen-amit-shah-david-woodhouse-amazon

Terminology:
Running Xen -> The Xen running on the host before the update (Source)
Target Xen  -> The Xen we are updating *to*

Design discussions:

Live-update ties into multiple other projects currently done in the Xen
project:
* Secret-free Xen: reduce the footprint of guest-relevant data in Xen
  -> less state we might have to handle in the live update case
* dom0less: bootstrap domains without the involvement of dom0
  -> this might come in handy to at least set up and continue dom0 on the
     target Xen
  -> if we have this, it might also enable us to de-serialize the state for
     other guest domains in Xen and not have to wait for dom0 to do this

We want to keep only domain and hardware state
-> Xen itself is supposed to be exchanged completely
-> We have to keep the IOMMU page tables around and not touch them
   -> this might also come in handy for some newer UEFI-boot-related issues?
-> We might have to go and re-inject certain interrupts
-> do we need to dis-aggregate xenheap and domheap here?
   -> We are currently trying to avoid this

A key cornerstone for Live-update is guest-transparent live migration
-> This means we are using a well-defined ABI for saving/restoring domain
   state
-> We rely only on domain state, not on internal Xen state
-> The idea is to migrate the guest not from one machine to another (in
   space) but on the same machine from one hypervisor to another (in time)
-> In addition we want to keep as much memory as possible unchanged and feed
   it back to the target domain in order to save time
   -> This means we will need additional info on those memory areas and have
      to be super careful not to stomp over them while starting the target
      Xen
-> for live migration: the domid is a problem in this case
   -> randomize-and-pray does not work on smaller fleets
   -> this is not a problem for live-update
   -> BUT: as a community we should make this restriction go away

Exchanging the hypervisor using kexec
-> We have patches merged in upstream kexec-tools that enable multiboot2 for
   Xen
-> We can now load the target Xen binary into the crashdump region so we do
   not stomp over any valuable data we might need later
   -> But using the crashdump region for this has drawbacks when it comes to
      debugging, and we might want to think about this later
      -> What happens when live-update goes wrong?
   -> Option: increase the crashdump region size and partition it, or have a
      separate reserved live-update region to load the target Xen into
   -> A separate or partitioned region is not a priority for V1 but should
      be on the road map for future versions

Who serializes and deserializes domain state?
-> dom0: This should work fine, but who does this for dom0 itself?
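Whichever component ends up doing the work, the state handed across the kexec
needs a self-describing format. A minimal sketch of such a record, purely
illustrative: the magic value, field layout, and function names below are all
made up for this note and are not an existing Xen ABI:

```python
import struct

# Hypothetical header for a domain-state record carried across the kexec.
# A 'kind' tag lets the target Xen decide how to treat the payload.
MAGIC = 0x4C55584E                      # arbitrary magic, made up here
KIND_BOOT, KIND_RESUME, KIND_CREATE = 1, 2, 3
HDR_FMT = "<IHHIQ"                      # magic, version, kind, domid, length

def pack_record(kind, domid, payload):
    """Prefix a serialized domain-state payload with a tagged header."""
    return struct.pack(HDR_FMT, MAGIC, 1, kind, domid, len(payload)) + payload

def unpack_record(blob):
    """Validate the header and return (kind, domid, payload)."""
    hdr = struct.calcsize(HDR_FMT)
    magic, version, kind, domid, length = struct.unpack(HDR_FMT, blob[:hdr])
    assert magic == MAGIC and version == 1
    payload = blob[hdr:hdr + length]
    assert len(payload) == length
    return kind, domid, payload
```

The versioned, tagged header is what would make the boot/resume/create
distinction (discussed below) possible without inventing a new module format
for each case.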
-> Xen: This will need some more work, but might be covered mostly by the
   dom0less effort on the Arm side
   -> this will need some work for x86, but Stefano does not consider this a
      lot of work
   -> This would mean: serialize domain state into a multiboot module and
      set domains up after kexecing Xen, in the dom0less manner
   -> make the multiboot module general enough so we can tag it as
      boot/resume/create/etc.
      -> this will also enable us to do per-guest feature enablement
         -> finer-grained than specifying it on the cmdline
         -> the cmdline handling is mostly broken and needs to be fixed for
            nested either way
         -> the domain-create flags are a mess

Live update instead of crashdump?
-> Can we use these capabilities to recover from a crash by "restarting" Xen
   on a crash?
   -> i.e. live updating into (the same) Xen on crash
-> crashing is a good mechanism because it happens when something is really
   broken and most likely not recoverable
   -> Live update should be a conscious process and not something you do as
      a reaction to a crash
   -> something is really broken if we crash; we should not proactively
      restart Xen on a crash, as we might run into crash loops
-> maybe this can be done in the future, but it does not change anything for
   the design
   -> if anybody wants to wire this up once live update is there, that
      should not be too hard
   -> then you want to think about: scattering the domains to multiple other
      hosts so as not to keep them on broken machines

We should use this opportunity to clean up certain parts of the code base:
-> the interface for domain information is a mess
-> HVM and PV have some shared data but completely different ways of
   accessing it

Volume of patches:
-> Live update: still developing, we do not know yet
-> guest-transparent live migration:
   -> We have roughly 100 patches accumulated over time
   -> we believe most of this just has to be cleaned up/squashed and will
      land us at a much lower, more reasonable number
   -> this also needs 2-3 dom0 kernel patches

Summary of action items:
-> coordinate with the dom0less effort on what we can use
   and contribute there
-> fix the domid clash problem
-> decision on usage of the crash kernel area
-> fix the live migration patch set to include the yet-unsupported backends
-> clean up the patch set
-> upstream it

Longer term vision:
* Have a tiny hypervisor between the guest and Xen that handles the common
  cases
  -> this enables (almost) zero downtime for the guest
  -> the tiny hypervisor maintains the guest while the underlying Xen is
     kexecing into the new build
* Somebody someday will want to get rid of the long tail of old Xen versions
  in a fleet
  -> live patch old running versions with live update capability?
  -> crashdump into a new hypervisor?
  -> a "crazy idea", but this will likely come up at some point
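On the domid clash noted under live migration above: a restored domain must
not collide with a domid already in use on the target host, which is why
randomize-and-pray fails on smaller fleets and a toolstack has to detect the
clash and retry. A rough sketch of that retry loop; the helper is
hypothetical (not existing toolstack code), though DOMID_FIRST_RESERVED is
intended to match the constant in Xen's public headers:

```python
import random

# Domids at or above this are reserved in Xen's public headers for special
# domains (DOMID_SELF, DOMID_IO, ...); ordinary guests live below it.
DOMID_FIRST_RESERVED = 0x7FF0

def pick_free_domid(in_use, attempts=64, rng=random.randrange):
    """Hypothetical helper: pick a random domid, retrying on collision.

    For live update the problem disappears (the domain stays on the same
    host, so its domid is simply preserved); cross-host guest-transparent
    migration has to handle the clash explicitly rather than pray.
    """
    for _ in range(attempts):
        candidate = rng(1, DOMID_FIRST_RESERVED)
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free domid after %d attempts" % attempts)
```

Making the restriction "go away", as the notes suggest, would instead mean
decoupling the guest-visible identity from the host-local domid so no retry
is needed at all.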
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel