
[Xen-devel] Design session report: Live-Updating Xen


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Foerster, Leonard" <foersleo@xxxxxxxxxx>
  • Date: Mon, 15 Jul 2019 18:57:59 +0000
  • Accept-language: en-US
  • Delivery-date: Mon, 15 Jul 2019 18:58:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHVOz84rrbZg3jr7UWo9Co44cwhiw==
  • Thread-topic: Design session report: Live-Updating Xen

Here are the summary notes from the Xen Live-Update design session last week.
I tried to tie the different topics we talked about together into sections.

https://cryptpad.fr/pad/#/2/pad/edit/fCwXg1GmSXXG8bc4ridHAsnR/

--
Leonard

LIVE UPDATING XEN - DESIGN SESSION

Brief project overview:
        -> We want to build Xen Live-update
        -> early prototyping phase
        IDEA: change the running hypervisor to a new one without guest disruptions
        -> Reasons:
                * Security - we might need an updated version for vulnerability mitigation
                * Development cycle acceleration - fast switching between hypervisor builds during development
                * Maintainability - reduce version diversity in the fleet
        -> We are currently eyeing a combination of guest-transparent live migration
                and kexec into a new Xen build
        -> For more details:
                https://xensummit19.sched.com/event/PFVQ/live-updating-xen-amit-shah-david-woodhouse-amazon

Terminology:
        Running Xen -> the Xen running on the host before the update (source)
        Target Xen  -> the Xen we are updating *to*

Design discussions:

Live-update ties into multiple other projects currently done in the Xen-project:

        * Secret-free Xen: reduce the footprint of guest-relevant data in Xen
                -> less state we might have to handle in the live update case
        * dom0less: bootstrap domains without the involvement of dom0
                -> this might come in handy to at least set up and resume dom0 on the target Xen
                -> if we have this, it might also enable us to deserialize the state for
                        other guest domains in Xen instead of having to wait for dom0 to do it

We want to keep only the domain and hardware state
        -> Xen itself is supposed to be exchanged completely
        -> We have to keep the IOMMU page tables around and must not touch them
                -> this might also come in handy for some newer UEFI-boot-related issues?
                -> We might have to go and re-inject certain interrupts
        -> do we need to disaggregate xenheap and domheap here?
                -> We are currently trying to avoid this

A key cornerstone for Live-update is guest-transparent live migration
        -> This means we are using a well-defined ABI for saving/restoring domain state
                -> We rely only on domain state, not on any internal Xen state
        -> The idea is to migrate the guest not from one machine to another (in space)
                but on the same machine from one hypervisor to another (in time)
        -> In addition we want to keep as much memory as possible unchanged and feed
                it back to the target domain in order to save time
        -> This means we will need additional info on those memory areas and have to
                be super careful not to stomp over them while starting the target Xen
        -> for live migration: the domid is a problem in this case
                -> randomize and pray does not work on smaller fleets
                -> this is not a problem for live-update
                -> BUT: as a community we should make this restriction go away

Exchanging the hypervisor using kexec
        -> We have patches merged into upstream kexec-tools that enable multiboot2 for Xen
        -> We can now load the target Xen binary into the crashdump region so we do not stomp
                over any valuable data we might need later
        -> But using the crashdump region for this has drawbacks when it comes to debugging,
                and we might want to think about this later
                -> What happens when live-update goes wrong?
                -> Option: increase the crashdump region size and partition it, or have a separate
                        reserved live-update region to load the target Xen into
                -> A separate or partitioned region is not a priority for V1 but should
                        be on the roadmap for future versions
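As a sketch of the loading step: kexec-tools' `-p` loads a payload into the reserved crashkernel region, and its multiboot loaders take `--command-line` and `--module` arguments. All paths are placeholders, and the exact flags depend on the kexec-tools build and the multiboot2 patches mentioned above, so treat this as an illustration rather than a recipe.

```shell
# Illustrative only: load a target Xen into the crashdump region
# instead of a crash kernel.  Paths and flags are placeholders.
kexec -p /boot/xen.gz \
      --command-line="console=com1 com1=115200n8" \
      --module="/boot/vmlinuz-dom0 console=hvc0 root=/dev/xvda1" \
      --module="/boot/initrd.img-dom0"
```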

Who serializes and deserializes domain state?
        -> dom0: This should work fine, but who does this for dom0 itself?
        -> Xen: This will need some more work, but might be covered mostly by the
                dom0less effort on the Arm side
                -> this will need some work for x86, but Stefano does not consider it a lot of work
        -> This would mean: serialize domain state into a multiboot module and set domains
                up after kexecing Xen, in the dom0less manner
                -> make the multiboot module general enough that we can tag it as
                        boot/resume/create/etc.
                        -> this will also enable us to do per-guest feature enablement
                        -> finer-grained than specifying it on the command line
                        -> the command-line handling is mostly broken and needs to be fixed
                                for nested virtualization either way
                        -> the domain-creation flags are a mess

Live update instead of crashdump?
        -> Can we use these capabilities to recover from a crash by "restarting" Xen on a crash?
                -> live updating into (the same) Xen on crash
        -> crashing is a good mechanism because it happens when something is really broken and
                most likely not recoverable
        -> Live update should be a conscious process, not something you do as a reaction to a crash
                -> something is really broken if we crash
                -> we should not proactively restart Xen on crash
                        -> we might run into crash loops
        -> maybe this can be done in the future, but it does not change anything for the design
                -> if anybody wants to wire this up once live update is there, that should not be too hard
                -> then you want to think about scattering the domains to multiple other hosts so as not
                        to keep them on broken machines

We should use this opportunity to clean up certain parts of the code base:
        -> the interface for domain information is a mess
                -> HVM and PV have some shared data but completely different ways of accessing it

Volume of patches:
        -> Live update: still in development, we do not know yet
        -> guest-transparent live migration:
                -> We have roughly 100 patches accumulated over time
                -> we believe most of these just have to be cleaned up/squashed, which
                        will land us at a much lower, more reasonable number
                -> this also needs 2-3 dom0 kernel patches

Summary of action items:
        -> coordinate with the dom0less effort on what we can use and contribute there
        -> fix the domid clash problem
        -> decide on the usage of the crash kernel area
        -> extend the live migration patch set to include the as-yet unsupported backends
                -> clean up the patch set
                -> upstream it

Longer term vision:

* Have a tiny hypervisor between the guest and Xen that handles the common cases
        -> this enables (almost) zero downtime for the guest
        -> the tiny hypervisor maintains the guest while the underlying Xen
                kexecs into the new build

* Somebody someday will want to get rid of the long tail of old Xen versions in a fleet
        -> live patch old running versions with live update capability?
        -> crashdumping into a new hypervisor?
                -> a "crazy idea", but this will likely come up at some point

