
Re: [Xen-devel] Design session report: Live-Updating Xen



On Thu, 2019-07-18 at 09:15 +0000, Jan Beulich wrote:
> On 17.07.2019 20:40, Andrew Cooper wrote:
> > On 17/07/2019 14:02, Jan Beulich wrote:
> > > On 17.07.2019 13:26, Andrew Cooper wrote:
> > > > We do not want to be grovelling around in the old Xen's
> > > > datastructures, because that adds a binary A=>B translation
> > > > which is per-old-version-of-xen, meaning that you need a custom
> > > > build of each target Xen which depends on the currently-running
> > > > Xen, or have to maintain a matrix of old versions which will be
> > > > dependent on the local changes, and therefore not suitable for
> > > > upstream.
> > > 
> > > Now the question is what alternative you would suggest. By you
> > > saying "the pinned state lives in the migration stream", I assume
> > > you mean to imply that Dom0 state should be handed from old to
> > > new Xen via such a stream (minus raw data page contents)?
> > 
> > Yes, and this is explicitly identified in the bullet point saying
> > "We do only rely on domain state and no internal xen state".
> > 
> > In practice, it is going to be far more efficient to have Xen
> > serialise/deserialise the domain register state etc, than to bounce
> > it via hypercalls.  By the time you're doing that in Xen, adding
> > dom0 as well is trivial.
> 
> So I must be missing some context here: How could hypercalls come
> into the picture at all when it comes to "migrating" Dom0?

Xen will have to orchestrate the "save/restore" aspects of the domains
here.  The flow will roughly be (rough sketch below):

1. One hypercall to load the new Xen binary into memory
2. Another hypercall to:
  a. Pause domains (including dom0),
  b. Mask interrupts,
  c. Serialize state,
  d. kexec into the new Xen binary, and deserialize state

We had briefly considered having Dom0 (or a stub domain) orchestrate
the whole serialization aspect here, but that's just too slow and
would create more problems in practice, so the idea was quickly
dumped.
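
To make that flow concrete, here is a minimal sketch of what the
hypervisor side of the two hypercalls could look like.  Every name in
it (liveupdate_load, liveupdate_exec, lu_serialise_domain and friends)
is hypothetical - it illustrates the shape of the design above, not an
existing Xen interface:

    /* Hypothetical sketch only - none of this is existing Xen code. */

    static int liveupdate_load(const void *image, size_t size)
    {
        /*
         * Stage 1: stash the new Xen binary in a reserved memory
         * region.  Nothing is paused yet, so this can happen well
         * ahead of the actual update.
         */
        return lu_copy_image_to_reserved_region(image, size);
    }

    static int liveupdate_exec(void)
    {
        struct domain *d;
        int rc;

        /* Stage 2a: pause every domain, dom0 included. */
        for_each_domain ( d )
            domain_pause(d);

        /* Stage 2b: mask interrupts for the blackout window. */
        local_irq_disable();

        /* Stage 2c: serialise per-domain state into the handover
         * stream that the new Xen will consume. */
        for_each_domain ( d )
            if ( (rc = lu_serialise_domain(d)) != 0 )
                return rc;  /* the real thing would unwind and unpause */

        /* Stage 2d: jump into the new binary, which deserialises the
         * stream and resumes the domains on the other side. */
        return lu_kexec_into_new_xen();
    }

Everything between domain_pause() and the kexec is the blackout window
we want to keep as short as possible.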

> 
> > > > The in-guest evtchn data structure will accumulate events just
> > > > like a posted interrupt descriptor.  Real interrupts will queue
> > > > in the LAPIC during the transition period.
> > > 
> > > Yes, that'll work as long as interrupts remain active from Xen's
> > > POV. But if there's concern about a blackout period for HVM/PVH,
> > > then surely there would also be such for PV.
> > 
> > The only fix for that is to reduce the length of the blackout
> > period.  We can't magically inject interrupts half way through the
> > xen-to-xen transition, because we can't run vcpus at that point in
> > time.
> 
> Hence David's proposal to "re-inject". We'd have to record them
> during the blackout period, and inject once Dom0 is all set up again.

We'll need both: as little downtime as possible, and re-injection of
interrupts once domains continue execution.  The fewer re-injections
we have to do the better; but overall, the less visible this
maintenance activity is, the better as well.
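
For what it's worth, a rough sketch of the "record during the
blackout, re-inject afterwards" idea - again with entirely made-up
names (lu_pending_irq, lu_record_irq, lu_replay_irqs,
lu_inject_vector), purely to illustrate the shape:

    /* Hypothetical sketch only - none of these interfaces exist. */

    #define LU_MAX_PENDING 1024

    struct lu_pending_irq {
        domid_t      domid;
        unsigned int vector;
    };

    static struct lu_pending_irq lu_pending[LU_MAX_PENDING];
    static unsigned int lu_nr_pending;

    /* Called from the masked interrupt path during the blackout. */
    static void lu_record_irq(domid_t domid, unsigned int vector)
    {
        if ( lu_nr_pending < LU_MAX_PENDING )
        {
            lu_pending[lu_nr_pending].domid  = domid;
            lu_pending[lu_nr_pending].vector = vector;
            lu_nr_pending++;
        }
    }

    /* Called by the new Xen just before the domains resume. */
    static void lu_replay_irqs(void)
    {
        unsigned int i;

        for ( i = 0; i < lu_nr_pending; i++ )
        {
            struct domain *d = get_domain_by_id(lu_pending[i].domid);

            if ( d == NULL )
                continue;

            /* Route the vector through the guest's normal delivery
             * path (event channel or emulated LAPIC). */
            lu_inject_vector(d, lu_pending[i].vector);
            put_domain(d);
        }
    }

The fewer entries end up in that table the better, which is just
another way of saying the blackout window has to stay short.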

> 
> > > > > Re-using large data structures (or arrays thereof) may also
> > > > > turn out useful in terms of latency until the new Xen
> > > > > actually becomes ready to resume.
> > > > 
> > > > When it comes to optimising the latency, there is a fair amount
> > > > we might be able to do ahead of the critical region, but I
> > > > still think this would be better done in terms of a "clean
> > > > start" in the new Xen to reduce binary dependences.
> > > 
> > > Latency actually is only one aspect (albeit the larger the host,
> > > the more relevant it is). Sufficient memory to have both old and
> > > new copies of the data structures in place, plus the migration
> > > stream, is another. This would especially become relevant when
> > > even DomU-s were to remain in memory, rather than getting
> > > saved/restored.
> > 
> > But we're still talking about something which is on a multi-MB
> > scale, rather than multi-GB scale.
> 
> On multi-TB systems frame_table[] is a multi-GB table. And with boot
> times often scaling (roughly) with system size, live updating is (I
> guess) all the more interesting on bigger systems.

We've not had to look closely at all these things yet - but this is
also perhaps the only point David and I keep quibbling about - will
there be any Xen state, and will we need to do anything about it?  The
ideal thing will be to have the new Xen start from scratch.  There's an
alternative idea here, though: *if* this is only during the setup phase
of the new Xen binary, we can perhaps get the allocations done before
pausing domains (i.e. in step 1 above).  That saves us time.  How this
works for memory, and how much free memory we can expect to have, is a
question that can only be answered at runtime.  Ideally we don't want
to leave such systems behind.  So, getting creative with
serializing/deserializing such state is something I totally anticipate
having to do.  But don't tell David I said it once again...
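
For scale on the frame_table[] point above: assuming struct page_info
stays in the region of 32 bytes per 4 KiB page (roughly what x86 has
today),

    4 TiB of RAM  /  4 KiB per page  =  2^30 pages
    2^30 pages    *  32 bytes        =  32 GiB of frame_table[]

so on multi-TB hosts keeping an old and a new copy of such structures
alive at the same time really is a multi-GB question, not a multi-MB
one - which is why pre-allocating ahead of the pause, or reusing the
old copy in place, may be unavoidable there.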


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

