
Re: [Xen-devel] Design session report: Live-Updating Xen



On Thu, 2019-07-18 at 09:15 +0000, Jan Beulich wrote:
> On 17.07.2019 20:40, Andrew Cooper wrote:
> > On 17/07/2019 14:02, Jan Beulich wrote:
> > > On 17.07.2019 13:26, Andrew Cooper wrote:
> > > > We do not want to be grovelling around in the old Xen's
> > > > datastructures, because that adds a binary A=>B translation
> > > > which is per-old-version-of-xen, meaning that you need a custom
> > > > build of each target Xen which depends on the currently-running
> > > > Xen, or have to maintain a matrix of old versions which will be
> > > > dependent on the local changes, and therefore not suitable for
> > > > upstream.
> > > 
> > > Now the question is what alternative you would suggest. By you
> > > saying "the pinned state lives in the migration stream", I assume
> > > you mean to imply that Dom0 state should be handed from old to
> > > new Xen via such a stream (minus raw data page contents)?
> > 
> > Yes, and this is explicitly identified in the bullet point saying
> > "We do only rely on domain state and no internal xen state".
> > 
> > In practice, it is going to be far more efficient to have Xen
> > serialise/deserialise the domain register state etc, than to bounce
> > it via hypercalls.  By the time you're doing that in Xen, adding
> > dom0 as well is trivial.
> 
> So I must be missing some context here: How could hypercalls come
> into the picture at all when it comes to "migrating" Dom0?

Xen will have to orchestrate the "save/restore" aspects of the domains
here.  The flow will roughly be (rough sketch below):

1. One hypercall to load the new Xen binary into memory
2. Another hypercall to:
  a. Pause domains (including dom0),
  b. Mask interrupts,
  c. Serialize state,
  d. kexec into the new Xen binary, and deserialize state

We had briefly considered having Dom0 (or a stub domain) orchestrate
the whole serialization aspect here, but that's just too slow and
would create more problems in practice, so the idea was quickly
dumped.
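
To make that flow concrete, here is a minimal sketch of what the
hypervisor side of the two hypercalls could look like.  Every name in
it (liveupdate_load, liveupdate_exec, lu_serialise_domain and friends)
is hypothetical - it illustrates the shape of the design above, not an
existing Xen interface:

    /* Hypothetical sketch only - none of this is existing Xen code. */

    static int liveupdate_load(const void *image, size_t size)
    {
        /*
         * Stage 1: stash the new Xen binary in a reserved memory
         * region.  Nothing is paused yet, so this can happen well
         * ahead of the actual update.
         */
        return lu_copy_image_to_reserved_region(image, size);
    }

    static int liveupdate_exec(void)
    {
        struct domain *d;
        int rc;

        /* Stage 2a: pause every domain, dom0 included. */
        for_each_domain ( d )
            domain_pause(d);

        /* Stage 2b: mask interrupts for the blackout window. */
        local_irq_disable();

        /* Stage 2c: serialise per-domain state into the handover
         * stream that the new Xen will consume. */
        for_each_domain ( d )
            if ( (rc = lu_serialise_domain(d)) != 0 )
                return rc;  /* the real thing would unwind and unpause */

        /* Stage 2d: jump into the new binary, which deserialises the
         * stream and resumes the domains on the other side. */
        return lu_kexec_into_new_xen();
    }

Everything between domain_pause() and the kexec is the blackout window
we want to keep as short as possible.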

> 
> > > > The in-guest evtchn data structure will accumulate events just
> > > > like a posted interrupt descriptor.  Real interrupts will queue
> > > > in the LAPIC during the transition period.
> > > 
> > > Yes, that'll work as long as interrupts remain active from Xen's
> > > POV. But if there's concern about a blackout period for HVM/PVH,
> > > then surely there would also be such for PV.
> > 
> > The only fix for that is to reduce the length of the blackout
> > period.  We can't magically inject interrupts half way through the
> > xen-to-xen transition, because we can't run vcpus at that point in
> > time.
> 
> Hence David's proposal to "re-inject". We'd have to record them
> during the blackout period, and inject once Dom0 is all set up again.

We'll need both: as little downtime as possible, and re-injection of
interrupts once domains continue execution.  The fewer re-injections
we have to do the better; but overall, the less visible this
maintenance activity is, the better as well.
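
For what it's worth, a rough sketch of the "record during the
blackout, re-inject afterwards" idea - again with entirely made-up
names (lu_pending_irq, lu_record_irq, lu_replay_irqs,
lu_inject_vector), purely to illustrate the shape:

    /* Hypothetical sketch only - none of these interfaces exist. */

    #define LU_MAX_PENDING 1024

    struct lu_pending_irq {
        domid_t      domid;
        unsigned int vector;
    };

    static struct lu_pending_irq lu_pending[LU_MAX_PENDING];
    static unsigned int lu_nr_pending;

    /* Called from the masked interrupt path during the blackout. */
    static void lu_record_irq(domid_t domid, unsigned int vector)
    {
        if ( lu_nr_pending < LU_MAX_PENDING )
        {
            lu_pending[lu_nr_pending].domid  = domid;
            lu_pending[lu_nr_pending].vector = vector;
            lu_nr_pending++;
        }
    }

    /* Called by the new Xen just before the domains resume. */
    static void lu_replay_irqs(void)
    {
        unsigned int i;

        for ( i = 0; i < lu_nr_pending; i++ )
        {
            struct domain *d = get_domain_by_id(lu_pending[i].domid);

            if ( d == NULL )
                continue;

            /* Route the vector through the guest's normal delivery
             * path (event channel or emulated LAPIC). */
            lu_inject_vector(d, lu_pending[i].vector);
            put_domain(d);
        }
    }

The fewer entries end up in that table the better, which is just
another way of saying the blackout window has to stay short.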

> 
> > > > > Re-using large data structures (or arrays thereof) may also
> > > > > turn out useful in terms of latency until the new Xen
> > > > > actually becomes ready to resume.
> > > > 
> > > > When it comes to optimising the latency, there is a fair amount
> > > > we might be able to do ahead of the critical region, but I
> > > > still think this would be better done in terms of a "clean
> > > > start" in the new Xen to reduce binary dependences.
> > > 
> > > Latency actually is only one aspect (albeit the larger the host,
> > > the more relevant it is). Sufficient memory to have both old and
> > > new copies of the data structures in place, plus the migration
> > > stream, is another. This would especially become relevant when
> > > even DomU-s were to remain in memory, rather than getting
> > > saved/restored.
> > 
> > But we're still talking about something which is on a multi-MB
> > scale, rather than multi-GB scale.
> 
> On multi-TB systems frame_table[] is a multi-GB table. And with boot
> times often scaling (roughly) with system size, live updating is (I
> guess) all the more interesting on bigger systems.

We've not had to look closely at all these things yet - but this is
also perhaps the only point David and I keep quibbling about - will
there be any Xen state, and will we need to do anything about it?  The
ideal thing will be to have the new Xen start from scratch.  There's an
alternative idea here, though: *if* this is only during the setup phase
of the new Xen binary, we can perhaps get the allocations done before
pausing domains (i.e. in step 1 above).  That saves us time.  How this
works for memory, and how much free memory we can expect to have, is a
question that can only be answered at runtime.  Ideally we don't want
to leave such systems behind.  So, getting creative with
serializing/deserializing such state is something I totally anticipate
having to do.  But don't tell David I said it once again...
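
For scale on the frame_table[] point above: assuming struct page_info
stays in the region of 32 bytes per 4 KiB page (roughly what x86 has
today),

    4 TiB of RAM  /  4 KiB per page  =  2^30 pages
    2^30 pages    *  32 bytes        =  32 GiB of frame_table[]

so on multi-TB hosts keeping an old and a new copy of such structures
alive at the same time really is a multi-GB question, not a multi-MB
one - which is why pre-allocating ahead of the pause, or reusing the
old copy in place, may be unavoidable there.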


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

