
Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context



On 02.04.2020 11:58, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: 01 April 2020 15:51
>> To: Paul Durrant <paul@xxxxxxx>
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>;
>> George Dunlap <george.dunlap@xxxxxxxxxx>; Ian Jackson <ian.jackson@xxxxxxxxxxxxx>;
>> Julien Grall <julien@xxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
>> Wei Liu <wl@xxxxxxx>
>> Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for
>> save/restore of 'domain' context
>>
>> On 27.03.2020 19:50, Paul Durrant wrote:
>>> Domain context is state held in the hypervisor that does not come under
>>> the category of 'HVM state' but is instead 'PV state' that is common
>>> between PV guests and enlightened HVM guests (i.e. those that have PV
>>> drivers) such as event channel state, grant entry state, etc.
>>
>> Without looking at the patch details yet, I'm having some difficulty
>> understanding how this is going to work in a safe/correct manner. I
>> suppose for LU the system is in a frozen enough state that
>> snapshotting and copying state like this is okay, but ...
>>
>>> To allow enlightened HVM guests to be migrated without their co-operation
>>> it will be necessary to transfer such state along with the domain's
>>> memory image, architectural state, etc. This framework is introduced for
>>> that purpose.
>>>
>>> This patch adds the new public header and the low level implementation,
>>> entered via the domain_save() or domain_load() functions. Subsequent
>>> patches will introduce other parts of the framework, and code that will
>>> make use of it within the current version of the libxc migration stream.
>>
>> ... here you suggest (and patch 5 appears to match this) that this
>> is going to be used even in "normal" migration streams.
> 
> Well, 'transparent' (or non-cooperative) migration will only work in some 
> cases but it definitely does work.
> 
>> All of the
>> items named are communication vehicles, and hence there are always
>> two sides that can influence the state. For event channels, the
>> other side (which isn't paused) or the hardware (for passed through
>> devices) might signal them, or it (just the former obviously) could
>> close their end, resulting in a state change also for the domain
>> being migrated. If this happens after the snapshot was taken, the
>> state change is lost.
> 
> Indeed, which is why we *do* rely on co-operation from the other end
> of the event channels in the migration case. In the initial case it
> is likely we'll veto transparent migration unless all event channels
> are connected to either dom0 or Xen.

Co-operation for "normal" migration, iirc, consists of tearing down
and re-establishing everything. There's simply no risk of losing e.g.
events this way.

>> Otoh I'm sure the case was considered, so perhaps I'm simply missing
>> some crucial aspect (which then could do with spelling out in the
>> description of the cover letter).
>>
> 
> Does that need to be explained for a series that is just
> infrastructure?

I think so, yes - this infrastructure is pointless to introduce if
it doesn't allow fulfilling all requirements. Pointing at the design
doc (in the cover letter) may be enough if all aspects are covered
by what's there. I wouldn't have assumed, though, that using this
infrastructure for co-operative migration as well is mentioned there.

Considering the situation with event channels (all closed), doing
what you do for the shared info page is probably going to be fine;
large parts of it are in a known state (or need re-filling on the
destination) anyway. What other plans do you have for non-LU
migration wrt this new infrastructure? If the shared info page is
all that's going to get migrated with its help, I'd wonder whether
the infrastructure wouldn't better be made conditional upon an LU
config option, with the shared info migration left as it is now.

> The overall design doc is now committed in the repo (although may
> need some expansion in future) so I could point at that.
> I don't think I'm giving anything away when I say that EC2's
> downstream code simply (ab)uses HVM save records for transferring
> the extra state; all I'm trying to do here is create something
> cleaner onto which I can re-base and upstream the EC2 code.

That was my guess, indeed.

Jan



 

