
Re: [Xen-devel] [DESIGN] Feature Levelling improvements



On 16/06/15 16:33, Jan Beulich wrote:
>>>> On 16.06.15 at 12:50, <andrew.cooper3@xxxxxxxxxx> wrote:
>> How XenServer currently does levelling
>> ======================================
>>
>> The _Heterogeneous Pool Levelling_ support in XenServer appears to predate
>> the libxc CPUID policy API, so does not currently use it.  The toolstack
>> has a table of CPU model numbers identifying whether levelling is
>> supported.  It then uses native `CPUID` instructions to look at the first
>> four feature masks, and identifies the subset of features across the pool.
>> `cpuid_mask_{,extd_}{ecx,edx}` is then set on Xen's command line for each
>> host in the pool, and all hosts rebooted.
>>
>> This has several limitations:
>>
>> * Xen and dom0 have a reduced feature set despite not needing to migrate
> I don't think Xen is affected by this, as it reads the CPUID bits
> before setting the masks (there are a few cpuid() invocations
> in "random" code, but I don't think these access maskable ones).

As part of the existing levelling in XenServer, xsave (and in particular
xsaveopt) is disabled, which does impair Xen's ability to context switch
efficiently.
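
To make the xsave case concrete, the two CPUID bits involved can be read as
below (a standalone sketch using GCC's <cpuid.h>, not Xen code): XSAVE is
leaf 1 ECX bit 26, and XSAVEOPT is leaf 0xD sub-leaf 1 EAX bit 0, so hiding
the former necessarily loses the latter for context switching.

    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang __cpuid_count() */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1, ECX bit 26: is XSAVE available at all? */
        __cpuid_count(1, 0, eax, ebx, ecx, edx);
        unsigned int xsave = !!(ecx & (1u << 26));

        /* CPUID leaf 0xD, sub-leaf 1, EAX bit 0: is XSAVEOPT available? */
        __cpuid_count(0xd, 1, eax, ebx, ecx, edx);
        unsigned int xsaveopt = xsave && (eax & 1u);

        printf("xsave %u, xsaveopt %u\n", xsave, xsaveopt);
        return 0;
    }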

For the gory details (from
https://github.com/xenserver/xen-4.5.pg/blob/master/master/series), the
following hacks have accumulated over time to "fix" migration regressions
relating to exposed features:

fix-xsave-dependent-CPUID-bits-being-advertised-to-guests.patch
xen-dont-hide-vtx-or-svm.patch
xen-capture-boot-cpuid-info.patch
xen-apply-cpuid-mask-to-cpuid-faulting.patch
xen-disable-xsave.patch
xen-hide-fma4-on-amd-fam15h.patch
mixed-cpuid-before-mask.patch

All of which I will abolish with pleasure once these levelling
improvements are complete.

>
>> Notes and observations
>> ======================
>>
>> Experimentally, the masking MSRs can be context switched.  There is no
>> need to force all PV guests to the same level, and no need to prevent
>> dom0 or Xen from using certain features.  Context switching the masking
>> MSRs will however incur an overhead, and should be avoided where possible.
>>
>> The toolstack needs to know how much control Xen has over VM features.
>> In the case that there are insufficient masking MSRs, and no faulting
>> support is present, a PV VM can still potentially be made safe to migrate
>> by explicitly disabling features on the kernel command line.
> That wouldn't help with user mode code, would it?

Generally not, but it does depend on whether user code queries cpuid
directly, or asks the OS for features.  This is already a last-ditch
effort at this point.
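
For example, a user process that issues `CPUID` itself sees the hardware (or
masked/faulted) values directly; a kernel command line option such as
`noxsave` only changes what the OS believes and reports, not what the
instruction returns.  A standalone sketch (64-bit, GCC-style inline asm):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Nothing the kernel was told on its command line changes what this
     * returns; only CPUID faulting or the masking MSRs can hide bits here.
     */
    static void raw_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                          uint32_t *ecx, uint32_t *edx)
    {
        asm volatile ("cpuid"
                      : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                      : "a" (leaf), "c" (0));
    }

    int main(void)
    {
        uint32_t a, b, c, d;
        raw_cpuid(1, &a, &b, &c, &d);
        printf("leaf 1: ecx=%08x edx=%08x\n", c, d);
        return 0;
    }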

>
>> VCPU context switch
>> -------------------
>>
>> Xen shall be updated to lazily context switch all available masking MSRs.
>> It is noted that this shall incur a performance overhead if restricted
>> featuresets are assigned to PV guests, and _CPUID Faulting_ is not
>> available.
>>
>> It shall be the responsibility of the host administrator to avoid creating
>> such a scenario, if the performance overhead is a concern.
> Not sure how feasible this is: Even if you run all PV guests at equal
> feature levels, context switching between PV and non-PV guests
> would still incur overhead (unless you mean to run HVM/PVH ones
> with whatever masking is currently in place). Plus this still wouldn't
> deal with masks in place when Xen itself wants to look at any of the
> maskable ones, unless you intend to audit code to make sure no
> such uses exist (which - as per above - I suppose/hope to be the
> case).

This is partially linked with Future work b), to try and remove Xen's
reliance on cpuid after boot.

As domains shall inherit the default maximal policy, which is then moderated
downwards by DOMCTL_set_cpuid_policy, no maskable feature leaves will fall
through the existing domain_cpuid() implementation to a plain `cpuid`
instruction, so masking is safe to leave in place when context switching to
an HVM VCPU.

I will audit the code to check that Xen never queries feature leaves at
runtime.  Doing so is never needed (and is particularly inefficient in the
nested case).
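
As a point of reference, the lazy switching can be modelled as below (a
standalone simulation, not Xen code; the MSR and structure names are made
up): the expensive WRMSR is only issued when the next vCPU's masks differ
from what the CPU currently has programmed.

    #include <stdint.h>
    #include <stdio.h>

    struct masks {
        uint64_t leaf1_ecx_edx;   /* mask for CPUID leaf 1 ECX:EDX */
        uint64_t ext1_ecx_edx;    /* mask for CPUID leaf 0x80000001 ECX:EDX */
    };

    static struct masks cpu_current;  /* what the CPU currently has loaded */
    static unsigned int wrmsr_count;  /* how many (expensive) writes we did */

    static void wrmsr_sim(const char *name, uint64_t val)
    {
        wrmsr_count++;
        printf("wrmsr %s <- %#018llx\n", name, (unsigned long long)val);
    }

    static void ctxt_switch_masking(const struct masks *next)
    {
        /* Only touch a masking MSR if the required value has changed. */
        if (next->leaf1_ecx_edx != cpu_current.leaf1_ecx_edx) {
            wrmsr_sim("MASK_1CD", next->leaf1_ecx_edx);
            cpu_current.leaf1_ecx_edx = next->leaf1_ecx_edx;
        }
        if (next->ext1_ecx_edx != cpu_current.ext1_ecx_edx) {
            wrmsr_sim("MASK_E1CD", next->ext1_ecx_edx);
            cpu_current.ext1_ecx_edx = next->ext1_ecx_edx;
        }
    }

    int main(void)
    {
        struct masks full  = { ~0ull, ~0ull };                  /* unmasked */
        struct masks level = { 0x80202001ffffffffull, ~0ull };  /* levelled */

        ctxt_switch_masking(&level);   /* writes */
        ctxt_switch_masking(&level);   /* no writes: unchanged */
        ctxt_switch_masking(&full);    /* writes */
        printf("total wrmsrs: %u\n", wrmsr_count);
        return 0;
    }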

>
>> Future work
>> ===========
>>
>> The above is a minimum quantity of work to support feature levelling, but
>> further problems exist.  They are acknowledged as being issues, but are
>> not in scope for fixing as part of feature levelling.
>>
>> * Xen has no notion of per-cpu and per-package data in the cpuid policy.
>>   In particular, this causes issues for VMs attempting to detect topology,
>>   which find inconsistent/incorrect cache information.
>>
>> * In the case that `domain_cpuid()` can't locate a leaf in the topology,
>>   it will fall back to issuing a plain `CPUID` instruction.  This breaks
>>   VM encapsulation, as a VM which has migrated can observe differences
>>   which should be hidden.
> I think this is actually something that (a) needs addressing not too
> far in the future and (b) reminds me that I didn't see any talk here
> regarding black vs white listing of features not explicitly known to
> Xen or the tool stack.

I have put quite a lot of thought towards a) and, while it absolutely does
need addressing, it is substantially more work than just fixing the
levelling issues (which are my top priority, from XenServer's point of
view).  I do hope to manage it as follow-on work, and will have it in mind
when fixing levelling.

The whitelist is implicit by virtue of Xen calculating the per-vm-type
maximum feature set which can be offered, then forcibly preventing the
toolstack from expanding on that.  There is possibly room for a command
line parameter to change the default behaviour.
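
A standalone sketch of that implicit whitelist (names and values are
illustrative, not the real interface): whatever the toolstack requests is
ANDed with the maximum featureset Xen calculated for the VM type, so bits
Xen never offered cannot be enabled from outside.

    #include <stdint.h>
    #include <stdio.h>

    #define FS_WORDS 4   /* illustrative featureset width */

    /* Clamp a requested featureset to the per-vm-type maximum. */
    static void clamp_featureset(uint32_t req[FS_WORDS],
                                 const uint32_t max[FS_WORDS])
    {
        for (unsigned int i = 0; i < FS_WORDS; ++i)
            req[i] &= max[i];
    }

    int main(void)
    {
        /* Values are made up for the example. */
        const uint32_t hvm_max[FS_WORDS] =
            { 0xbfebfbff, 0x77fafbbf, 0x2c100800, 0x00000121 };
        uint32_t request[FS_WORDS] =
            { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };

        clamp_featureset(request, hvm_max);

        for (unsigned int i = 0; i < FS_WORDS; ++i)
            printf("word %u: %08x\n", i, request[i]);
        return 0;
    }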

Part of fixing both b) and a) involves Xen gaining a more structured
understanding of the cpuid leaves, and enforcing limits such as max_leaf
which follow logically from that understanding.

~Andrew
