
Re: [Xen-devel] [DESIGN] Feature Levelling improvements



On 22/06/15 20:18, Konrad Rzeszutek Wilk wrote:
> Thank you for posting this!
>
> Some comments below.
>
>> Design
>> ======
>>
>> `struct sysctl_physinfo.levelling_caps`
>> ---------------------------------------
>>
>> Xen shall gain a new physinfo field which reports the degree to which it can
>> influence `CPUID` executed by a PV guest.  This is a bitmap containing:
>>
>> * `faulting`
>>     * CPUID Faulting is available, and full control can be exercised.
>> * `mask_ecx`
>>     * Leaf 0x00000001.ECX
>> * `mask_edx`
>>     * Leaf 0x00000001.EDX
>> * `mask_extd_ecx`
>>     * Leaf 0x80000001.ECX
>> * `mask_extd_edx`
>>     * Leaf 0x80000001.EDX
>> * `mask_xsave_eax`
>>     * Leaf 0x0000000D[ECX=1].EAX
>> * `mask_therm_ecx`
>>     * Leaf 0x00000006.ECX
>> * `mask_l7s0_eax`
>>     * Leaf 0x00000007[ECX=0].EAX
>> * `mask_l7s0_ebx`
> Those 'l' look like '1' in the PDF.
>
> Can it be called something else?

If you can suggest a better name, yes.  For now, these are the
variable names used in-tree (top of xen/arch/x86/cpu/amd.c).

>
>>     * Leaf 0x00000007[ECX=0].EBX
>>
>> At the time of writing, these are all the masking MSRs known by Xen.  The
>> bitmap shall be extended as new MSRs become available.
>>
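
For illustration, the capability bitmap quoted above could be consumed
roughly as follows.  This is only a sketch: the design names the flags but
not their bit positions, so the positions, the LCAP_* identifiers and both
helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the design only names the flags. */
enum levelling_cap_bit {
    LCAP_BIT_FAULTING,        /* CPUID Faulting: full control */
    LCAP_BIT_MASK_ECX,        /* Leaf 0x00000001.ECX */
    LCAP_BIT_MASK_EDX,        /* Leaf 0x00000001.EDX */
    LCAP_BIT_MASK_EXTD_ECX,   /* Leaf 0x80000001.ECX */
    LCAP_BIT_MASK_EXTD_EDX,   /* Leaf 0x80000001.EDX */
    LCAP_BIT_MASK_XSAVE_EAX,  /* Leaf 0x0000000D[ECX=1].EAX */
    LCAP_BIT_MASK_THERM_ECX,  /* Leaf 0x00000006.ECX */
    LCAP_BIT_MASK_L7S0_EAX,   /* Leaf 0x00000007[ECX=0].EAX */
    LCAP_BIT_MASK_L7S0_EBX,   /* Leaf 0x00000007[ECX=0].EBX */
    LCAP_BIT_NR
};

#define LCAP(bit) (1u << (bit))

/* Name of a capability bit, e.g. for toolstack diagnostics. */
static const char *levelling_cap_name(unsigned int bit)
{
    static const char *const names[LCAP_BIT_NR] = {
        "faulting", "mask_ecx", "mask_edx", "mask_extd_ecx",
        "mask_extd_edx", "mask_xsave_eax", "mask_therm_ecx",
        "mask_l7s0_eax", "mask_l7s0_ebx",
    };

    assert(bit < LCAP_BIT_NR);
    return names[bit];
}

/*
 * True if both leaf 1 feature words can be levelled for a PV guest,
 * either through CPUID Faulting or through both masking MSRs.
 */
static bool can_level_leaf1(uint32_t caps)
{
    uint32_t need = LCAP(LCAP_BIT_MASK_ECX) | LCAP(LCAP_BIT_MASK_EDX);

    return (caps & LCAP(LCAP_BIT_FAULTING)) || (caps & need) == need;
}
```

Keeping the bit numbers in an enum means a newly discovered masking MSR
only needs one new enumerator and one new name.
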
>> New 'featureset' API for use by the toolstack
>> ---------------------------------------------
>>
>> A featureset is defined as a collection of words covering the cpuid leaves
>> which report features to the caller.  It is variable length, and expected to
>> grow over time as processors gain more features, or Xen starts supporting
>> exposing more features to guests.
>>
>> At the time of writing, the leaves containing feature bits are:
>>
>> * 0x00000001.ECX
>> * 0x00000001.EDX
>> * 0x80000001.ECX
>> * 0x80000001.EDX
>> * 0x0000000D[ECX=1].EAX
>> * 0x00000007[ECX=0].EBX
>> * 0x00000006.EAX
>> * 0x00000006.ECX
>> * 0x0000000A.EAX
>> * 0x0000000A.EBX
>> * 0x0000000F[ECX=0].EDX
>> * 0x0000000F[ECX=1].EDX
>>
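
One possible in-memory layout for such a featureset is an array of 32-bit
words, one per register in the list above.  The sketch below assumes the
words are stored in list order, and that words past the end of a shorter
featureset read as zero so that older callers keep working as the list
grows.  Both are assumptions, as are all identifiers used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word indices, in the order the design lists the leaves. */
enum featureset_word_idx {
    FSW_1C,    /* 0x00000001.ECX */
    FSW_1D,    /* 0x00000001.EDX */
    FSW_E1C,   /* 0x80000001.ECX */
    FSW_E1D,   /* 0x80000001.EDX */
    FSW_DA1,   /* 0x0000000D[ECX=1].EAX */
    FSW_7B0,   /* 0x00000007[ECX=0].EBX */
    FSW_6A,    /* 0x00000006.EAX */
    FSW_6C,    /* 0x00000006.ECX */
    FSW_AA,    /* 0x0000000A.EAX */
    FSW_AB,    /* 0x0000000A.EBX */
    FSW_F0D,   /* 0x0000000F[ECX=0].EDX */
    FSW_F1D,   /* 0x0000000F[ECX=1].EDX */
    FSW_NR
};

/* Fetch one word; words beyond the supplied length read as zero. */
static uint32_t featureset_get_word(const uint32_t *fs, size_t nr_words,
                                    size_t idx)
{
    assert(fs != NULL);
    return idx < nr_words ? fs[idx] : 0;
}

/* Test a single feature bit within a given word. */
static bool featureset_test_bit(const uint32_t *fs, size_t nr_words,
                                size_t idx, unsigned int bit)
{
    assert(bit < 32);
    return (featureset_get_word(fs, nr_words, idx) >> bit) & 1u;
}
```
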
>> XEN_SYSCTL_get_featureset
>> -------------------------
>>
>> Xen shall on boot create a featureset for itself, and the maximum available
>> features for each type of guest, based on hardware features, command line
>> options etc.  A toolstack shall be able to query all of these.
> maximum available features?

Maximum set of features Xen is able to provide to particular guests on
this specific host.

>  As in two sets of features - one for
> PV and another for HVM. The PV being a subset of HVM (since it is more
> constrained)?

Three really (including the host featureset), but yes.

>
> Command line options being the same old ones (the cpuid_mask..?) and then
> more? Or just rewrite this to be:
>
> cpuid=mask_therm_ecx=[blahbla],mask_xsave_eax=[blahbal] ?

No.  What I meant by that is that something like "no-xsave" will turn
off whole swathes of features in all sets.

The maximum set of features available to Xen, PV and HVM guests alike
depends on the hardware, firmware settings and command line options
provided to Xen enabling or disabling functionality.

It is specifically not guaranteed to remain the same across reboot,
which is why Xen shall recalculate it on each boot.
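
As a rough sketch of that boot-time calculation, the code below derives
host, HVM and PV featuresets from the words read from hardware and a
"no-xsave" style option.  It is not how Xen implements this: the
five-word layout, the per-guest-type masks and every identifier are
assumptions made for illustration.  The only architectural facts used are
that XSAVE and AVX are bits 26 and 28 of CPUID.1:ECX, and that the
features in leaf 0xD subleaf 1 EAX depend on XSAVE.  PV is computed as a
subset of HVM, following the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_WORDS 5          /* hypothetical: first five words only */
#define W_1C     0          /* 0x00000001.ECX */
#define W_DA1    4          /* 0x0000000D[ECX=1].EAX */

#define F_XSAVE  (1u << 26) /* CPUID.1:ECX bit 26 */
#define F_AVX    (1u << 28) /* CPUID.1:ECX bit 28, depends on XSAVE */

struct boot_featuresets {
    uint32_t host[NR_WORDS];
    uint32_t hvm[NR_WORDS];
    uint32_t pv[NR_WORDS];
};

/* Recomputed on every boot; nothing is carried over from a previous boot. */
static void calculate_featuresets(const uint32_t hw[NR_WORDS],
                                  const uint32_t hvm_mask[NR_WORDS],
                                  const uint32_t pv_mask[NR_WORDS],
                                  bool no_xsave,
                                  struct boot_featuresets *out)
{
    assert(hw && hvm_mask && pv_mask && out);

    for (size_t i = 0; i < NR_WORDS; i++) {
        uint32_t w = hw[i];

        if (no_xsave) {
            if (i == W_1C)
                w &= ~(F_XSAVE | F_AVX);   /* XSAVE and its dependant */
            if (i == W_DA1)
                w = 0;                     /* whole word depends on XSAVE */
        }

        out->host[i] = w;
        out->hvm[i]  = w & hvm_mask[i];
        out->pv[i]   = out->hvm[i] & pv_mask[i];   /* PV subset of HVM */
    }
}
```

Because the option is applied before the guest masks, one switch removes
the affected features from all three sets at once.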

>
>
>> Cpuid feature-verification library
>> ----------------------------------
>>
>> There shall be a new library (shared between Xen and libxc in the same way
>> as libelf etc.) which can verify the a featureset.  In particular, it will
> s/ a //
>> confirm that no features are enabled without their dependent features.
> And presumably can compare these features and do a and-subset (or an
> or-subset) ?

At the end of the day, these are just bitmaps with an (unknown but fixed)
integer length.
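
Because featuresets are plain fixed-length bitmaps, the comparisons asked
about reduce to word-wise bit operations.  A minimal sketch, with
hypothetical function names and with the caller supplying the common
length `nr`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* dst = a AND b: the features common to both featuresets. */
static void featureset_and(uint32_t *dst, const uint32_t *a,
                           const uint32_t *b, size_t nr)
{
    assert(dst && a && b);
    for (size_t i = 0; i < nr; i++)
        dst[i] = a[i] & b[i];
}

/* dst = a OR b: the features present in either featureset. */
static void featureset_or(uint32_t *dst, const uint32_t *a,
                          const uint32_t *b, size_t nr)
{
    assert(dst && a && b);
    for (size_t i = 0; i < nr; i++)
        dst[i] = a[i] | b[i];
}

/* True if every feature set in sub is also set in super. */
static bool featureset_is_subset(const uint32_t *sub, const uint32_t *super,
                                 size_t nr)
{
    assert(sub && super);
    for (size_t i = 0; i < nr; i++)
        if (sub[i] & ~super[i])
            return false;
    return true;
}
```

The AND of two hosts' featuresets is by construction a subset of each,
which is the property levelling relies on.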

>
>> XEN_DOMCTL_set_cpuid
>> --------------------
>>
>> This is an existing hypercall.  Currently it just stashes the policy from
>> userspace.  It shall be extended to provide verification of the policy, and
>> reject attempts to advertise features which Xen is incapable of providing
>> (via hardware or emulation support).
> Where would be the code to trim the 'maximum available features' in the
> subsets (like PV) with some cpuid=X flags from user-space?

There is already code to do this in both libxl and libxc.  There will of
course be some changes as part of this work, but nothing major (I hope).

The important point is that the hypercall shall now check Xen's ability
to provide what the toolstack has requested, and say no if it can't.
This will avoid the current situation, in which the domain cpuid code in
Xen always has to second-guess what is present in the domain policy
because it is usually junk.
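
The kind of check meant here could look roughly like the sketch below.
It is not the real XEN_DOMCTL_set_cpuid handler: the function name, the
flat array representation and the choice of -EINVAL as the error are all
assumptions.  It rejects a policy that asks for anything outside the
maximum featureset for the guest type, and shows one example dependency
rule (AVX requires XSAVE, both in CPUID.1:ECX, assumed to be word 0).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define F_XSAVE (1u << 26)  /* CPUID.1:ECX bit 26 */
#define F_AVX   (1u << 28)  /* CPUID.1:ECX bit 28 */

/*
 * Returns 0 if the requested policy can be provided, -EINVAL if not.
 * max is the maximum featureset for this guest type on this host.
 */
static int check_policy(const uint32_t *policy, const uint32_t *max,
                        size_t nr)
{
    assert(policy && max && nr > 0);

    /* Refuse any feature Xen cannot provide on this host. */
    for (size_t i = 0; i < nr; i++)
        if (policy[i] & ~max[i])
            return -EINVAL;

    /* Refuse a feature enabled without the feature it depends on. */
    if ((policy[0] & F_AVX) && !(policy[0] & F_XSAVE))
        return -EINVAL;

    return 0;
}
```

With the check done once at set time, later consumers of the policy can
trust it rather than re-validating it on every CPUID.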

>
>
>> VCPU context switch
>> -------------------
>>
>> Xen shall be updated to lazily context switch all available masking MSRs.  It
>> is noted that this shall incur a performance overhead if restricted
>> featuresets are assigned to PV guests, and _CPUID Faulting_ is not available.
>>
>> It shall be the responsibility of the host administrator to avoid creating
>> such a scenario, if the performance overhead is a concern.
> .. and perhaps add warnings in the toolstack to tell the admin?

How and where would this surface?  xl/libxl is not designed to run the
system as a whole.
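
To make the overhead in question concrete, here is one reading of the
"lazy" context switch in the quoted design: remember per CPU what was
last written to each masking MSR and skip the write when the incoming
vcpu wants the same value.  This is a sketch under that assumption.  The
MSR count, the structure and the function names are hypothetical, and the
MSR write is simulated with an array and a counter instead of a real
WRMSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_MASK_MSRS 4      /* hypothetical number of masking MSRs */

/* Stand-in for hardware; a hypervisor would execute WRMSR here. */
static uint64_t fake_msr[NR_MASK_MSRS];
static unsigned int msr_writes;

static void write_mask_msr(size_t idx, uint64_t val)
{
    assert(idx < NR_MASK_MSRS);
    fake_msr[idx] = val;
    msr_writes++;
}

/* Per-CPU record of the values currently loaded in the masking MSRs. */
struct cpu_mask_state {
    uint64_t cur[NR_MASK_MSRS];
};

/* Load the next vcpu's masks, writing only the MSRs whose value changes. */
static void switch_masks(struct cpu_mask_state *cpu,
                         const uint64_t next[NR_MASK_MSRS])
{
    assert(cpu && next);

    for (size_t i = 0; i < NR_MASK_MSRS; i++) {
        if (cpu->cur[i] != next[i]) {
            write_mask_msr(i, next[i]);
            cpu->cur[i] = next[i];
        }
    }
}
```

Switching between guests with identical masks then costs no MSR writes,
so the overhead only appears when guests with different restricted
featuresets share a CPU.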

>
>>
>> Future work
>> ===========
>>
>> The above is a minimum quantity of work to support feature levelling, but
>> further problems exist.  They are acknowledged as being issues, but are not in
>> scope for fixing as part of feature levelling.
>>
>> * Xen has no notion of per-cpu and per-package data in the cpuid policy.  In
>>   particular, this causes issues for VMs attempting to detect topology, which
>>   find inconsistent/incorrect cache information.
>>
>> * In the case that `domain_cpuid()` can't locate a leaf in the topology, it
>>   will fall back to issuing a plain `CPUID` instruction.  This breaks VM
>>   encapsulation, as a VM which has migrated can observe differences which
>>   should be hidden.
>>
>> * There is currently a positioning issue with the domain's cpuid policy.
>>   Verifying the register state requires the policy, but the policy is behind
>>   the register state in the migration stream.  The domain's cpuid policy should
>>   become an item in Xen's migration state for a VM.
>
> And potentially code in libxl to allow subset manipulation to allow
> leveling across different platforms. As in the common features would
> be exposed while all the other ones are masked? And I suppose some
> format to stash this so it can be ingested by the libxl tools?

libxl's knowledge of multiple platforms is precisely nothing.  xl knows
just enough to ssh and set up some pipes to push a VM through.

The domain configuration does have cpuid information in it.  That will
be sufficient, given these proposed changes, to prevent running the VM
on an incompatible destination.

~Andrew
