
Re: [Xen-devel] DESIGN v2: CPUID part 3



On 07/05/2017 10:46 AM, Joao Martins wrote:
> Hey Andrew,
> 
> On 07/04/2017 03:55 PM, Andrew Cooper wrote:
>> Presented herewith is a plan for the final part of the CPUID work, which
>> primarily covers better Xen/Toolstack interaction for configuring the guest's
>> CPUID policy.
>>
> Really nice write-up, a few comments below.
> 
>> A PDF version of this document is available from:
>>
>> http://xenbits.xen.org/people/andrewcoop/cpuid-part-3-rev2.pdf
>>
>> Changes from v1:
>>  * Clarification of the interaction of emulated features
>>  * More information about the difference between max and default featuresets.
>>
>> ~Andrew
>>
>> -----8<-----

[snip]

>> # Proposal
>>
>> First and foremost, split the current **max\_policy** notion into separate
>> **max** and **default** policies.  This allows for the provision of features
>> which are unused by default, but may be opted in to, both at the hypervisor
>> level and the toolstack level.
>>
>> At the hypervisor level, **max** constitutes all the features Xen can use on
>> the current hardware, while **default** is the subset thereof comprising the
>> supported features plus any features the user has explicitly opted in to,
>> minus any features the user has explicitly opted out of.
>>
>> A new `cpuid=` command line option shall be introduced, whose internals are
>> generated automatically from the featureset ABI.  This means that all features
>> added to `include/public/arch-x86/cpufeatureset.h` automatically gain command
>> line control.  (RFC: The same top-level option can probably be used for
>> non-feature CPUID data control, although I can't currently think of any cases
>> where this would be used.  Also, find a sensible way to express 'available but
>> not to be used by Xen', as per the current `smep` and `smap` options.)
>>
>>
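To make the host-level split concrete, here is a minimal sketch, assuming the
featuresets are plain bitmaps and that the `cpuid=` option is parsed into
opt-in/opt-out masks.  All identifiers below are made up for illustration and
are not actual Xen code:

    #include <stdint.h>

    #define FEATURESET_WORDS 16   /* hypothetical size */

    typedef struct {
        uint32_t bits[FEATURESET_WORDS];
    } featureset_t;

    /*
     * Hypothetical derivation of Xen's host "default" policy from "max":
     *   default = (max & supported) | (max & opted_in), minus opted_out
     * where opted_in/opted_out would be built from a command line such as
     * "cpuid=avx512f,no-smap" (exact syntax not specified by the proposal).
     */
    static void derive_host_default(const featureset_t *max,
                                    const featureset_t *supported,
                                    const featureset_t *opted_in,
                                    const featureset_t *opted_out,
                                    featureset_t *def)
    {
        for ( unsigned int i = 0; i < FEATURESET_WORDS; i++ )
        {
            def->bits[i] = (max->bits[i] & supported->bits[i]) |
                           (max->bits[i] & opted_in->bits[i]);
            def->bits[i] &= ~opted_out->bits[i];
        }
    }
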
>> At the guest level, the **max** policy is conceptually unchanged.  It
>> constitutes all the features Xen is willing to offer to each type of guest on
>> the current hardware (including emulated features).  However, it shall instead
>> be derived from Xen's **default** host policy.  This is to ensure that
>> experimental hypervisor features must be opted in to at the Xen level before
>> they can be opted in to at the toolstack level.
>>
>> The guest's **default** policy is then derived from its **max**.  This is
>> because there are some features which should always be explicitly opted in to
>> by the toolstack, such as emulated features which come with a security
>> trade-off, or non-architectural features which may differ in implementation
>> across heterogeneous environments.
>>
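Reading the above, the derivation chain for the guest-level policies would look
something like this sketch (same hypothetical featureset_t as before; the real
policies contain full CPUID leaves, not just feature bits):

    /*
     * Illustrative only: one such pair of policies would presumably exist
     * per guest type (PV/HVM), since the emulated and opt-in-only sets differ.
     */
    static void derive_guest_policies(const featureset_t *host_default,
                                      const featureset_t *emulated,   /* features Xen can emulate */
                                      const featureset_t *optin_only, /* need explicit toolstack opt-in */
                                      featureset_t *guest_max,
                                      featureset_t *guest_default)
    {
        for ( unsigned int i = 0; i < FEATURESET_WORDS; i++ )
        {
            /* Guest max: everything Xen is willing to offer, rooted in the
             * host default so experimental Xen features stay opt-in. */
            guest_max->bits[i] = host_default->bits[i] | emulated->bits[i];

            /* Guest default: max minus anything requiring explicit opt-in. */
            guest_default->bits[i] = guest_max->bits[i] & ~optin_only->bits[i];
        }
    }
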
>> All global policies (Xen and guest, max and default) shall be made available
>> to the toolstack, in a manner similar to the existing
>> _XEN\_SYSCTL\_get\_cpu\_featureset_ mechanism.  This allows decisions to be
>> taken which include all CPUID data, not just the feature bitmaps.
>>
>> New _XEN\_DOMCTL\_{get,set}\_cpuid\_policy_ hypercalls will be introduced,
>> which allow the toolstack to query and set the CPUID policy for a specific
>> domain.  They shall supersede _XEN\_DOMCTL\_set\_cpuid_, and the set operation
>> shall fail if Xen is unhappy with any aspect of the policy during auditing.
>> This provides feedback to the user that a chosen combination will not work,
>> rather than the guest booting in an unexpected state.
>>
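From the toolstack side, I would imagine the flow looking roughly like the
sketch below.  The wrapper names (xc_get_cpuid_policy() and so on) and the
policy structure are entirely hypothetical; the point is just the
get/modify/set loop, with the audit failure reported to the caller:

    #include <stdio.h>
    #include <xenctrl.h>

    /*
     * Hypothetical libxc-style usage of the proposed
     * XEN_DOMCTL_{get,set}_cpuid_policy hypercalls.  None of these wrappers
     * exist today; this only illustrates the intended flow.
     */
    int configure_domain_cpuid(xc_interface *xch, uint32_t domid)
    {
        struct cpuid_policy policy;   /* hypothetical toolstack representation */
        int rc;

        /* Start from the policy the domain was created with (its default). */
        rc = xc_get_cpuid_policy(xch, domid, &policy);
        if ( rc )
            return rc;

        /* Apply whatever the user asked for, e.g. opt in to a feature. */
        policy.feat.avx512f = 1;

        /*
         * Hand it back.  Xen audits the request and fails the hypercall if
         * the combination is unacceptable, so the error shows up here rather
         * than as a guest booting in an unexpected state.
         */
        rc = xc_set_cpuid_policy(xch, domid, &policy);
        if ( rc )
            fprintf(stderr, "CPUID policy rejected by Xen: %d\n", rc);

        return rc;
    }
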
>> When a domain is initially created, the appropriate guest **default** policy
>> is duplicated for use.  When auditing, Xen shall audit the toolstack's
>> requested policy against the guest's **max** policy.  This allows experimental
>> features or non-migration-safe features to be opted in to, without those
>> features being imposed upon all guests automatically.
>>
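For the feature-bitmap part at least, the audit step then reduces to a subset
check against the guest's max policy, something like this (reusing the
hypothetical featureset_t from the earlier sketch):

    #include <errno.h>

    /* Reject any request enabling a feature outside the guest's max policy. */
    static int audit_featureset(const featureset_t *requested,
                                const featureset_t *guest_max)
    {
        for ( unsigned int i = 0; i < FEATURESET_WORDS; i++ )
            if ( requested->bits[i] & ~guest_max->bits[i] )
                return -EINVAL;   /* feature not offered to this guest */

        return 0;
    }
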
>> A guest's CPUID policy shall be immutable after construction.  This better
>> matches real hardware, and simplifies the logic in Xen to translate policy
>> alterations into configuration changes.
>>
> 
> This appears to be a suitable abstraction even for higher level toolstacks
> (libxl). At least I can imagine libvirt fetching the PV/HVM max policies and
> comparing them between different servers when the user computes the guest CPU
> config (the normalized one), then using the common denominator as the guest
> policy. A higher level toolstack could probably even use these policy
> constructs to build the idea of CPU models, such that the user could easily
> choose one for a pool of hosts with different families. But the discussion
> here is more focused on xc <-> Xen, so I won't clobber it with libxl remarks.
> 
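For what it's worth, the "common denominator" step would presumably just be an
intersection of the featuresets fetched from each host, along these lines;
names are again made up:

    /*
     * Hypothetical: compute a featureset usable on every host in a pool by
     * intersecting the per-host featuresets a higher level toolstack fetched.
     * Assumes nr_hosts >= 1.
     */
    static void common_denominator(const featureset_t *per_host,
                                   unsigned int nr_hosts,
                                   featureset_t *out)
    {
        *out = per_host[0];   /* start from the first host... */

        for ( unsigned int h = 1; h < nr_hosts; h++ )
            for ( unsigned int i = 0; i < FEATURESET_WORDS; i++ )
                out->bits[i] &= per_host[h].bits[i];   /* ...and AND in the rest */
    }
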
>> (RFC: Decide exactly where to fit this.  _XEN\_DOMCTL\_max\_vcpus_ perhaps?)
>> The toolstack shall also have a mechanism to explicitly select topology
>> configuration for the guest, which primarily affects the virtual APIC ID
>> layout, and has a knock-on effect for the APIC ID of the virtual IO-APIC.
>> Xen's auditing shall ensure that guests observe values consistent with the
>> guarantees made by the vendor manuals.
>>
> Why choose the max_vcpus domctl?
> 
> With multiple sockets/nodes, and once the extended topology leaf is supported,
> the APIC ID layout will change considerably, requiring fixup if... say we set
> vNUMA (I know a NUMA node != socket spec-wise, but on the machines we have
> seen so far, it's a 1:1 mapping).
> 
> Another question, since we are speaking about topology, would be: how do we
> make hvmloader aware of the APIC ID layout? Right now the APIC ID is
> hardcoded as 2 * vcpu_id :( Perhaps a xenstore entry
> 'hvmloader/cputopology-threads' and 'hvmloader/cputopology-sockets' (or use
> vnuma_topo.nr_nodes for the latter)?
> 
> This all brings me to the question of perhaps a separate domctl?

"perhaps a separate domctl" as opposed to the max_vcpus domctl. Just to give
better context and clarify that of the sentence wasn't referring to hvmloader.
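
On the APIC ID layout point: a minimal sketch of what hvmloader (or whichever
component ends up consuming the topology information) might compute instead of
the current fixed spacing, assuming the topology is conveyed as threads per
core and cores per socket.  The names are made up; the real packing rules come
from the vendor manuals:

    #include <stdint.h>

    /* APIC ID fields are bit-packed, so strides round up to powers of two. */
    static unsigned int next_pow2(unsigned int x)
    {
        unsigned int r = 1;

        while ( r < x )
            r <<= 1;

        return r;
    }

    /*
     * Hypothetical replacement for a fixed "apic_id = 2 * vcpu_id" scheme:
     * pack the thread/core/socket indices into the APIC ID using
     * power-of-two strides.
     */
    static uint32_t vcpu_to_apic_id(unsigned int vcpu_id,
                                    unsigned int threads_per_core,
                                    unsigned int cores_per_socket)
    {
        unsigned int t_stride = next_pow2(threads_per_core);
        unsigned int c_stride = next_pow2(cores_per_socket) * t_stride;

        unsigned int thread = vcpu_id % threads_per_core;
        unsigned int core   = (vcpu_id / threads_per_core) % cores_per_socket;
        unsigned int socket = vcpu_id / (threads_per_core * cores_per_socket);

        return thread + core * t_stride + socket * c_stride;
    }

The current hardcoded spacing is effectively the special case of always
reserving one bit for a second (non-existent) thread per core.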

Joao

 

