[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 13/13] x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy



>>> On 04.07.18 at 20:47, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 04/07/18 11:16, Jan Beulich wrote:
>>>>> On 03.07.18 at 22:55, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
>>>
>>> This hypercall allows the toolstack to present one combined CPUID and MSR
>>> policy for a domain, which can be audited in one go by Xen, which is 
>>> necessary
>>> for correctness of the auditing.
>>>
>>> A stub x86_policies_are_compatible() function is introduced, although at
>>> present it will always fail the hypercall.
>>>
>>> The hypercall ABI allows for update of individual CPUID or MSR entries, so
>>> begins by duplicating the existing policy (for which a helper is 
>>> introduced),
>>> merging the toolstack data, then checking compatibility of the result.
>> This reads to me as if it was fine for the tool stack to supply only partial
>> data (or else there would be no need to merge anything). What's the
>> thinking behind this, rather than requiring complete sets of data to be
>> supplied?
> 
> Unless you have an identical build of Xen and libxc, the toolstack won't
> always provide the same leaves as Xen expects.  Such a restriction would
> massively hamper development.
> 
> More importantly, with the max_extd_leaf being vendor dependent (which
> is data earlier in the structure), the toolstack can't reasonably create
> a dataset which matches those expectations.  Also you'll get into
> problems when migrating from older hosts, and when levelling a policy
> down with a lower max_leaf.
> 
> The reason for having an architectural representation was deliberately
> to prohibit having a hypercall which was a memcpy of an unstable
> structure in the background, but with this design comes the fact that
> Xen must not expect to find an exact match of data.

Okay, I can see the point together with the migration aspect you've
mentioned in another reply on this series.

>>> --- a/xen/arch/x86/domctl.c
>>> +++ b/xen/arch/x86/domctl.c
>>> @@ -330,6 +330,71 @@ static int update_domain_cpuid_info(struct domain *d,
>>>      return 0;
>>>  }
>>>  
>>> +static int update_domain_cpumsr_policy(struct domain *d,
>>> +                                       xen_domctl_cpumsr_policy_t *xdpc)
>>> +{
>>> +    struct policy_group new = {};
>>> +    const struct policy_group *sys = is_pv_domain(d)
>>> +        ? &system_policies[XEN_SYSCTL_cpumsr_policy_pv_max]
>>> +        : &system_policies[XEN_SYSCTL_cpumsr_policy_hvm_max];
>>> +    struct vcpu *v = d->vcpu[0];
>>> +    int ret = -ENOMEM;
>>> +
>>> +    /* Initialise some help identifying auditing errors. */
>>> +    xdpc->err_leaf = xdpc->err_subleaf = XEN_CPUID_NO_SUBLEAF;
>>> +    xdpc->err_msr_idx = ~0;
>> I'm having trouble extracting information from the comment.
> 
> So am I...
> 
> These are the fields inside the domctl which try to provide some hint as
> to which piece of data is problematic, rather than leaving the user with
> simply "somthing is wrong".

Yeah, I did understand the purpose of the fields. I take it the
comment will be dropped or improved then.

>>> +    /* Start with existing domain's policies */
>>> +    if ( !(new.cp = xmemdup(d->arch.cpuid)) ||
>>> +         !(new.dp = xmemdup(d->arch.msr)) ||
>>> +         !(new.vp = xmemdup(v->arch.msr)) )
>>> +        goto out;
>>> +
>>> +    /* Merge the toolstack provided data. */
>>> +    if ( (ret = x86_cpuid_copy_from_buffer(
>>> +              new.cp, xdpc->cpuid_policy, xdpc->nr_leaves,
>>> +              &xdpc->err_leaf, &xdpc->err_subleaf)) )
>>> +        goto out;
>>> +
>>> +    if ( (ret = x86_msr_copy_from_buffer(
>>> +              new.dp, new.vp,
>>> +              xdpc->msr_policy, xdpc->nr_msrs, &xdpc->err_msr_idx)) )
>>> +        goto out;
>>> +
>>> +    /* Audit the combined dataset. */
>>> +    ret = x86_policies_are_compatible(sys, &new);
>>> +    if ( ret )
>>> +        goto out;
>> I'm afraid I don't follow - where's the merging? All you do is copy the
>> first so many entries coming from libxc, and using the later so many
>> entries from the previous policies. How's that going to provide a
>> complete set, rather than e.g. some duplicate entries and some
>> missing ones?
> 
> Consider the case where max_leaf is lower than Xen's maximum.  The data
> provided by the toolstack can be self consistent and complete without
> matching Xen's exact size of structure.

Right. The question on proper merging remains though: Shouldn't CPUID
entries be matched by leaf/subleaf, and MSR ones by MSR index?
Currently there's an implied ordering agreement of the arrays between
caller and callee.

>>> +    /*
>>> +     * Audit was successful.  Replace existing policies, leaving the old
>>> +     * policies to be freed.
>>> +     */
>>> +    SWAP(new.cp, d->arch.cpuid);
>>> +    SWAP(new.dp, d->arch.msr);
>>> +    SWAP(new.vp, v->arch.msr);
>>> +
>>> +    /* Merge the (now audited) vCPU MSRs into every other msr_vcpu_policy. 
>>> */
>>> +    for ( ; v; v = v->next_in_list )
>> This open-coded almost-for_each_domain() doesn't look very nice.
> 
> ITYM for_each_vcpu()

Oops, of course.

> And yes, but for_each_vcpu() is wrong to use here, and we don't have a
> for_each_vcpu_other_than_0() helper.

Perhaps still better to do

    for_each_vcpu(d, v)
    {
        if ( v->vcpu_id == 0 )
            continue;
        ...
    }

?

>>> +    {
>>> +        /* XXX - Figure out how to avoid a TOCTOU race here.  XLAT area? */
>>> +        if ( (ret = x86_msr_copy_from_buffer(
>>> +                  NULL, v->arch.msr, xdpc->msr_policy, xdpc->nr_msrs, 
>>> NULL)) )
>> Why can't you go from vCPU 0's v->arch.msr here, which is the copied-in
>> (and sanitized) representation already? Also, is it really a good idea to
>> assume all vCPU-s have the same policies?
> 
> There are multiple colliding issues which lead to this code, but as
> several people have pointed out, its probably over the top.
> 
> First, as to the same policy.  This hypercall can currently only be used
> before the vcpu has started executing.
> 
> As such, it is setting the init state of the MSRs from the guests point
> of view, and there is exactly one MSR I'm aware of which has an init
> value which depends on the core (that being APIC_BASE.BSP which can
> trivially be handled in Xen).  All other MSRs have identical init state
> AFAICT, and I don't want to create an interface which makes it easy to
> accidentally end up with wrong values.

So what about migration? There are certainly differing incoming values
there. Of course there's the MSRs restore record, but no atomic sanity
check between those and the policy here is possible.

> The re-de-serialising actually came from your suggested usecase to be
> able to increase the policy at runtime, .e.g. after a microcode update
> and hypervisor livepatch.  In that case, a partial set doesn't want to
> copy vCPU0's generally R/W MSRs over other vcpus, because that would be
> a bad thing.
> 
> However, these two points aren't compatible, so if we accept the "only
> before the domain has started running" aspect, then it should be safe to
> copy v0's policy over all other vcpus.

I think the description of the change wants to be extended to cover
this.

> Really, the problem here comes from (necessarily) how we will have to
> implement SGX_LC and the fact that the LC Hash MSRs may be read-only
> domain-wide state, or read-write per-vcpu state depending on a different
> MSR setting (MSR_FEAT_CTRL.SGX_LC).
> 
> I was initially hoping to have only msr_domain_policy poked via this
> interface, and msr_vcpu_policy poked exclusively via per-vcpu
> hypercalls.  Then again, at least the complexity is visible at this
> point, rather than at some point in the future when we need to try and
> retrofit it.

Oh, that's certainly very worthwhile; I wasn't even aware of this SGX
induced challenge.

>>> --- a/xen/include/xen/xmalloc.h
>>> +++ b/xen/include/xen/xmalloc.h
>>> @@ -13,6 +13,13 @@
>>>  #define xmalloc(_type) ((_type *)_xmalloc(sizeof(_type), 
>>> __alignof__(_type)))
>>>  #define xzalloc(_type) ((_type *)_xzalloc(sizeof(_type), 
>>> __alignof__(_type)))
>>>  
>>> +/* Allocate space for a typed object and copy an existing instance. */
>>> +#define xmemdup(ptr)                                    \
>>> +    ({  typeof(*ptr) *n_ = xmalloc(typeof(*ptr));       \
>>> +        if ( n_ )                                       \
>>> +            memcpy(n_, ptr, sizeof(*ptr));              \
>>> +        n_; })
>> Would be nice if this could handle input pointers to const-qualified types.
>> I vaguely recall having seen a solution to this recently, but I don't recall
>> where that was or how it looked like. Until then, may I suggest to use
>> void * instead, despite this opening the risk of type incompatibilities?
> 
> The only way I'm aware of which might fix the const qualified aspect is
> _Generic(), but ISTR you can't use typeof() inside.
> 
> As for xmemdup() specific, no - this will never want const qualified
> types.  The caller is always going to mutate the structure, or they
> wouldn't have dup'd the object to begin with.

I don't follow: Why wouldn't a caller possibly want to mutate a clone
of a const input structure? What is pointless to be const id the output
structure, yes, but typeof(*(ptr)) [note the missing parentheses in
your code, btw] includes the possible const from what the input points
to.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.