[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/msr: Unify the real {rd,wr}msr() paths in guest_{rd,wr}msr()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 23 Jul 2020 13:17:24 +0100
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Thu, 23 Jul 2020 12:17:54 +0000
  • Ironport-sdr: Jm5aV3RJiMdNLdR3BaTuRlDZxYoECC0CKx4201Q+yidhKwdtXRx/hL46lPo7mTckxAqCnyIcWV 5D9s0XvfvtQbarZTVYpl7xQsmRa4zbdT/boBSW5m7hpk037LoJyTpHm0EhQAPKNOYKq69dQ6sm V+N1ypeNZwCqMM1myOU2fdHsKf8yC+lQNqaX9CIiBsvHnU7EfQZGqWcpUQrOFIdr3l6M0qzyJ8 SgaM3pKphSbmeEBWWmA5ykZwM1Wg1Pz5Px0cFijxVK4ctXMj3zNfUu89P9YWm5U3VY92DxaBbM VcY=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 23/07/2020 12:15, Jan Beulich wrote:
> On 22.07.2020 12:55, Andrew Cooper wrote:
>> Both the read and write side have commonalities which can be abstracted away.
>> This also allows for additional safety in release builds, and slightly more
>> helpful diagnostics in debug builds.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>> CC: Wei Liu <wl@xxxxxxx>
>> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>
>> I'm not a massive fan of the global scope want_rdmsr_safe boolean, but I 
>> can't
>> think of a reasonable way to fix it without starting to use other
>> flexibiltiies offered to us by C99.
> I can't seem to be able to guess what C99 feature(s) you mean.
> If there are any that would help, why not use them?

This hunk:

@@ -154,7 +154,6 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr,
uint64_t *val)
     const struct cpuid_policy *cp = d->arch.cpuid;
     const struct msr_policy *mp = d->arch.msr;
     const struct vcpu_msrs *msrs = v->arch.msrs;
-    bool want_rdmsr_safe = false;
     int ret = X86EMUL_OKAY;
 
     switch ( msr )
@@ -303,6 +302,8 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr,
uint64_t *val)
 
     return ret;
 
+    bool want_rdmsr_safe = false;
+
  read_from_hw_safe:
     want_rdmsr_safe = true;
  read_from_hw:


Except that in our root Config.mk, we pass $(call
cc-option-add,CFLAGS,CC,-Wdeclaration-after-statement)  (and then
various bits of tools/ override to -Wno-declaration-after-statement).

Perhaps this is something we want to generally permit across our
codebase, seeing as some pieces already depend on it.

>> @@ -204,10 +205,9 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> *val)
>>           */
>>          if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_AMD)) ||
>>               !(boot_cpu_data.x86_vendor &
>> -               (X86_VENDOR_INTEL | X86_VENDOR_AMD)) ||
>> -             rdmsr_safe(MSR_AMD_PATCHLEVEL, *val) )
>> +               (X86_VENDOR_INTEL | X86_VENDOR_AMD)) )
>>              goto gp_fault;
>> -        break;
>> +        goto read_from_hw_safe;
> Above from here is a read from MSR_IA32_PLATFORM_ID - any reason
> it doesn't also get folded?

Oh - looks to be a rebasing error.  This patch is actually more than a
year old at this point.

>> @@ -278,7 +278,7 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> *val)
>>           */
>>  #ifdef CONFIG_HVM
>>          if ( v == current && is_hvm_domain(d) && v->arch.hvm.flag_dr_dirty )
>> -            rdmsrl(msr, *val);
>> +            goto read_from_hw;
> In the write path you also abstract out the check for v being current.
> Wouldn't this better be abstracted out here, too, as reading an actual
> MSR when not current isn't generally very helpful?

This is rather complicated to answer.

In the example of PLATFORM_ID above, which is consistent across the
entire system, and therefore it doesn't matter if we read it in
non-current context.

More generally however, the read and write paths truly are asymmetric
when it comes to their use in remote context.  Read is "I need this
value now", so always has to be of the form "if current do one thing,
else read from struct vcpu", whereas write is "always update struct
vcpu/etc, and let context switch handle getting it into hardware".


Then again, the more I think about this, the more I'm unsure if either
of the approaches here is ideal.

I think what this is going to need to morph into is a
get_reg()/set_reg() pair of helpers, which are first split between PV
and HVM, and then has further vmx/svm logic.  We're gaining an
increasing number of registers which might be RAM only (things emulated
for PV), or might be in the VMCB/VMCS (some even depending on hardware
generation), or might be in the MSR load lists (Intel Only) or might be
actually in hardware, or stale in hardware (VMLOAD/VMSAVE), and these
positions might vary on a per-VM or per context basis, and when we
finally get on to nested virt, might vary based on the settings of the
L1 hypervisor.

I'm wondering whether I should in fact withdraw this patch, and wait
until we've implemented guest_{rd,wr}msr() for some of the more
interesting MSRs, and see how the logic looks at that point.

>> @@ -493,8 +506,8 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> val)
>>                                 ? 0 : (msr - MSR_AMD64_DR1_ADDRESS_MASK + 1),
>>                                 ARRAY_SIZE(msrs->dr_mask))] = val;
>>  
>> -        if ( v == curr && (curr->arch.dr7 & DR7_ACTIVE_MASK) )
>> -            wrmsrl(msr, val);
>> +        if ( curr->arch.dr7 & DR7_ACTIVE_MASK )
>> +            goto maybe_write_to_hw;
>>          break;
> I have to admit that I'd find it more logical if v was now used
> here instead of curr.

Hmm true.

>
>> @@ -509,6 +522,23 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t 
>> val)
>>  
>>      return ret;
>>  
>> + maybe_write_to_hw:
>> +    /*
>> +     * All paths potentially updating a value in hardware need to check
>> +     * whether the call is in current context or not, so the logic is
>> +     * implemented here.  Remote context need do nothing more.
>> +     */
>> +    if ( v != curr || !wrmsr_safe(msr, val) )
>> +        return X86EMUL_OKAY;
>> +
>> +    /*
>> +     * Paths which end up here took a #GP fault in wrmsr_safe().  Something 
>> is
>> +     * broken with the logic above, so make it obvious in debug builds, and
>> +     * fail safe by handing #GP back to the guests in release builds.
>> +     */
>> +    gprintk(XENLOG_ERR, "Bad wrmsr %#x val %016"PRIx64"\n", msr, val);
> Didn't you indicate more than once that you dislike mixing 0x-
> prefixed and non-prefixed hex values in a single message?

Yes - my mistake.

> (Personally I'd simple drop the #, but I expect you to prefer it
> the other way around.)

In this case, I'm not overly fussed about the 0x.  It is clear from
context (WRMSR in the message, and the two numbers of exact width) that
we're using only hex.

> Also both here and in the read path I'm unconvinced of the
> "by handing #GP back" wording: When v != curr, no #GP fault can
> typically be handed anywhere. And even when v == curr it's still
> up to the caller to decide what to do. IOW how about "by
> suggesting to hand back #GP" or some such?

The overwhelming majority usecase is in current context, so I suppose it
is mostly true.

For the remote usecase, if this were to go wrong, something on the
context switch path would explode.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.