[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 for-4.15] x86/msr: introduce an option for compatible MSR behavior selection


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Tue, 9 Mar 2021 19:57:42 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Uvsyn4BQtuOsH1ezzYT6qfWEubjzVqiQLFtMWsBe5Gg=; b=ekmyCFYNxVbOpIoVpWWsemGJUe0yTtsn0MST6FWkhkPWvk7Y0EOLa1ZtlxmeNdNElBuyqc3Q8v8y7C/KKq20OWDLdTMyi3r+Qg7FAaYDSUGKhE1kQN3FYdjG6GQm/8dxsw6cT8kz1e/0mWZiSM4MGioBBvHBrxVCZJC+aogRhrpwrtEgLUw2GDC3YgzzXgFvB0WklbwUqSdwrAZGYRilZI33aI2gLZc6xTmyAyIavACFWGwa1LoNuFlqqmGzQuFeFFppoH/Z3VxY5TNQ7bAKOJIMwPG46yj2wXkUU1I7LvwJWRrCiQ3sTbhQXJfYU9kFHP8BTkA/O2gMeOo/TbwVMQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PTvAz2W8PrXZ1dfNX7Tiby1pZIQGZA9/WU4qezrys4ZFgLLN5c9+ellW8sYkUBIRo40CMoC38L54OLubz7MXrj+SO/wG3lrgpv+PmLeZadzm7qfLFSdAu636vK4JO+YmocZb2rdVNGAfpRJiCB93xTPFzOOgSYyyMt4r8q9TPhllw47idm8+AmZxeJACFzqZEebtEUVdxTcPI+NXFbpbaMEsxihLfbkQA505e4xoT/o6CElOngVWieHMiCNXAQLFdYC+7U37iToecBfeggFxNA+QzqlbX9Q/+SHZy2n+NaG3fbTXHkA+EnycMXAeJeZRO7GPO+ANCZw2lz2aYl4FFg==
  • Authentication-results: esa3.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Christian Lindig <christian.lindig@xxxxxxxxxx>, David Scott <dave@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 09 Mar 2021 19:58:12 +0000
  • Ironport-sdr: Oz2MRHUGCKPRx62BpRwaKit0MPbMbrhCdqtR9oWlfaGACbELzMbKb6a4amctUv+kPdsN1xI/HJ wf4QaEDv025tlUgc+yRrylIMcBHY0yTGp1UP0UI6va71AubZixi7ZJ+Msy+ZVE8anVficjpBn1 0YzP2xTS2icgnaAEcsyULjzkFG41W/RIVCwlYi4PJWnGIl2yLh6m77qa+8Rq7buwKIvjO4BB2O 0XfALhMtNuHeN/7hbUgfKF9IM0HM1V/yMjji/q+kn+3X+fHcNE/Aq2H3A0aeTsnL9fIH7ZcL9o SRs=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09/03/2021 11:36, Jan Beulich wrote:
> On 09.03.2021 11:56, Roger Pau Monne wrote:
>> Introduce an option to allow selecting a behavior similar to the pre
>> Xen 4.15 one for accesses to MSRs not explicitly handled. Since commit
>> 84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly
>> handled by Xen result in the injection of a #GP to the guest. This
>> is a behavior change since previously a #GP was only injected if
>> accessing the MSR on the real hardware would also trigger a #GP, or if
>> the attempted to be set bits wouldn't match the hardware values (for
>> PV).
>>
>> This seems to be problematic for some guests, so introduce an option
>> to fallback to this kind of legacy behavior without leaking the
>> underlying MSR values to the guest.
>>
>> When the option is set, for both PV and HVM don't inject a #GP to the
>> guest on MSR read if reading the underlying MSR doesn't result in a
>> #GP, do the same for writes and simply discard the value to be written
>> on that case.
>>
>> Note that for guests restored or migrated from previous Xen versions
>> the option is enabled by default, in order to keep a compatible
>> MSR behavior. Such compatibility is done at the libxl layer, to avoid
>> higher-level toolstacks from having to know the details about this flag.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> I'm generally okay with this approach, but wouldn't want to give it
> my R-b until it's at least clear it's not entirely unacceptable to
> anyone else (Andrew in particular).

I'm broadly happy with this approach.  Some feedback just sent.

>  Couple of remarks:
>
>> Changes since v2:
>>  - Apply the option to both HVM and PV guest.
>>  - Handle both reads and writes.
> I take it that it's intentional that you didn't allow separate read
> and write control?

I think v3 is the correct approach.

This is strictly a backwards compatibility option.  Splitting read and
write just doubles the complexity the admin has wrestle with to recover
a legacy guest.

As we explicitly intend to retire the option in due course, this is very
much a "make my guest work until my upstream (either Xen or kernel) can
fix the bug".

>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1795,6 +1795,7 @@ static int svm_msr_read_intercept(unsigned int msr, 
>> uint64_t *msr_content)
>>      const struct domain *d = v->domain;
>>      struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
>>      const struct nestedsvm *nsvm = &vcpu_nestedsvm(v);
>> +    uint64_t tmp;
>>  
>>      switch ( msr )
>>      {
> While to some degree a matter of taste, I think such variables needed
> only inside a switch() and not needing an initializer would better
> live in the respective switch()'s scope.

Actually, I was hoping to make a CODING_SYTLE change removing this as a
permitted construct.

Recent compilers have hardening features to automatically initialise all
stack variables, because even if your code isn't architecturally buggy,
an attacker can launch speculative attacks the stack rubble.

However, because of the way the compiler transform works, it cannot
tolerate this specific construct in a switch statement, as there is no
available execution to cope with the implicit "=0" or "=POISON".

Even within Xen, we don't have many examples of this construct AFAICT,
and there is a concrete security advantage to being able to support the
compiler hardening.

>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -875,6 +875,7 @@ static int read_msr(unsigned int reg, uint64_t *val,
>>      const struct domain *currd = curr->domain;
>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>      bool vpmu_msr = false;
>> +    uint64_t tmp;
>>      int ret;
>>  
>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -986,6 +987,12 @@ static int read_msr(unsigned int reg, uint64_t *val,
>>          }
>>          /* fall through */
>>      default:
>> +        if ( currd->arch.msr_relaxed && !rdmsr_safe(reg, tmp) )
>> +        {
>> +            *val = 0;
>> +            return X86EMUL_OKAY;
>> +        }
>> +
>>          gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>          break;
>>  
>> @@ -1148,6 +1155,9 @@ static int write_msr(unsigned int reg, uint64_t val,
>>          }
>>          /* fall through */
>>      default:
>> +        if ( currd->arch.msr_relaxed && !rdmsr_safe(reg, val) )
>> +            return X86EMUL_OKAY;
>> +
>>          gdprintk(XENLOG_WARNING,
>>                   "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n",
>>                   reg, val);
> So what are your thoughts wrt my change to this file? Drop it
> altogether and require people to use this new option? Or do you
> see both coexist? In the latter case, since you had suggested
> that I drop the write side of my change - does your changing of
> the write path indicate you've changed your mind?

I don't think we should legitimise buggy PV behaviour, either by
codifying in the ABI, or providing a knob beyond this one.

A guest blindly stumbling forward with a missed #GP could very well
worse than crashing hard.

Case in point - the 4.15 behaviour spotted a very serious bug in NetBSD
where it tried writing MSR_PAT with its own choice (which wasn't the
same as Xen's choice).  The consequence of this bug is getting cache
attributes in pagetables wrong.

It is unfortunate that multiple bugs have combined to make this mess,
but every instance needs investigating and fixing.  Continuing to paper
over the hole doesn't help anyone in the long run.

~Andrew




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.