
Re: [Xen-devel] [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 25 May 2018 12:48:54 +0100
  • Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 25 May 2018 11:49:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 25/05/18 12:36, Jan Beulich wrote:
>>>> On 25.05.18 at 10:36, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/05/2018 08:49, Jan Beulich wrote:
>>>>>> On 22.05.18 at 13:20, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> @@ -1650,22 +1641,81 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
>>>>  
>>>>  static void vmx_update_guest_efer(struct vcpu *v)
>>>>  {
>>>> -    unsigned long vm_entry_value;
>>>> +    unsigned long entry_ctls, guest_efer = v->arch.hvm_vcpu.guest_efer,
>>>> +        xen_efer = read_efer();
>>>> +
>>>> +    if ( paging_mode_shadow(v->domain) )
>>>> +    {
>>>> +        /*
>>>> +         * When using shadow pagetables, EFER.NX is a Xen-owned bit and is not
>>>> +         * under guest control.
>>>> +         */
>>>> +        guest_efer &= ~EFER_NX;
>>>> +        guest_efer |= xen_efer & EFER_NX;
>>>> +
>>>> +        /*
>>>> +         * At the time of writing (May 2018), the Intel SDM "VM Entry: Checks
>>>> +         * on Guest Control Registers, Debug Registers and MSRs" section says:
>>>> +         *
>>>> +         *  If the "Load IA32_EFER" VM-entry control is 1, the following
>>>> +         *  checks are performed on the field for the IA32_MSR:
>>>> +         *   - Bits reserved in the IA32_EFER MSR must be 0.
>>>> +         *   - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
>>>> +         *     the "IA-32e mode guest" VM-entry control.  It must also be
>>>> +         *     identical to bit 8 (LME) if bit 31 in the CR0 field
>>>> +         *     (corresponding to CR0.PG) is 1.
>>>> +         *
>>>> +         * Experimentally what actually happens is:
>>>> +         *   - Checks for EFER.{LME,LMA} apply uniformly whether using the
>>>> +         *     GUEST_EFER VMCS controls, or MSR load/save lists.
>>>> +         *   - Without EPT, LME being different to LMA isn't tolerated by
>>>> +         *     hardware.  As writes to CR0 are intercepted, it is safe to
>>>> +         *     leave LME clear at this point, and fix up both LME and LMA when
>>>> +         *     CR0.PG is set.
>>>> +         */
>>>> +        if ( !(guest_efer & EFER_LMA) )
>>>> +            guest_efer &= ~EFER_LME;
>>>> +    }
>>> Why is this latter adjustment done only for shadow mode?
>> How should I go about making the comment clearer?
>>
>> When EPT is active, hardware is happy with LMA != LME.  When EPT is
>> disabled, hardware strictly requires LME == LMA.
> Part of my problem may be that "Without EPT" can have two meanings:
> Hardware without EPT, or EPT disabled on otherwise capable hardware.

Ah ok.  Yes - I see the confusion.  I'll see about rewording it.
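
For reference, the adjustment in the quoted hunk can be restated as a
standalone function.  This is only a sketch: shadow_effective_efer() is a
hypothetical name that is not part of the patch, and the bit positions are
the architectural ones for IA32_EFER (LME = bit 8, LMA = bit 10, NXE = bit 11).

```c
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)   /* Long Mode Enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* Long Mode Active */
#define EFER_NX  (UINT64_C(1) << 11)  /* No-Execute Enable */

/*
 * Hypothetical helper (not in the patch): compute the EFER value to load
 * on vmentry for a vcpu running on shadow pagetables.
 */
static uint64_t shadow_effective_efer(uint64_t guest_efer, uint64_t xen_efer)
{
    /* With shadow pagetables, NX is owned by Xen, so mirror Xen's setting. */
    guest_efer &= ~EFER_NX;
    guest_efer |= xen_efer & EFER_NX;

    /*
     * LME without LMA is the window between the guest setting LME and
     * setting CR0.PG.  Hide LME until LMA is set so that LME == LMA.
     */
    if ( !(guest_efer & EFER_LMA) )
        guest_efer &= ~EFER_LME;

    return guest_efer;
}
```

With a guest value of just EFER_LME and a Xen value containing EFER_NX, the
result is EFER_NX alone: LME is hidden and NX follows Xen.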

>
>> This particular condition occurs architecturally on the transition into
>> long mode, between setting LME and setting CR0.PG, and updating EFER
>> controls in the naive way results in a vmentry failure.
>>
>> Having spoken to Intel, they agree with my assessment that the docs
>> appear to be correct for Gen1 hardware, and stale for Gen2 hardware,
>> where fixing this was one of many parts of making Unrestricted Guest work.
> This suggests you mean the former, in which case the check really
> doesn't belong inside a paging_mode_shadow() conditional.

Whereas what is meant is the latter.  It depends on the EPT setting in
the VMCS, rather than whether the hardware is capable.  This is
presumably for backwards compatibility.
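
To make the distinction concrete, here is a small model of the behaviour
described above.  It is a sketch of the experimental observation only, not
an architectural definition: efer_lme_lma_ok() and its ept_enabled_in_vmcs
parameter are hypothetical names, and only the LME/LMA relation is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)   /* Long Mode Enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* Long Mode Active */

/*
 * Hypothetical model of the observed vmentry behaviour.  The input that
 * matters is whether EPT is enabled in the VMCS for this vcpu, not whether
 * the CPU is capable of EPT.
 */
static bool efer_lme_lma_ok(uint64_t efer, bool ept_enabled_in_vmcs)
{
    bool lme = (efer & EFER_LME) != 0;
    bool lma = (efer & EFER_LMA) != 0;

    if ( ept_enabled_in_vmcs )
        return true;    /* LME != LMA was observed to be tolerated. */

    return lme == lma;  /* EPT disabled: LME must equal LMA. */
}
```

In this model a guest which has set LME but not yet CR0.PG (EFER_LME alone)
passes with EPT enabled in the VMCS and fails with it disabled, which is the
case the shadow-mode fixup works around.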

>
>>> After the above adjustments, when guest_efer still matches
>>> v->arch.hvm_vcpu.guest_efer, couldn't we disable the MSR read
>>> intercept?
>> In principle, yes.  We could use load/save lists, as long as we remembered
>> to recalculate EFER every time CR0 gets modified in the shadow path.
>>
>> However, that would be a net performance penalty rather than benefit
>> (which is why I've gone to the effort of creating load-only lists).
>>
>> In practice, EFER is written at boot and not touched again.  Having
>> load/save logic might avoid these vmexits, but at the cost of almost
>> every other vmexit needing to keep the guest_efer in sync with the
>> load/save list or VMCS field.
> I can't seem to connect this to my question about MSR _read_ intercept.

Oh - so it doesn't.  I read that as the read/write intercept.

Yes - probably, although I'd have to double check how it interacts with
the introspection interception settings (and the answer is almost
certainly "badly").  I've got a plan to fix this by maintaining separate
"who wants which MSR intercepted" state, and having a single
recalc_msr_intercept_bitmap() which runs on the hvm_resume() path after
any changes.
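
A rough sketch of what such a scheme might look like follows.  Everything
here is hypothetical: the structure, the owner names, and the function names
are invented for illustration and do not correspond to existing Xen code.
Only read intercepts are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parties that may each want an MSR intercepted. */
enum intercept_owner {
    OWNER_CORE          = 1u << 0,  /* core HVM logic */
    OWNER_INTROSPECTION = 1u << 1,  /* introspection agent */
};

#define NR_TRACKED_MSRS 4  /* illustrative size only */

struct msr_intercept_state {
    uint8_t read_wanted[NR_TRACKED_MSRS];    /* bitmask of owners */
    bool read_intercepted[NR_TRACKED_MSRS];  /* stand-in for the bitmap */
    bool dirty;                              /* recalculation pending */
};

/* Record that 'owner' does or does not want reads of MSR slot 'idx'. */
static void set_read_interest(struct msr_intercept_state *s, unsigned int idx,
                              unsigned int owner, bool want)
{
    if ( idx >= NR_TRACKED_MSRS )
        return;

    if ( want )
        s->read_wanted[idx] |= owner;
    else
        s->read_wanted[idx] &= ~owner;

    s->dirty = true;
}

/* Single recalculation point, e.g. run once on the resume path. */
static void recalc_read_intercepts(struct msr_intercept_state *s)
{
    unsigned int i;

    if ( !s->dirty )
        return;

    /* Intercept whenever at least one owner still wants it. */
    for ( i = 0; i < NR_TRACKED_MSRS; i++ )
        s->read_intercepted[i] = s->read_wanted[i] != 0;

    s->dirty = false;
}
```

The point of keeping per-owner state is that one party dropping its interest
cannot silently remove an intercept which another party still relies on.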

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

