[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Question about VPID during MOV-TO-CR3



>>> On 26.09.16 at 18:12, <tamas.lengyel@xxxxxxxxxxxx> wrote:
> On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>> On 23.09.16 at 22:45, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
>>> <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>>>
>>>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>>>> perhaps suffering from
>>>>>>>
>>>>>>>             /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>>>             if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>>>                  monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>>>                 v->arch.hvm_vmx.exec_control |= 
>>>>>>> CPU_BASED_CR3_LOAD_EXITING;
>>>>>>>
>>>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>>>> assumption that you're working on Intel hardware).
>>>>>>
>>>>>> So there seems to be two separate paths that lead to the TLB flushing.
>>>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>>>> through the monitor interface. However, during domain boot I also see
>>>>>> this path being called that is not related to the
>>>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>>>
>>>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3
>>> called
>>>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>>>
>>>>>> This path seems to de-activate once the domain is fully booted.
>>>>>
>>>>> This late? According to the CR0 handling in
>>>>> vmx_update_guest_cr() I would understand it to be enabled only
>>>>> while the guest is still in real mode (and even then only on old
>>>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>>>
>>>>
>>>> Right, with unrestricted guest support I would assume none of this
>>>> would get called - but it does, and quite frequently during domain
>>>> boot. The CPU is a Intel(R) Xeon(R) CPU E5-2430.
>>>>
>>>
>>> So I experimented with selectively disabling the flushing such that
>>> it's done only when coming from a path other then CPU-based CR3 load
>>> exiting. I've added a bool to struct vcpu that gets set to 0 every
>>> time vmx_vmexit_handler is called, and only gets set to 1 when
>>> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr
>>> the flush only happens as such:
>>>
>>>         if ( !v->movtocr3 )
>>>             hvm_asid_flush_vcpu(v);
>>>
>>> In the guest I run a test application that allocates a page at a fixed
>>> VA, writes a magic value to it, and then keeps spinning on reading the
>>> magic value back from the page, checking if it's the same as
>>> originally supplied. I lunch this application twice with different
>>> magic values, so that if the TLB invalidation is an issue one of the
>>> test applications would read back the wrong magic value from the VA
>>> using a stale TLB entry. I've verified that same VA in the two
>>> applications point to different pages and that those PTEs are not
>>> marked global and no PCID is used.
>>>
>>> [  724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
>>> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
>>> [  727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
>>> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
>>
>> I'm surprised. As said before - a mov-to-CR3 cannot be emulated
>> without a minimal amount of flushing. No experiments whatsoever
>> are suitable to prove the contrary.
> 
> That's a pretty strong statement - can you tell me where in the SDM
> does it say that exactly? I've went through it couple times already
> and I can't find anything that explicitly says that the flushing has
> to be performed by the VMM when mov-to-CR3 trapping is enabled.

I though I had pointed you there already: Section "Instructions
that cause VM exits". There's nothing said about flushes, but that's
also not necessary: "... the instruction causing the VM exit does not
execute and no processor state is updated by the instruction." Plus
everything the sub-section "Relative Priority of Faults and VM Exits"
says.

> The
> closest thing I found was indicating the contrary. Furthermore, if the
> flushing is necessary, then how would you explain that there were no
> TLB mixups in the above experiment?

No idea. Perhaps there is some further flushing going on due to
other reasons?

>>> Both applications work as expected without the VPID flushing taking
>>> place. So at least for CPU-based CR3 load exiting it seems that this
>>> flush is not necessary. As for why this path gets called during domain
>>> boot when the CPU supports Unrestricted Guest mode and it is properly
>>> detecting when Xen boots, I'm not sure. However, as we use CPU-based
>>> CR3 load exiting quite often when doing VMI, I would prefer to disable
>>> this flushing at least for this case. Any thoughts?
>>
>> As said before - you'd better direct this question to the VMX
>> maintainers, and even better would be to first understand why
>> the intercept remains enabled in the first place. After all it's
>> quite obvious that most improvement can be expected from not
>> enabling it at all, whenever possible. Only if it needs to stay
>> enabled over extended periods of a guest's lifetime it would then
>> become interesting to see whether the emulation path can be
>> improved.
>>
> 
> To clarify - mov-to-CR3 trapping is _not_ enabled by default on a
> domain. I assumed it is the only path to vmx_update_guest_cr, but I
> now further verified that vmx_cr_access does not get called for a
> mov-to-CR3 when the domain boots, it only gets called when we enable
> it through the monitor system. There is another path leads to a call
> to vmx_update_guest_cr for updating CR3 when the domain boots which
> seems to require this flushing to happen. That other path I don't care
> about - although it's rather odd in itself as well. Now when the
> mov-to-CR3 path gets activated the flushing does not seem to be
> necessary as my experiment shows and it actually actively breaks
> architectural features (global pages and PCID).

Once again - it does not break anything. Performance aspects are
not architectural features. All you can say is that it makes these
extended features useless.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.