Re: [Xen-devel] [PATCH 1/4] VMX: streamline entry.S code
>>> On 26.08.13 at 12:44, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> +ENTRY(vmx_asm_vmexit_handler)
>>         push %rdi
>>         push %rsi
>>         push %rdx
>>         push %rcx
>>         push %rax
>> +       mov  %cr2,%rax
>
> I presume this is a long latency instruction. Do you have a source of
> numbers for this? (more for interest, as I can easily accept that it
> would be a longer operation than the surrounding ones)

I don't think I ever saw any precise numbers for control register
accesses for anything newer than 386es or maybe 486es. But control
register accesses are slow (not least because they are serializing).

>>         push %r8
>>         push %r9
>>         push %r10
>>         push %r11
>>         push %rbx
>> +       GET_CURRENT(%rbx)
>
> This seems a little less obvious. I presume you are just breaking true
> read-after-write data hazard on %rbx ?

No, this is to hide the latency between loading %rbx and the use of it
in the address of a memory access.

>> -.globl vmx_asm_do_vmentry
>> -vmx_asm_do_vmentry:
>
> If you move the ENTRY(vmx_asm_do_vmentry) up from below, you should be
> able to completely drop the jmp in it.

That would be possible, at the expense of added padding. I prefer it
the way it is now, as vmx_asm_do_vmentry is not performance critical
(it is used exactly once per HVM vCPU).

> However...
>
>> +.Lvmx_do_vmentry:
>>         call vmx_intr_assist
>>         call nvmx_switch_guest
>>         ASSERT_NOT_IN_ATOMIC
>>
>> -        GET_CURRENT(%rbx)
>> -        cli
>
> The movement of this cli indicates a possible issue.
>
> If we have softirqs pending, we jump to .Lvmx_process_softirqs, which
> calls do_softirq, and then jumps back to the top of .Lvmx_do_vmentry,
> which reruns the top of do_vmentry with interrupts now enabled.

That was this way already before. The "cli" got moved only past some
address calculation (which clearly doesn't need to be done with
interrupts disabled).

> First of all, I cant see anything in vmx_intr_assist or
> nvmx_switch_guest which should require calling multiple times on a
> vmentry.  They are also expecting to be called with interrupts disabled
> (although I cant spot anything obvious in the callpath which would be
> affected).

And both of these functions had been called before disabling
interrupts.

>> -        cmpb $0,VCPU_vmx_launched(%rbx)
>>         pop  %r15
>>         pop  %r14
>>         pop  %r13
>>         pop  %r12
>>         pop  %rbp
>> +       mov  %rax,%cr2
>> +       cmpb $0,VCPU_vmx_launched(%rbx)
>
> Again, I presume the move of "mov %rax,%cr2" is about the %rax data hazard?

The move of the %cr2 write is indeed debatable - I tried to get it
farther away from the producer of the data in %rax, but it's not clear
whether that's very useful. The second purpose was to get something
interleaved with the many "pop"s, so that the CPU can get busy with
something other than just its memory load ports. If controversial, I'm
fine with undoing that change.

Jan
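
For illustration, an annotated sketch of the scheduling being discussed,
reconstructed from the hunks quoted above - the elided pushes/pops are
marked with comments, and this is not a verbatim copy of entry.S:

    ENTRY(vmx_asm_vmexit_handler)
            push %rdi
            # ... %rsi, %rdx, %rcx pushes elided ...
            push %rax
            mov  %cr2,%rax          # start the slow, serializing CR2 read
                                    # early, well before %rax is consumed
            push %r8
            # ... %r9, %r10, %r11 pushes elided ...
            push %rbx
            GET_CURRENT(%rbx)       # load 'current' early so its load-to-use
                                    # latency is hidden behind the later work

    # and on the VM entry path, the CR2 write is interleaved with the pops so
    # that units other than the memory load ports also have work to do:
            pop  %r15
            # ... %r14, %r13, %r12 pops elided ...
            pop  %rbp
            mov  %rax,%cr2
            cmpb $0,VCPU_vmx_launched(%rbx)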