[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/arm: Trap and yield on WFE instructions



On 16 July 2014 15:06, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2014-07-16 at 14:34 +0530, Anup Patel wrote:
>> If we have a Guest/DomU with two or more of its VCPUs running
>> on same host CPU then it can quite likely happen that these
>> VCPUs fight for same spinlock and one of them will waste CPU
>> cycles in WFE instruction. This patch makes WFE instruction
>> trap for VCPU and forces VCPU to yield its timeslice.
>>
>> The KVM ARM/ARM64 also does similar thing for handling WFE
>> instructions. (Please refer,
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-November/006259.html)
>>
>> In general, this patch is more of an optimization for an
>> oversubscribed system having number of VCPUs more than
>> underlying host CPUs.
>
> Sounds good. Do you have any numbers for the improvement?

I did not try any benchmarks myself but hackbench shows good
improvement for KVM hence it is a worthy optimization for Xen too.

I found this change missing for Xen hence this patch.

>
>> Signed-off-by: Anup Patel <anup.patel@xxxxxxxxxx>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@xxxxxxxxxx>
>> Tested-by: Pranavkumar Sawargaonkar <pranavkumar@xxxxxxxxxx>
>> ---
>>  xen/arch/arm/traps.c |   27 ++++++++++++++++-----------
>>  1 file changed, 16 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 686d8b7..bc9f67a 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -90,7 +90,7 @@ void __cpuinit init_traps(void)
>>
>>      /* Setup hypervisor traps */
>>      WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
>> -                 HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
>> +                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, 
>> HCR_EL2);
>>      isb();
>>  }
>>
>> @@ -1803,16 +1803,21 @@ asmlinkage void do_trap_hypervisor(struct 
>> cpu_user_regs *regs)
>>              advance_pc(regs, hsr);
>>              return;
>>          }
>> -        /* at the moment we only trap WFI */
>> -        vcpu_block();
>> -        /* The ARM spec declares that even if local irqs are masked in
>> -         * the CPSR register, an irq should wake up a cpu from WFI anyway.
>> -         * For this reason we need to check for irqs that need delivery,
>> -         * ignoring the CPSR register, *after* calling SCHEDOP_block to
>> -         * avoid races with vgic_vcpu_inject_irq.
>> -         */
>> -        if ( local_events_need_delivery_nomask() )
>> -            vcpu_unblock(current);
>> +        if ( hsr.bits & 0x1 ) {
>
> Please can you add an appropriate struct hsr_wfe_wfi to union hsr and
> use that instead of opencoding bit patterns in hsr.bits.

Sure, I will update this and send it right away.

>
>> +            /* Yield the VCPU for WFE */
>> +            vcpu_force_reschedule(current);
>
> This is OK fine by me, but I'm wondering: Does the HW have any support
> for actually blocking until an event occurs in another vcpu (some sort
> of trap on sev)? I don't think so, nor am I really sure how such a h/w
> feature might work...

Typically, on HW the WFE instruction forces cores to go in low power
mode. The core will come out of low power mode due to some interrupt
or SEV executed from another core.

>
> Ian.
>

--
Anup

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.