Re: [Xen-devel] [PATCH, RFC] x86/HVM: batch vCPU wakeups
>>> On 09.09.14 at 23:29, <tim@xxxxxxx> wrote:
> At 09:33 +0100 on 09 Sep (1410251617), Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -447,12 +447,30 @@ void vlapic_ipi(
>>
>> default: {
>> struct vcpu *v;
>> - for_each_vcpu ( vlapic_domain(vlapic), v )
>> + const struct domain *d = vlapic_domain(vlapic);
>> + bool_t batch = d->max_vcpus > 2 &&
>> + (short_hand
>> + ? short_hand != APIC_DEST_SELF
>> + : vlapic_x2apic_mode(vlapic)
>> + ? dest_mode ? hweight16(dest) > 1
>> + : dest == 0xffffffff
>> + : dest_mode
>> + ? hweight8(dest &
>> + GET_xAPIC_DEST_FIELD(
>> + vlapic_get_reg(vlapic,
>> + APIC_DFR))) > 1
>> + : dest == 0xff);
>
> Yoiks. Could you extract some of this into a helper function, say
> is_unicast_dest(), just for readability?
Sure.
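For illustration, the extracted helper might look like the standalone sketch below. This is not the code that went into Xen: the types and macros (the vlapic struct, vlapic_x2apic_mode(), vlapic_get_reg(), GET_xAPIC_DEST_FIELD(), hweight8/16()) are hypothetical stand-ins, just enough to exercise the destination test from the patch in isolation. It is written in the "is multicast" sense that the batch computation actually wants; the is_unicast_dest() Tim suggests would be its negation (with APIC_DEST_SELF treated as unicast, as in the patch).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the Xen definitions the real helper
 * would rely on -- not the actual Xen implementations.
 */
#define APIC_DEST_SELF            0x40000
#define APIC_DFR                  0xE0
#define GET_xAPIC_DEST_FIELD(x)   (((x) >> 24) & 0xff)
#define hweight8(x)               __builtin_popcount((x) & 0xff)
#define hweight16(x)              __builtin_popcount((x) & 0xffff)

struct vlapic {
    bool x2apic;        /* stand-in for vlapic_x2apic_mode() */
    uint32_t dfr;       /* stand-in for vlapic_get_reg(vlapic, APIC_DFR) */
};

#define vlapic_x2apic_mode(vl)    ((vl)->x2apic)
#define vlapic_get_reg(vl, reg)   ((reg) == APIC_DFR ? (vl)->dfr : 0)

/*
 * One possible shape for the helper: true iff the IPI destination can
 * address more than one vCPU, mirroring the nested conditional in the
 * patch (shorthands other than self; x2APIC logical with more than one
 * bit set or physical broadcast; xAPIC logical with more than one DFR-
 * qualified bit set or the 0xff physical broadcast).
 */
static bool is_multicast_dest(const struct vlapic *vlapic,
                              unsigned int short_hand,
                              uint32_t dest, bool dest_mode)
{
    if ( short_hand )
        return short_hand != APIC_DEST_SELF;

    if ( vlapic_x2apic_mode(vlapic) )
        return dest_mode ? hweight16(dest) > 1 : dest == 0xffffffff;

    if ( dest_mode )
        return hweight8(dest & GET_xAPIC_DEST_FIELD(
                                   vlapic_get_reg(vlapic, APIC_DFR))) > 1;

    return dest == 0xff;
}
```

The early returns replace the chained ternaries one-for-one, which is the readability win being asked for.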
>> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
>> {
>> - if ( !test_and_set_bit(nr, &softirq_pending(cpu))
>> - && (cpu != smp_processor_id()) )
>> + unsigned int this_cpu = smp_processor_id();
>
> The name clashes with the this_cpu() macro (obviously OK for compiling
> but harder to read) and can be dropped since...
>
>> +
>> + if ( test_and_set_bit(nr, &softirq_pending(cpu))
>> + || (cpu == this_cpu) )
>> + return;
>> +
>> + if ( !per_cpu(batching, this_cpu) || in_irq() )
>
> this_cpu(batching) is the preferred way of addressing this on Xen
> anyway - though I guess maybe it's _slightly_ less efficient to
> recalculate get_cpu_info() here than to indirect through the
> per_cpu_offsets[]? I haven't looked at the asm.
Right - multiple uses of this_cpu() in the same function are indeed
less efficient than latching smp_processor_id() into a local variable.
>> smp_send_event_check_cpu(cpu);
>> + else
>> + set_bit(nr, &per_cpu(batch_mask, this_cpu));
>> +}
>> +
>> +void cpu_raise_softirq_batch_begin(void)
>> +{
>> + ++per_cpu(batching, smp_processor_id());
>
> This one should definitely be using this_cpu().
Absolutely - unintended copy-and-paste effect.
>> +}
>> +
>> +void cpu_raise_softirq_batch_finish(void)
>> +{
>> + unsigned int cpu, this_cpu = smp_processor_id();
>> + cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
>
> Again, this_cpu()?
No, as again the latched local variable yields better code.
>> + ASSERT(per_cpu(batching, this_cpu));
>> + for_each_cpu ( cpu, mask )
>> + if ( !softirq_pending(cpu) )
>> + cpumask_clear_cpu(cpu, mask);
>> + smp_send_event_check_mask(mask);
>> + cpumask_clear(mask);
>> + --per_cpu(batching, this_cpu);
>> }
>
> One other thing that occurs to me is that we might do the batching in
> the caller (by gathering a mask of pcpus in the vlapic code) but that
> sounds messy. And it's not like softirq is so particularly
> latency-sensitive that we'd want to avoid this much overhead in the
> common case. So on second thoughts this way is better. :)
In fact I had it that way first, and it looked ugly (apart from making
it more cumbersome for potential other use sites to switch to this
batching model).
Jan
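The batching model under discussion can be sketched as a toy single-CPU simulation. All of the names below mirror the patch, but the per-CPU machinery, bitops, and IPI sends are stand-ins (plain globals and a counter), and the cpu == this_cpu self-IPI check and in_irq() test are omitted; the point is only to show why a caller-side begin/finish pair turns N wakeup IPIs into one mask IPI.

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Simplified stand-ins for the Xen primitives in the patch. */
static unsigned long softirq_pending[NR_CPUS];
static unsigned long batch_mask;    /* per-CPU in the patch; one CPU here */
static unsigned int batching;
static unsigned int ipis_sent;      /* counts smp_send_event_check_*() calls */

static void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
{
    if ( softirq_pending[cpu] & (1UL << nr) )
        return;                     /* test_and_set_bit() found it already set */
    softirq_pending[cpu] |= 1UL << nr;

    if ( !batching )
        ++ipis_sent;                /* smp_send_event_check_cpu(cpu) */
    else
        batch_mask |= 1UL << cpu;   /* defer the wakeup IPI */
}

static void cpu_raise_softirq_batch_begin(void)
{
    ++batching;
}

static void cpu_raise_softirq_batch_finish(void)
{
    unsigned int cpu;

    /* Drop CPUs that already serviced their softirqs in the meantime. */
    for ( cpu = 0; cpu < NR_CPUS; ++cpu )
        if ( (batch_mask & (1UL << cpu)) && !softirq_pending[cpu] )
            batch_mask &= ~(1UL << cpu);

    if ( batch_mask )
        ++ipis_sent;                /* one smp_send_event_check_mask() */
    batch_mask = 0;
    --batching;
}
```

A caller such as vlapic_ipi() would then bracket its for_each_vcpu loop with the begin/finish pair when the destination is multicast, so four raises cost one IPI rather than four, while unbatched callers keep the immediate-send path unchanged.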
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel