Xen project Mailing List

Re: [Xen-devel] [Patch v5 2/5] x86/hpet: Use singe apic vector rather than irq_descs for HPET interrupts

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Fri, 22 Nov 2013 16:23:24 +0000

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>

Delivery-date: Fri, 22 Nov 2013 16:23:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 22/11/13 15:45, Jan Beulich wrote: >>>> On 14.11.13 at 17:01, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: >> The new logic is as follows: >> * A single high priority vector is allocated and uses on all cpus. > Does this really need to be a high priority one? I'd think we'd be > fine with the lowest priority one we can get, as we only need the > wakeup here if nothing else gets a CPU to wake up. Yes - absolutely. We cannot have an HPET interrupt lower priority than a guest line level interrupt. Another cpu could be registered with our HPET channel to be worken up, and we need to service them in a timely fashon. > >> +/* Wake up all cpus in the channel mask. Lock should be held. */ >> +static void hpet_wake_cpus(struct hpet_event_channel *ch) >> { >> - unsigned int cpu = smp_processor_id(); >> + cpuidle_wakeup_mwait(ch->cpumask); >> >> - if ( cpumask_test_and_clear_cpu(cpu, mask) ) >> - raise_softirq(TIMER_SOFTIRQ); >> - >> - cpuidle_wakeup_mwait(mask); >> - >> - if ( !cpumask_empty(mask) ) >> - cpumask_raise_softirq(mask, TIMER_SOFTIRQ); >> + if ( !cpumask_empty(ch->cpumask) ) >> + cpumask_raise_softirq(ch->cpumask, TIMER_SOFTIRQ); > Looks like the cpumask_empty() check isn't really needed here? cpuidle_wakeup_mwait(mask) will only wake cpus who have set their bit in cpuidle_mwait_flags. There might be cpus who have registered for waking up who have not yet set their bit in cpuidle_mwait_flags. Out of caution, I did by best to avoid playing with the guts of the timing code, as it seems quite fragile. > >> +/* HPET interrupt entry. This is set up as a high priority vector. */ >> +static void do_hpet_irq(struct cpu_user_regs *regs) >> { >> - struct hpet_event_channel *ch = (struct hpet_event_channel *)data; >> - >> - this_cpu(irq_count)--; >> + unsigned int cpu = smp_processor_id(); > This is being used just once, and hence rather pointless to have. Fair point - this was left over from a previous version of the function which did use cpu twice. > >> - desc->handler = &hpet_msi_type; >> - ret = request_irq(ch->msi.irq, hpet_interrupt_handler, "HPET", ch); >> - if ( ret >= 0 ) >> - ret = __hpet_setup_msi_irq(desc); >> + ret = hpet_setup_msi(ch); > Why did you keep this? With that function now being called from > hpet_broadcast_enter() I don't why you'd need this here (just > like you're removing it from hpet_broadcast_resume() without > replacement). Because I optimised functions in the wrong order to obviously spot this. As we leave the MSI masked, this call to setup can be dropped. I would however prefer not to manually inline hpet_setup_msi() into hpet_broadcast_enter(). The compiler can do that for me, and it saves making hpet_broadcast_enter() even more complicated. > >> @@ -574,6 +384,7 @@ void __init hpet_broadcast_init(void) >> /* Stop HPET legacy interrupts */ >> cfg &= ~HPET_CFG_LEGACY; >> n = num_hpets_used; >> + free_channels = (1U << n) - 1; > This is undefined when n == 32. Since n > 0, I'd suggest > > free_channels = (u32)~0 >> (32 - n); Ok > >> + ch = hpet_get_free_channel(); >> + >> + if ( ch ) >> + { >> + spin_lock(&ch->lock); >> + >> + /* This really should be an MSI channel by this point */ >> + ASSERT(!(ch->flags & HPET_EVT_LEGACY)); >> + >> + hpet_msi_mask(ch); >> + >> + this_cpu(hpet_channel) = ch; >> + ch->cpu = cpu; > I think even if benign, you should set ->cpu before storing into > hpet_channel. Ok > >> + cpumask_set_cpu(cpu, ch->cpumask); >> + >> + hpet_setup_msi(ch); >> + hpet_program_time(ch, deadline, NOW(), 1); >> + hpet_msi_unmask(ch); >> + >> + spin_unlock(&ch->lock); >> + } >> + else >> + { >> + /* TODO - this seems very ugly */ > And you nevertheless want it committed this way? Probably best the comment gets dropped. > >> + s_time_t fallback_deadline = STIME_MAX; >> + unsigned int i, fallback_idx = -1; >> + >> + for ( i = 0; i < num_hpets_used; ++i ) >> + { >> + ch = &hpet_events[i]; >> + spin_lock(&ch->lock); >> + >> + if ( ch->cpu == -1 ) >> + goto continue_search; >> + >> + /* This channel is going to expire far too early */ >> + if ( ch->next_event < deadline - MICROSECS(50) ) >> + goto continue_search; > So this is going to make us wake early. You remember that all this > code exists solely for power saving purposes? Iiuc the CPU will > then basically spin entering an idle state, until its actual wakeup > time is reached. What would you suggest instead? We dont really want to be waking up late. > >> + >> + /* We can deal with being woken with 50us to spare */ >> + if ( ch->next_event <= deadline ) >> + break; >> + >> + /* Otherwise record our best HPET to borrow. */ >> + if ( ch->next_event <= fallback_deadline ) >> + { >> + fallback_idx = i; >> + fallback_deadline = ch->next_event; >> + } >> + >> + continue_search: >> + spin_unlock(&ch->lock); >> + ch = NULL; >> + } >> + >> + if ( ch ) >> + { >> + /* Found HPET with an appropriate time. Request to be woken up >> */ >> + cpumask_set_cpu(cpu, ch->cpumask); >> + this_cpu(hpet_channel) = ch; >> + spin_unlock(&ch->lock); >> + } >> + else if ( fallback_deadline < STIME_MAX && fallback_deadline != -1 ) >> + { >> + /* >> + * Else we want to reprogram the fallback HPET sooner if >> possible, >> + * but with all the spinlocking, it just might have passed. >> + */ >> + ch = &hpet_events[fallback_idx]; >> + >> + spin_lock(&ch->lock); >> >> - if ( !(ch->flags & HPET_EVT_LEGACY) ) >> - hpet_attach_channel(cpu, ch); >> + if ( ch->cpu != -1 && ch->next_event == fallback_deadline ) > Can't this be "<= deadline", being quite a bit more flexible? This is a test for whether ch is the one I identified as the best inside the loop. There should be sufficient flexibility inside the loop. > >> + { >> + cpumask_set_cpu(cpu, ch->cpumask); >> + hpet_program_time(ch, deadline, NOW(), 1); >> + } >> + else >> + /* All else has failed. Wander the idle loop again */ >> + this_cpu(timer_deadline) = NOW() - 1; >> + >> + spin_unlock(&ch->lock); >> + } >> + else >> + /* All HPETs were too soon. Wander the idle loop again */ >> + this_cpu(timer_deadline) = NOW() - 1; > Here and above - what good will this do? Is this just in the > expectation that the next time round a free channel will likely be > found? If so, why can't you just go back to the start of the > function. Or a different HPET programmed to a different time which is now compatible with our wakeup timeframe. We cannot spin in this function, as we have interrupts disabled. > > Further - how do you see this going back to the idle loop? The > way this is called from acpi/cpu_idle.c, you'll end up entering > C3 anyway, with nothing going to wake you up. By proceeding to > the end of the function, you even manage to disable the LAPIC > timer. Hmm - I will need to revisit this logic then. > >> + /* If we own the channel, detach it */ >> + if ( ch->cpu == cpu ) >> + { >> + hpet_msi_mask(ch); >> + hpet_wake_cpus(ch); >> + ch->cpu = -1; >> + set_bit(ch->idx, &free_channels); > Wouldn't there be less cache line bouncing if the "free" flag was just > one of the channel flags? > > Jan Yes, but then finding a free channel would involve searching each hpet_channel rather than searching free_channels with ffs(). ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.