Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling
> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Tuesday, March 8, 2016 8:02 PM
> To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx>
> Cc: Wu, Feng <feng.wu@xxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; Keir
> Fraser <keir@xxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario
> Faggioli <dario.faggioli@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxx; Jan Beulich
> <jbeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt
> core logic handling
>
> On 07/03/16 15:53, Konrad Rzeszutek Wilk wrote:
> > On Mon, Mar 07, 2016 at 11:21:33AM +0000, George Dunlap wrote:
> >> On Fri, Mar 4, 2016 at 10:00 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@xxxxxxxxxx> wrote:
> >>>> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> >>>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >>>> +{
> >>>> +    struct arch_vmx_struct *vmx, *tmp;
> >>>> +    spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
> >>>> +    struct list_head *blocked_vcpus =
> >>>> +                   &per_cpu(vmx_pi_blocking, smp_processor_id()).list;
> >>>> +
> >>>> +    ack_APIC_irq();
> >>>> +    this_cpu(irq_count)++;
> >>>> +
> >>>> +    spin_lock(lock);
> >>>> +
> >>>> +    /*
> >>>> +     * XXX: The length of the list depends on how many vCPU is current
> >>>> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> >>>> +     * if the list grows to too many entries.
> >>>> +     */
> >>>> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> >>>> +    {
> >>>
> >>>
> >>> My recollection of the 'most-horrible' case of this being really bad is
> >>> when the scheduler puts the vCPU0 and VCPU1 of the guest on the same pCPU
> >>> (as an example) and they round-robin all the time.
> >>>
> >>> <handwaving>
> >>> Would it be perhaps possible to have an anti-affinity flag to deter the
> >>> scheduler from this? That is whichever struct vcpu has 'anti-affinity'
> >>> flag set - the scheduler will try as much as it can _to not_ schedule the
> >>> 'struct vcpu' if the previous 'struct vcpu' had this flag as well on this
> >>> pCPU?
> >>
> >> Well having vcpus from the same guest on the same pcpu is problematic
> >> for a number of reasons -- spinlocks first and foremost. So in
> >> general trying to avoid that would be useful for most guests.
> >
> > PV ticketlocks in HVM and PV guests make this "manageable".
> >
> >> The thing with scheduling is that it's a bit like economics: it seems
> >> simple but it's actually not at all obvious what the emergent behavior
> >> will be from adding a simple rule. :-)
> >
> > <nods>
> >
> >> On the whole it seems unlikely that having two vcpus on a single pcpu
> >> is a "stable" situation -- it's likely to be pretty transient, and
> >> thus not have a major impact on performance.
> >
> > Except that we are concerned with it - in fact we are disabling this
> > feature because it may happen. How do we make sure it does not happen
> > all the time? Or at least do some back-off if things do get
> > in this situation.
>
> So it's disabled by default based on a theoretical fear that it *may*
> cause performance problems, but without any actual performance problems
> having been observed?

Yes. According to Jan's comments in the previous thread, the list may
theoretically become very long, so he tends to prefer keeping this feature
off by default for now.

> It seems like there are a couple of ways we could approach this:
>
> 1. Try to optimize the reverse look-up code so that it's not a linear
> linked list (getting rid of the theoretical fear)

Good point.
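It is not obvious what key a true reverse look-up could use here, since the
wakeup vector is shared by every vCPU blocked on a pCPU. A simpler mitigation,
sketched below purely for illustration (pi_blocking_count and the helper names
are invented, they are not part of this patch), would be to track how many
vCPUs are blocked on each pCPU and steer a blocking vCPU towards the
least-loaded one, so that no single per-pCPU list can grow unboundedly. A real
version would of course also have to point the PI descriptor's NDST field at
the chosen pCPU.

/*
 * Illustrative sketch only, not the posted patch: per-pCPU count of
 * blocked vCPUs, used to pick a pCPU with a short blocking list.
 */
static DEFINE_PER_CPU(unsigned int, pi_blocking_count);

/*
 * Pick the online pCPU currently holding the fewest blocked vCPUs.
 * Reading remote counters without their lock is racy, but this is
 * only a placement heuristic.
 */
static unsigned int pi_least_loaded_cpu(void)
{
    unsigned int cpu, best = smp_processor_id();

    for_each_online_cpu ( cpu )
        if ( per_cpu(pi_blocking_count, cpu) <
             per_cpu(pi_blocking_count, best) )
            best = cpu;

    return best;
}

/*
 * To be called under the corresponding vmx_pi_blocking lock whenever a
 * vCPU is added to or removed from that pCPU's blocking list.
 */
static void pi_blocking_count_inc(unsigned int cpu)
{
    per_cpu(pi_blocking_count, cpu)++;
}

static void pi_blocking_count_dec(unsigned int cpu)
{
    ASSERT(per_cpu(pi_blocking_count, cpu));
    per_cpu(pi_blocking_count, cpu)--;
}

Whether that extra bookkeeping is worth it obviously depends on the numbers
from point 2 below.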
> 2. Try to test engineered situations where we expect this to be a
> problem, to see how big of a problem it is (proving the theory to be
> accurate or inaccurate in this case)

Maybe we can run an SMP guest with all the vCPUs pinned to a dedicated
pCPU, run some benchmark in the guest with VT-d PI and without VT-d PI,
and then see the performance difference between these two scenarios.

> 3. Turn the feature on by default as soon as the 4.8 window opens up,
> perhaps with some sort of a check that runs when in debug mode that
> looks for the condition we're afraid of happening and BUG()s. If we run
> a full development cycle without anyone hitting the bug in testing, then
> we just leave the feature on.

Maybe we can pre-define a maximum acceptable length for the list; if it
really reaches that number, print out a warning or something like that.
However, how to decide the maximum length is a problem. It may need more
thinking.
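Just to make that idea concrete, here is a minimal sketch of what such a
check might look like in pi_wakeup_interrupt(). The PI_BLOCKING_LIST_WARN_LEN
name and its value are invented, the wakeup logic itself is elided, and
picking a sensible limit is exactly the open question:

#define PI_BLOCKING_LIST_WARN_LEN 64   /* arbitrary placeholder value */

static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
{
    struct arch_vmx_struct *vmx, *tmp;
    unsigned int len = 0;
    static bool_t warned;
    spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
    struct list_head *blocked_vcpus =
                   &per_cpu(vmx_pi_blocking, smp_processor_id()).list;

    ack_APIC_irq();
    this_cpu(irq_count)++;

    spin_lock(lock);

    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
    {
        len++;        /* count the entries as we walk the list */
        /* ... existing wakeup logic ... */
    }

    if ( unlikely(len > PI_BLOCKING_LIST_WARN_LEN) && !warned )
    {
        /* A single global latch is enough for a debugging aid. */
        warned = 1;
        printk(XENLOG_WARNING
               "VT-d PI: %u vCPUs blocked on pCPU%u, wakeup latency may suffer\n",
               len, smp_processor_id());
    }

    spin_unlock(lock);
}

In a debug build the printk() could just as well be the BUG()/ASSERT()
George suggests in point 3.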
Thanks,
Feng

> Then we'll only look at adding complexity to the scheduler if there's
> actually a problem to solve.
>
> -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel