
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Friday, July 10, 2015 4:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dunlap@xxxxxxxxxxxxx; Tian, Kevin; Zhang, Yang Z;
> xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> >>> On 10.07.15 at 09:29, <feng.wu@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Friday, July 10, 2015 2:32 PM
> >> >>> On 10.07.15 at 08:21, <feng.wu@xxxxxxxxx> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: Thursday, July 09, 2015 3:26 PM
> >> >> >>> On 09.07.15 at 00:49, <kevin.tian@xxxxxxxxx> wrote:
> >> >> >>  From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >> >> >>>> On 08.07.15 at 13:00, <kevin.tian@xxxxxxxxx> wrote:
> >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> >> >> >>> vmx_function_table = {
> >> >> >> >>>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >> >> >> >>>  };
> >> >> >> >>>
> >> >> >> >>> +/*
> >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >> >> >>> + */
> >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >> >> >>> +{
> >> >> >> >>> +    struct arch_vmx_struct *vmx;
> >> >> >> >>> +    unsigned int cpu = smp_processor_id();
> >> >> >> >>> +
> >> >> >> >>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >> >> >>> +
> >> >> >> >>> +    /*
> >> >> >> >>> +     * FIXME: The length of the list depends on how many
> >> >> >> >>> +     * vCPUs are currently blocked on this specific pCPU.
> >> >> >> >>> +     * This may hurt the interrupt latency if the list
> >> >> >> >>> +     * grows to too many entries.
> >> >> >> >>> +     */
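
The quoted hunk is cut off after this comment. Purely as an illustration of
what the FIXME refers to, the remainder of such a handler would presumably
walk the per-pCPU list and wake every vCPU whose posted-interrupt descriptor
has its ON bit set. The names below (pi_blocked_vcpu, pi_blocked_vcpu_list,
pi_test_on()) are assumed from this patch series rather than quoted from it:

    struct arch_vmx_struct *tmp;

    /*
     * Walk every vCPU currently blocked on this pCPU; any entry whose
     * posted-interrupt descriptor has the ON bit set has a pending
     * interrupt, so remove it from the list and unblock it.
     */
    list_for_each_entry_safe(vmx, tmp,
                             &per_cpu(pi_blocked_vcpu, cpu),
                             pi_blocked_vcpu_list)
    {
        if ( pi_test_on(&vmx->pi_desc) )
        {
            list_del(&vmx->pi_blocked_vcpu_list);
            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
        }
    }

    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
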
> >> >> >> >> let's go with this linked list first until a real issue is 
> >> >> >> >> identified.
> >> >> >> > This is exactly the way of thinking I dislike when it comes to code
> >> >> >> > that isn't intended to be experimental only: We shouldn't wait
> >> >> >> > for problems to surface when we already can see them. I.e. if
> >> >> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> >> >> > off by default and be properly marked experimental in the
> >> >> >> > command line option documentation (so people know to stay
> >> >> >> > away from it).
> >> >> >>
> >> >> >> And in this specific case, there is no balancing of vcpus across the
> >> >> >> pcpu lists.
> >> >> >>
> >> >> >> One can construct a pathological case using pinning and pausing to
> >> >> >> get almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> >> >> interrupts will exacerbate the problem by staying on the list for
> >> >> >> longer periods of time.
> >> >> >
> >> >> > In that extreme case I believe contention in many other code paths
> >> >> > will be much larger than the overhead caused by this structure's limitation.
> >> >>
> >> >> Examples?
> >> >>
> >> >> >> IMO, the PI feature cannot be declared as done/supported with this
> >> >> >> bug remaining.  OTOH, it is fine to be experimental, and disabled by
> >> >> >> default for people who wish to experiment.
> >> >> >
> >> >> > Again, I don't expect to see it disabled as experimental. For a good
> >> >> > production environment where vcpus are well balanced and interrupt
> >> >> > latency is sensitive, a linked list should be efficient here. For a bad
> >> >> > environment like the extreme case you raised, I don't know whether it
> >> >> > really matters to just tune the interrupt path.
> >> >>
> >> >> Can you _guarantee_ that everything potentially leading to such a
> >> >> pathological situation is covered by XSA-77? And even if it is now,
> >> >> removing elements from the waiver list would become significantly
> >> >> more difficult if disconnected behavior like this one needed to
> >> >> be taken into account.
> >> >>
> >> >> Please understand that history has told us to be rather more careful
> >> >> than might seem necessary with this: ATS originally having been
> >> >> enabled by default is one bold example, and the recent flood of MSI
> >> >> related XSAs is another; I suppose I could find more. All affected
> >> >> code originating from Intel, apparently written with only functionality
> >> >> in mind, leaving out all but basic security considerations.
> >> >>
> >> >> IOW, with my committer role hat on, the feature is going to be
> >> >> experimental (and hence default off) unless the issue here gets
> >> >> addressed. And no, I cannot immediately suggest a good approach,
> >> >> and with all of the rush before the feature freeze I also can't justify
> >> >> taking a lot of time to think of options.
> >> >
> >> > Is it acceptable to you if I only add the blocked vcpus that have
> >> > assigned devices to the list? I think that should shorten the
> >> > length of the list.
> >>
> >> I actually implied this to be the case already, i.e.
> >> - if it's not, this needs to be fixed anyway,
> >> - it's not going to eliminate the concern (just think of a couple of
> >>   many-vCPU guests all having devices assigned).
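
As an illustrative sketch of that restriction only: the idea is to skip the
per-pCPU bookkeeping entirely for vCPUs of domains without any assigned
device, using the existing has_arch_pdevs() check. The hook name and the
lock/list names below are assumed from this series, not taken from final code:

    static void vmx_vcpu_block(struct vcpu *v)
    {
        unsigned long flags;
        unsigned int cpu = v->processor;

        /*
         * Domains without assigned devices never receive posted
         * interrupts, so their blocked vCPUs need not be tracked at all.
         */
        if ( !has_arch_pdevs(v->domain) )
            return;

        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, cpu), flags);
        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
                      &per_cpu(pi_blocked_vcpu, cpu));
        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, cpu), flags);
    }
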
> >
> > So how about allocating multiple wakeup vectors (say, 16; maybe
> > we can make this configurable) and multiplexing them amongst all the
> > blocked vCPUs?
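
Purely to illustrate what such multiplexing could look like (the vector
count, the per-vector bucketing, and every name below are made up for this
sketch, and nothing here addresses where the extra vectors would come from):

    #define NR_PI_WAKEUP_VECTORS 16   /* arbitrary; could be made configurable */

    /* Per pCPU: one blocked-vCPU list (and lock) per wakeup vector. */
    struct pi_wakeup_bucket {
        spinlock_t       lock;
        struct list_head list;
    };

    struct pi_wakeup_state {
        struct pi_wakeup_bucket bucket[NR_PI_WAKEUP_VECTORS];
    };

    /*
     * Hash each blocked vCPU onto one vector/list, so the handler for
     * vector i only ever walks bucket[i] on the local pCPU, bounding the
     * work per interrupt to roughly 1/NR_PI_WAKEUP_VECTORS of the total.
     */
    static inline unsigned int pi_wakeup_index(const struct vcpu *v)
    {
        return v->vcpu_id % NR_PI_WAKEUP_VECTORS;
    }
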
> 
> For such an approach to be effective, you'd need to know up front
> how many vCPU-s you may need to deal with, or allocate vectors
> on demand. Plus you'd need to convince us that spending additional
> vectors (which we're already short of on certain big systems) is the
> only viable solution to the issue (which I don't think it is).

Well, in that case, maybe you need to provide a better solution!

Thanks,
Feng

> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

