
Re: [Xen-devel] vmx: VT-d posted-interrupt core logic handling

> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Thursday, March 10, 2016 4:07 PM
> >>> On 10.03.16 at 06:09, <kevin.tian@xxxxxxxxx> wrote:
> > It's always good to have a clear definition of the extent to which a
> > performance issue would become a security risk. I saw 200us/500us used
> > as examples in this thread, however no one can give an actual criterion.
> > In that case, how do we call it a problem even when Feng collected some
> > data? Based on the mindset of all maintainers?
> I think I've already made clear in previous comments that such
> measurements won't lead anywhere. What we need is a
> guarantee (by way of enforcement in source code) that the
> lists can't grow overly large, compared to the total load placed
> on the system.

Thanks for the clarification.

> > I think a good way of looking at this is based on which capability is
> > impacted. In this specific case the directly impacted metric is the
> > interrupt delivery latency. However, today Xen is not RT-capable. Xen
> > doesn't commit to delivering a worst-case 10us interrupt latency. The
> > whole interrupt delivery path (from Xen into the guest) has not been
> > optimized yet, so there could be other reasons impacting latency too,
> > besides the concern about this specific list walk. There is no baseline
> > worst-case data w/o PI. There is no final goal to hit. There is no test
> > case to measure.
> >
> > Then why block this feature due to this unmeasurable concern? Why not
> > enable it and then improve it later, when it becomes a measurable
> > concern once Xen commits to a clear interrupt latency goal (at that
> > time people working on that effort will have to identify all kinds of
> > problems impacting interrupt latency and can then optimize them
> > together)? People should understand that interrupt latency may be bad
> > in extreme cases like those discussed in this thread (w/ or w/o PI),
> > since Xen doesn't commit to anything here.
> I've never made any reference to this being an interrupt latency
> issue; I think it was George who somehow implied this from earlier
> comments. Interrupt latency, at least generally, isn't a security
> concern (generally because of course latency can get so high that
> it might become a concern). All my previous remarks regarding the
> issue are solely from the common perspective of long running
> operations (which we've been dealing with outside of interrupt
> context in a variety of cases, as you may recall). Hence the purely

Yes, that concern makes sense.

> theoretical basis for some sort of measurement would be to
> determine how long a worst case list traversal would take. With
> "worst case" being derived from the theoretical limits the
> hypervisor implementation so far implies: 128 vCPU-s per domain
> (a limit which we sooner or later will need to lift, i.e. taking into
> consideration a larger value - like the 8k for PV guests - wouldn't
> hurt) by 32k domains per host, totaling to 4M possible list entries.
> Yes, it is obvious that this limit won't be reachable in practice, but
> no, any lower limit can't be guaranteed to be good enough.

Do you think 4M possible entries is already 'overly large', so that we
must have some enforcement in code? Or are experiments still required
to verify that 4M really is a problem (since the total overhead depends
on what we do with each entry)? If the latter, what is the criterion
for calling it a problem (e.g. 200us in total)?
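
To make the numbers in this question concrete, here is a rough
back-of-the-envelope sketch. The entry counts come from the limits quoted
above; the per-entry cost is a made-up parameter, not a measurement, and
the helper names are hypothetical.

```python
# Worst-case list length derived from the limits quoted in this thread.
VCPUS_PER_HVM_DOMAIN = 128       # current per-domain vCPU limit
VCPUS_PER_PV_DOMAIN = 8 * 1024   # the "8k for PV guests" figure
DOMAINS_PER_HOST = 32 * 1024     # "32k domains per host"


def worst_case_entries(vcpus_per_domain, domains=DOMAINS_PER_HOST):
    """Theoretical maximum number of list entries on one host."""
    return vcpus_per_domain * domains


def walk_time_us(entries, ns_per_entry):
    """Time in microseconds to visit every entry once.

    ns_per_entry is an assumed cost per visited entry, not measured data.
    """
    return entries * ns_per_entry / 1000.0


def max_entries_within(budget_us, ns_per_entry):
    """Largest list length whose full walk still fits in budget_us."""
    return int(budget_us * 1000 // ns_per_entry)


if __name__ == "__main__":
    assumed_ns = 10  # hypothetical per-entry cost, for illustration only
    n = worst_case_entries(VCPUS_PER_HVM_DOMAIN)
    print(n, "entries,", walk_time_us(n, assumed_ns), "us per full walk")
    print("cap for a 200us budget:", max_entries_within(200, assumed_ns))
```

Whatever per-entry cost is plugged in, the walk time scales linearly with
the list length, so a time budget such as 200us translates directly into
a maximum entry count.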

There are many linked lists in the Xen hypervisor today, each with a
different theoretical maximum length. The closest one to PI might be
the one in tmem (pool->share_list), which is page based and so could
also grow 'overly large'. Other examples are orders of magnitude
smaller, e.g. s->ioreq_vcpu_list in the ioreq server (which could reach
8K in the above example) and d->arch.hvm_domain.msixtbl_list in MSI-X
virtualization (which could reach 2^11 per spec). Do we also want to
create artificial scenarios to examine them, since, depending on the
actual operation, lists with thousands of entries may also become a
problem?

I just want to figure out how best to handle all the related linked-list
usages in the current hypervisor.
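
As a purely illustrative sketch of what "enforcement in source code"
could look like in general, the following caps each list at a fixed
length and spreads new entries over the shortest list. Everything here
(BoundedList, add_to_least_loaded, the cap value) is hypothetical and is
not taken from Xen or from any patch in this thread.

```python
class BoundedList:
    """Hypothetical list wrapper that refuses to grow past a fixed cap."""

    def __init__(self, max_entries):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._entries = []

    def try_add(self, entry):
        """Append entry and return True, or return False if the list is full."""
        if len(self._entries) >= self.max_entries:
            return False
        self._entries.append(entry)
        return True

    def remove(self, entry):
        self._entries.remove(entry)

    def __len__(self):
        return len(self._entries)


def add_to_least_loaded(lists, entry):
    """Try the lists from shortest to longest.

    Returns the index of the list that accepted the entry, or None if
    every list is already at its cap.
    """
    for idx in sorted(range(len(lists)), key=lambda i: len(lists[i])):
        if lists[idx].try_add(entry):
            return idx
    return None


if __name__ == "__main__":
    # Two lists capped at two entries each; the fifth insert is refused.
    lists = [BoundedList(2) for _ in range(2)]
    print([add_to_least_loaded(lists, n) for n in range(5)])
```

The point of such a cap is that the worst-case walk is bounded by
construction, so no measurement is needed to argue about it. The open
question is what the caller should do when an insert is refused.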

> But I'm just now noticing this is the wrong thread to have this
> discussion in - George specifically branched off the thread with
> the new topic to separate the general discussion from the
> specific case of the criteria for default enabling VT-d PI. So let's
> please move this back to the other sub-thread (and I've
> changed the subject back to express this).

Sorry for cross-posting.

