Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling

To: "George Dunlap" <george.dunlap@xxxxxxxxxx>
From: "Jan Beulich" <JBeulich@xxxxxxxx>
Date: Wed, 09 Mar 2016 06:39:19 -0700
Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Feng Wu <feng.wu@xxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>
Delivery-date: Wed, 09 Mar 2016 13:39:27 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 08.03.16 at 19:38, <george.dunlap@xxxxxxxxxx> wrote:
> Still -- I have a hard time constructing in my mind a scenario where
> huge numbers of idle vcpus for some reason decide to congregate on a
> single pcpu.
> 
> Suppose we had 1024 pcpus, and 1023 VMs each with 5 vcpus, of which 1
> was spinning at 100% and the other 4 were idle.  I'm not seeing a
> situation where any of the schedulers put all (1023*4) idle vcpus on a
> single pcpu.

As per my understanding idle vCPU-s don't get migrated at all.
And even if they do, their PI association with a pCPU doesn't
change (because that gets established once and for all at the
time the vCPU blocks).

> For the credit1 scheduler, I'm basically positive that it can't happen
> even once, even by chance.  You'd never be able to accrete more than a
> dozen vcpus on that one pcpu before they were stolen away.

Isn't stealing here happening only for runnable vCPU-s?

> And in any case, are you really going to have 1023 devices so that you
> can hand one to each of those 1023 guests?  Because it's only vcpus of
> VMs *which have a device assigned* that end up on the block list.

Who knows what people put in their (huge) systems, or by what
factor the VF/PF ratio will grow in the next few years?

> If I may go "meta" for a moment here -- this is exactly what I'm talking
> about with "Something bad may happen" being difficult to work with.
> Rather than you spelling out exactly the situation which you think may
> happen, (which I could then either accept or refute on its merits) *I*
> am now spending a lot of time and effort trying to imagine what
> situations you may be talking about and then refuting them myself.

I thought I was precise enough (without going into too much detail),
but it looks like I wasn't.

1) vCPU1 blocks on pCPU1 (indefinitely for the purpose here)
2) vCPU2 gets migrated to pCPU1 and blocks (indefinitely ...)
...
n) vCPUn gets migrated to pCPU1 and blocks (indefinitely ...)
n+1) a PI wakeup interrupt arrives on pCPU1

In this consideration it doesn't matter whether the vCPU-s are all
from the same or different VMs. The sole requirement is that they
must satisfy the condition(s) to be put on the blocking list.
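
To make the sequence above concrete, here is a minimal sketch of it in
Python. It is a toy model, not Xen code: the names (VCpu, Host,
blocking_list, wakeup_interrupt, pile_up) are hypothetical, and the
"pending" flag merely stands in for whatever marks a blocked vCPU as
having an outstanding posted interrupt. The only point it illustrates
is that the wakeup handler's work grows with the length of one pCPU's
list.

```python
from collections import defaultdict


class VCpu:
    def __init__(self, vcpu_id):
        self.vcpu_id = vcpu_id
        self.pending = False    # stand-in for "a posted interrupt is outstanding"
        self.blocked_on = None  # pCPU whose blocking list holds this vCPU


class Host:
    def __init__(self):
        # pCPU id -> list of vCPUs that blocked while associated with it
        self.blocking_list = defaultdict(list)

    def block(self, vcpu, pcpu):
        # The association is fixed at the time the vCPU blocks.
        vcpu.blocked_on = pcpu
        self.blocking_list[pcpu].append(vcpu)

    def wakeup_interrupt(self, pcpu):
        """Scan the whole list of `pcpu`; return (woken vCPUs, entries visited)."""
        visited = 0
        woken, still_blocked = [], []
        for vcpu in self.blocking_list[pcpu]:
            visited += 1
            if vcpu.pending:
                vcpu.pending = False
                vcpu.blocked_on = None
                woken.append(vcpu)
            else:
                still_blocked.append(vcpu)
        self.blocking_list[pcpu] = still_blocked
        return woken, visited


def pile_up(n, target_pcpu=1):
    """Steps 1..n: n vCPUs block on one pCPU. Step n+1: one wakeup arrives."""
    host = Host()
    vcpus = [VCpu(i) for i in range(1, n + 1)]
    for vcpu in vcpus:
        host.block(vcpu, target_pcpu)
    vcpus[-1].pending = True  # only one vCPU actually needs waking
    woken, visited = host.wakeup_interrupt(target_pcpu)
    return len(woken), visited
```

With the 1023 * 4 = 4092 idle vCPU-s from the example quoted further
up, pile_up(4092) wakes one vCPU but visits all 4092 list entries.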

> If you have concerns, you need to make those concerns concrete, or at
> least set clear criteria for how someone could go about addressing your
> concerns.  And yes, it is *your* job, as the person doing the objecting
> (and even more so as the x86 maintainer), to make your concerns explicit
> and/or set those criteria, and not Feng's job (or even my job) to try to
> guess what it is that might make you happy.

I'm sorry, George, but no, I don't think this is how things should
work. If it is unclear whether enabling a new feature by default
puts the system at risk, it is up to the party proposing the default
enabling to prove there's no such risk. We just can't allow in code
that sets us up for future security issues. If anything, that's what
we should have learned from the various disasters in the past (XSAVE
enabling having been the first and foremost, for which I by now
count 4 related XSAs).

>> How many would be tolerable on a single list depends upon host
>> characteristics, so a fixed number won't do anyway. 
> 
> Sure, but if we can run through a list of 100 vcpus in 25us on a typical
> server, then we can be pretty certain 100 vcpus will never exceed 500us
> on basically any server.
> 
> On the other hand, if 50 vcpus takes 500us on whatever server Feng uses
> for his tests, then yes, we don't really have enough "slack" to be sure
> that we won't run into problems at some point.

I agree with such reasoning, except that we need to scale this
up. Unless (see above) there are reasons why the extraordinary
situation of a majority of all vCPU-s piling up in a single pCPU's
list cannot occur (not even theoretically), the counts to work with
are total vCPU counts that we can reasonably expect could be
placed on a huge system. Which is more likely to be thousands
than hundreds.
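
The scaling argument can be written down as a small calculation. This
is only a sketch under one loud assumption: that the time to walk a
list grows linearly with its length. The function names are made up
for illustration, and the input figures are the hypothetical ones from
the quoted text (100 vCPU-s in 25us, 50 vCPU-s in 500us, a 500us
budget), not measurements.

```python
def walk_time_us(measured_vcpus, measured_us, list_length):
    """Estimated time to walk `list_length` entries, assuming linear scaling."""
    return measured_us / measured_vcpus * list_length


def max_list_length(measured_vcpus, measured_us, budget_us):
    """Longest list that still fits in `budget_us`, assuming linear scaling."""
    return budget_us * measured_vcpus // measured_us


# Hypothetical "fast" case from the quoted text: 100 vCPUs walked in 25us.
fast_limit = max_list_length(100, 25, 500)
# Hypothetical "slow" case from the quoted text: 50 vCPUs walked in 500us.
slow_limit = max_list_length(50, 500, 500)

# All 1023 * 4 idle vCPUs from the earlier example on a single list.
fast_worst_case_us = walk_time_us(100, 25, 1023 * 4)
slow_worst_case_us = walk_time_us(50, 500, 1023 * 4)
```

Under these assumptions the fast case tolerates a list of 2000 entries
within 500us, so a list in the thousands already exceeds the budget
(4092 entries come to 1023us). The slow case has no slack at all.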

>> Hence I
>> think the better approach, instead of improving lookup, is to
>> distribute vCPU-s evenly across lists. Which in turn would likely
>> require those lists to no longer be tied to pCPU-s, an aspect I
>> had already suggested during review. As soon as distribution
>> would be reasonably even, the security concern would vanish:
>> Someone placing more vCPU-s on a host than that host can
>> handle is responsible for the consequences. Quite contrary to
>> someone placing more vCPU-s on a host than a single pCPU can
>> reasonably handle in an interrupt handler.
> 
> I don't really understand your suggestion.  The PI interrupt is
> necessarily tied to a specific pcpu; unless we start having multiple PI
> interrupts, we only have as many interrupts as we have pcpus, right?
> Are you saying that rather than put vcpus on the list of the pcpu it's
> running on, we should set the interrupt to that of an arbitrary pcpu
> that happens to have room on its list?

Ah, right, I think that limitation was named before, yet I've
forgotten about it again. But that only slightly alters the
suggestion: Distributing vCPU-s evenly would then require
changing which pCPU they are placed on in the course of entering
blocked state.
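
One way such a placement could look is sketched below. This is purely
illustrative and not a proposal for the actual implementation: it picks
the pCPU with the shortest blocking list at block time, prefers the
current pCPU on ties, and ignores affinity, locking and the cost of the
move itself. All names (pick_blocking_pcpu, block_evenly) are
hypothetical.

```python
def pick_blocking_pcpu(list_lengths, current_pcpu):
    """Return the pCPU with the shortest blocking list.

    Ties go to the current pCPU (avoiding a needless move), then to the
    lowest pCPU id so the choice is deterministic.
    """
    return min(
        list_lengths,
        key=lambda pcpu: (list_lengths[pcpu], pcpu != current_pcpu, pcpu),
    )


def block_evenly(num_pcpus, num_vcpus, current_pcpu=0):
    """Block `num_vcpus` vCPUs that all start on `current_pcpu`.

    Returns a dict mapping each pCPU id to its resulting list length.
    """
    list_lengths = {pcpu: 0 for pcpu in range(num_pcpus)}
    for _ in range(num_vcpus):
        target = pick_blocking_pcpu(list_lengths, current_pcpu)
        list_lengths[target] += 1
    return list_lengths
```

Even when every vCPU starts on the same pCPU, the resulting list
lengths differ by at most one, so no single interrupt handler has to
walk more than roughly (total blocked vCPU-s / pCPU-s) entries.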

Jan
