
Re: [Xen-devel] [PATCH v2] AMD/intremap: Prevent use of per-device vector maps until irq logic is fixed

On 03/06/13 16:01, Jan Beulich wrote:
>>>> On 03.06.13 at 16:35, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 03/06/13 15:07, Jan Beulich wrote:
>>>>>> On 31.05.13 at 22:04, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> In an effort to get AMD systems back to a non-regressed state, introduce a
>>>> new type of vector map called per-device-global.  This uses per-device
>>>> vector maps in the IOMMU, but uses a single used_vector map for the core
>>>> IRQ logic.
>>> So what's the reason for not simply using OPT_IRQ_VECTOR_MAP_GLOBAL
>>> here?
>> Simply to make it obviously different until the core problem is fixed,
>> at which point I expect OPT_IRQ_VECTOR_MAP_PERDEV_GLOBAL to disappear.
> That's not a really good excuse...
>>>> This patch is intended to be removed as soon as the per-device logic is
>>>> fixed correctly.
>>> As a last resort thing this may be acceptable, but I'd much favor to
>>> fix this properly rather than hacking it like this.
>> While I agree that a proper fix would be good, what is going to happen
>> about 4.2 and 4.1, which won't have this new functionality backported?
>> Furthermore, unless this new functionality is going to race into 4.3 at
>> the last moment, 4.3 will also be in a regressed state.
> The new functionality (multi-vector MSI) doesn't necessarily need
> to be backported, but if the prereq change turns out to fix a bug,
> I don't see a reason not to try to backport that one.
> As to getting the patch in for 4.3 - George, would you revisit your
> opinion on the part of the multi-vector MSI series that originally
> I had hoped to get into 4.3 anyway?
>>>  Hence I'd really like
>>> to put up for discussion to instead use the patch[1] already posted
>>> as preparatory for the multi-vector MSI support doing away with the
>>> use of the vector for indexing the IRTE (and, in a second patch[2],
>>> the enforcement of OPT_IRQ_VECTOR_MAP_PERDEV).
>>> Also, overriding a command line request in the way you do is a
>>> no-go imo - even if this would cause [theoretical] problems,
>> Not theoretical.  I have reproduced the issue, albeit with a modified
>> Xen which deliberately limits the range of vectors considered for a
>> certain device, to increase the chances of a collision.
> You misunderstood my use of "theoretical": On a system with only
> MSI devices, no problem is to be expected afaict. Yet your change
> would affect those too.

Ah I see.

>>>  we
>>> ought to honor the request as long as we can't tell for sure that
>>> this is going to break the specific system. That's even more so
>>> since requesting per-device vector maps to be used on VT-d ought
>>> to yield exactly the same effect, yet you don't override the mode
>>> there.
>> Anyone using these vector maps with VT-d is mad.  I could tweak the
>> patch to not override the command line but simply warn when global is
>> chosen.
> Let's take a step back: What do we need those vector maps for in
> the first place, other than the disambiguation of AMD IOMMU
> IRTEs? If the answer is "nothing", then why was a command line
> option controlling this added in the first place? And in that case
> ripping them out the moment the patches mentioned above go in
> would seem like the right thing to do. George, I think you added all
> that - do you have any thoughts here?

As I remember, the original bug was that when migrating an interrupt in
Xen from one pcpu to another and choosing the same vector, the cleanup
code zapped the IRTE, causing loss of interrupts.  The used_vector logic
was added to prevent the interrupt migration code from choosing the same
vector on a different pcpu.
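
To make the failure mode concrete, here is a toy model of it.  This is a
sketch, not Xen code: the names `migrate` and `pick_vector` are
hypothetical, and the remapping table is modelled as a plain dict indexed
by vector only, which is the property that makes the collision possible.

```python
# Toy model (NOT Xen code): a per-device interrupt remapping table
# indexed by vector alone. All names here are hypothetical.

def pick_vector(candidates, used_vectors):
    """Return the first candidate vector not already in used_vectors."""
    for v in candidates:
        if v not in used_vectors:
            return v
    raise RuntimeError("no free vector")

def migrate(irte, old_vector, new_vector, new_cpu):
    """Program the entry for the new destination, then clean up the old one."""
    irte[new_vector] = new_cpu   # new binding is written first
    irte.pop(old_vector, None)   # cleanup removes the entry for the old vector
    return irte

# Same vector chosen on the new pcpu: cleanup removes the entry just written.
lost = migrate({0x30: 0}, old_vector=0x30, new_vector=0x30, new_cpu=1)

# With a used_vector check, the allocator skips 0x30 and the entry survives.
safe_vector = pick_vector(range(0x30, 0x40), used_vectors={0x30})
kept = migrate({0x30: 0}, old_vector=0x30, new_vector=safe_vector, new_cpu=1)
```

In the first call the table ends up empty, which corresponds to the lost
interrupts; in the second the new entry lives at a different index and
survives the cleanup.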

I can't comment precisely on the introduction of the command line
option.  With hindsight, I suspect it might have been a lack of
understanding of the extent of the problem.  I was certainly quite new to
interrupt remapping at the time and did feel a little out of my depth.

>>> Furthermore, if only MSI-X devices currently suffer from this, the
>>> scalability effect this has (allowing no more than about 200
>>> vectors to be in use even on huge systems) would call for limiting
>>> the effect to MSI-X capable devices (or perhaps even to devices
>>> actually using MSI-X).
>> As I said, this reverts to the behaviour before XSA-36, but without the
>> security issue of a single IOMMU interrupt remapping table.  Before
>> XSA-36, all AMD systems were limited in vector range because of the
>> global used_vector map.
> Right, so you'd trade one regression for another (less severe, but
> anyway).
> Jan

Absolutely, especially when it comes to trying to fix a regression we
have pushed out in a security fix.

Ideally a proper fix to the MSI-X issue can be found, but failing a timely
fix, reverting to the pre-XSA-36 behaviour but without the security
issue is a good solution.
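
For a rough sense of the scalability trade-off discussed above, the
following back-of-the-envelope sketch compares the two modes.  It is a
simplification under stated assumptions: `USABLE_VECTORS` is the
approximate figure of 200 quoted in this thread, `max_interrupts` is a
hypothetical helper, and real limits depend on details not modelled here.

```python
# Back-of-the-envelope model (NOT Xen code). Assumes roughly 200 usable
# vectors per pcpu, the approximate figure quoted in this thread.
USABLE_VECTORS = 200

def max_interrupts(num_devices, num_cpus, global_map):
    """Upper bound on simultaneously bound interrupts in this toy model."""
    # Each (pcpu, vector) pair can deliver at most one interrupt.
    per_cpu_space = num_cpus * USABLE_VECTORS
    if global_map:
        # A single used_vector map: each vector is usable once system-wide.
        return min(per_cpu_space, USABLE_VECTORS)
    # Per-device maps: each vector is usable once per device.
    return min(per_cpu_space, num_devices * USABLE_VECTORS)

global_limit = max_interrupts(num_devices=16, num_cpus=64, global_map=True)
perdev_limit = max_interrupts(num_devices=16, num_cpus=64, global_map=False)
```

In this model the global map caps the whole system at `USABLE_VECTORS`
regardless of pcpu count, whereas per-device maps scale with the number
of devices until the per-pcpu vector space runs out.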

