[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Re: [PATCH] xen: fix interrupt routing



On 14.06.2011, at 15:27, Stefano Stabellini wrote:

> On Tue, 14 Jun 2011, Alexander Graf wrote:
>>>>>>> static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
>>>>>>> {
>>>>>>>  PCII440FXState *d = opaque;
>>>>>>> @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char 
>>>>>>> *device_name,
>>>>>>>  d = pci_create_simple(b, 0, device_name);
>>>>>>>  *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
>>>>>>> 
>>>>>>> -    piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> -                      pci_create_simple_multifunction(b, -1, true, 
>>>>>>> "PIIX3"));
>>>>>>> +    if (xen_enabled()) {
>>>>>>> +        piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> +                pci_create_simple_multifunction(b, -1, true, 
>>>>>>> "PIIX3-xen"));
>>>>>>> +        pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
>>>>>>> +                piix3, XEN_PIIX_NUM_PIRQS);
>>>>>> 
>>>>>> But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the 
>>>>>> reason behind this change?
>>>>> 
>>>>> It is still a piix3, but also provides non-legacy interrupt links to the
>>>>> IO-APIC.
>>>>> The four pins of each PCI device on the bus not only are routed to the
>>>>> normal four pirqs (programmed writing to 0x60-0x63, see above) but also
>>>>> they are connected to the IO-APIC directly.
>>>>> These additional routes can only be discovered through ACPI, so you need
>>>>> matching ACPI tables. We used to build the old ACPI tables like this:
>>>>> 
>>>>> /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */
>>>>> printf("Name(PRTA, Package() {\n");
>>>>> for ( dev = 1; dev < 32; dev++ )
>>>>>  for ( intx = 0; intx < 4; intx++ ) /* INTA-D */
>>>>>      printf("Package(){0x%04xffff, %u, 0, %u},\n",
>>>>>             dev, intx, ((dev*4+dev/8+intx)&31)+16);
>>>>> printf("})\n");
>>>>> 
>>>> 
>>>> Interesting concept, but completely non-standard and very much
>>>> different from real hardware. Please at least add a comment there to
>>>> show readers that Xen is doing a hack which is not at all related to
>>>> how the PIIX really works.
>>> 
>>> Isn't this more a function of the "wires" on the motherboard than the
>>> PIIX specifically? i.e. this just encodes the permutation of the wires
>>> from the PCI slots into the IO-APIC input pins (bypassing the PIIX,
>>> which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)?
>> 
>> Interrupts with PCI work slightly different. PCI devices can map (themselves 
>> or by software) to one of 4 interrupt lines: INTA, INTB, INTC, INTD. These 
>> get converted using PCI host controller specific logic to 4 interrupt lines 
>> which then go into the IO-APIC.
>> 
>> The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could 
>> be 26 though.
> 
> The number of redirection entries in the IOAPIC can be discovered
> reading from the IOAPICVER register and it is a property of a specific
> model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more
> pins than the most popular IOAPIC used with PIIX3.

which means you're emulating hardware that never existed :).

> 
> 
>> I haven't seen a single case where PCI devices have a direct link to the 
>> IO-APIC. I also have not seen any PCI host controller that exports more than 
>> 4 interrupts. Giving each PCI device its own line, on top of that more than 
>> ever could be in real hardware, is a plain hack IMHO.
> 
> Actually this happens quite often: if I am not mistaken all the GSIs
> higher than 15 are actually the result of a direct connection between
> an interrupt source and the IOAPIC. I have several on my testboxes.

Yes. "Interrupt source" meaning a wire on the board. I haven't seen any 
situation so far where you get direct IO-APIC connections to PCI _device_ pins. 
You obviously get plenty connections to PCI _bus_ pins.

> Also give a look at the Intel Multiprocessor Specification, section
> 3.6.2.3: as you can see from the diagram in "Symmetric I/O Mode" all the
> interrupts are routed through the IOAPIC directly.
> 
> 
>> Did this really give you actual performance/latency/scalability gains? I 
>> still think for devices that matter, we should go with MSI rather than 
>> deriving from real hw.
>> 
> 
> Not all the operating systems support MSIs, it is nice to be able to
> avoid interrupt sharing without recurring to MSIs.

Yes and no. It's a tradeoff. If no interrupt sharing means that we emulate 
hardware that simply never could have existed the way we model it, I think it's 
a bad idea.

> Also this is how Xen has been working for more then 5 years in HVM mode,
> so this configuration is well tested and supported by most operating
> systems (at least all the ones we tried so far).

I'm fine with Xen breaking its own neck, as long as it doesn't affect non-Xen 
code paths. Just be aware that I'm not a huge fan of this approach :).

> In any case I think it is a good idea to add a comment to better explain
> what we are doing, see below.
> 
> 
> 
> commit 973bb091a967fdec37a1bc8fe30d46a483d2903d
> Author: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> Date:   Tue May 17 12:10:36 2011 +0000
> 
>    xen: fix interrupt routing
> 
>    - remove i440FX-xen and i440fx_write_config_xen
>    we don't need to intercept pci config writes to i440FX anymore;
> 
>    - introduce PIIX3-xen and piix3_write_config_xen
>    we do need to intercept pci config write to the PCI-ISA bridge to update
>    the PCI link routing;
> 
>    - set the number of PIIX3-xen interrupts line to 128;

I still find it unpretty and I'm pretty sure it's completely different from 
real hardware, but since Xen code is your call and this doesn't affect non-Xen 
workloads, I won't block it, unless someone else is very much opposed to it.

Please resend as proper patch.


Alex


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.