
[Xen-devel] Issue with pv_ops Kernel 2.6.31.6 and Xen



Hi

First of all, I have to state that I am neither a Kernel nor a Xen
developer. Nevertheless, while trying to use Kernel 2.6.31.6 from
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git as a Dom0
Kernel, I ran into an issue and, after searching the Internet for a long
time, I think I have also found the cause. However, I won't be able to fix
it myself :-(, so I am sharing what I know with this list
in the hope that the issue might get fixed sometime :-)...
I will try to give you all the information that seems relevant to me;
however, if it turns out that I have not given enough details about my system
(configuration), log files or anything else, I will be glad to provide
this information. Furthermore, I would also be happy to help with
"testing" of potential patches if required. I am posting to this list
because this was suggested at
http://wiki.xensource.com/xenwiki/XenParavirtOps (bottom of page). If this
is the wrong place, please give me a short hint so I won't bother you any longer...

Now, let's get into it...

About my system:
I am running Gentoo (10.0, server profile) on an Asus P2B-D motherboard
(PIIX4 chipset) with two PIII 500 MHz CPUs and 1 GB of RAM. The system
also has three PCI network interfaces based on the Realtek RTL8139
chip (8139too Kernel driver). The network interface in use is eth0 (I
already tried using another interface as eth0 to see whether that would
change anything, without success :-( ).

The issue I have:
While the Xen pv_ops Kernel 2.6.31.6 runs perfectly on bare metal, it fails
to get network connectivity when run on top of Xen 3.4.1 (Gentoo default
installation). Although the system seems to come up correctly at first
sight and the network interface is available (I can ping it locally), access
to the network fails (I cannot ping other systems on the network, nor can
they ping this one).

What I discovered so far:
Looking at the boot messages in "dmesg", I discovered that allocating the
ACPI SCI interrupt fails when running on top of Xen, while this error does
not occur on bare metal.

With XEN:
*********
bio: create slab <bio-0> at 0
ACPI: SCI (IRQ20) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler 20090521 evevent-161
ACPI: Unable to start the ACPI Interpreter
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805ea0): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G        W  2.6.31.6 #14
Call Trace:
 [<c043a2db>] warn_slowpath_common+0x60/0x90
 [<c043a33f>] warn_slowpath_fmt+0x24/0x27
 [<c05588cb>] kobject_put+0x27/0x3c
 [<c049e502>] kmem_cache_destroy+0x105/0x11b
 [<c058adc8>] acpi_os_delete_cache+0x8/0xc
 [<c05a6fe6>] acpi_ut_delete_caches+0xd/0x6b
 [<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c05a8067>] acpi_terminate+0x8/0x14
 [<c09049cb>] acpi_init+0x194/0x263
 [<c05f0e66>] ? __class_create+0x44/0x5e
 [<c09021c5>] ? fbmem_init+0x0/0x78
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c0403051>] do_one_initcall+0x4c/0x13a
 [<c08e030d>] kernel_init+0x12c/0x17d
 [<c08e01e1>] ? kernel_init+0x0/0x17d
 [<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da23 ]---
------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
Hardware name: System Name
kobject: '<NULL>' (cf805f60): is not initialized, yet kobject_put() is being called.
Modules linked in:
Pid: 1, comm: swapper Tainted: G        W  2.6.31.6 #14
Call Trace:
 [<c043a2db>] warn_slowpath_common+0x60/0x90
 [<c043a33f>] warn_slowpath_fmt+0x24/0x27
 [<c05588cb>] kobject_put+0x27/0x3c
 [<c049e502>] kmem_cache_destroy+0x105/0x11b
 [<c058adc8>] acpi_os_delete_cache+0x8/0xc
 [<c05a700e>] acpi_ut_delete_caches+0x35/0x6b
 [<c05a77f7>] acpi_ut_subsystem_shutdown+0x87/0x90
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c05a8067>] acpi_terminate+0x8/0x14
 [<c09049cb>] acpi_init+0x194/0x263
 [<c05f0e66>] ? __class_create+0x44/0x5e
 [<c09021c5>] ? fbmem_init+0x0/0x78
 [<c0904837>] ? acpi_init+0x0/0x263
 [<c0403051>] do_one_initcall+0x4c/0x13a
 [<c08e030d>] kernel_init+0x12c/0x17d
 [<c08e01e1>] ? kernel_init+0x0/0x17d
 [<c040ad17>] kernel_thread_helper+0x7/0x10
---[ end trace 4eaa2a86a8e2da24 ]---
sync cpu 0 get result ffffffff max_id 0
Failed to sync pcpu 0
xenbus_probe_backend_init bus registered ok


Without Xen:
************
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: reg 10 32bit mmio: [0xf8000000-0xfbffffff]
pci 0000:00:04.1: reg 20 io port: [0xb800-0xb80f]
pci 0000:00:04.2: reg 20 io port: [0xb400-0xb41f]
* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources
pci 0000:00:04.3: quirk: region e400-e43f claimed by PIIX4 ACPI
pci 0000:00:04.3: quirk: region e800-e80f claimed by PIIX4 SMB
pci 0000:00:04.3: PIIX4 devres B PIO at 0290-0297
pci 0000:00:09.0: reg 10 io port: [0xb000-0xb0ff]
pci 0000:00:09.0: reg 14 32bit mmio: [0xde800000-0xde8000ff]
pci 0000:00:09.0: reg 30 32bit mmio: [0x000000-0x00ffff]
pci 0000:00:0a.0: reg 10 io port: [0xa800-0xa8ff]
pci 0000:00:0a.0: reg 14 32bit mmio: [0xde000000-0xde0000ff]
pci 0000:00:0a.0: supports D1 D2
pci 0000:00:0a.0: PME# supported from D1 D2 D3hot
pci 0000:00:0a.0: PME# disabled
pci 0000:00:0b.0: reg 10 io port: [0xa400-0xa4ff]
pci 0000:00:0b.0: reg 14 32bit mmio: [0xdd800000-0xdd8000ff]
pci 0000:00:0b.0: supports D1 D2
pci 0000:00:0b.0: PME# supported from D1 D2 D3hot
pci 0000:00:0b.0: PME# disabled
pci 0000:01:00.0: reg 10 32bit mmio: [0xe0000000-0xe3ffffff]
pci 0000:01:00.0: reg 14 32bit mmio: [0xdf800000-0xdf87ffff]
pci 0000:01:00.0: reg 18 io port: [0xd800-0xd8ff]
pci 0000:01:00.0: reg 30 32bit mmio: [0xdf7e0000-0xdf7fffff]
pci 0000:01:00.0: supports D1 D2
pci 0000:00:01.0: bridge io port: [0xd000-0xdfff]
pci 0000:00:01.0: bridge 32bit mmio: [0xf4000000-0xf40fffff]
pci 0000:00:01.0: bridge 32bit mmio pref: [0xdf700000-0xe3ffffff]
pci_bus 0000:00: on NUMA node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 *4 5 6 7 9 10 11 12 14 15)
xenbus_probe_backend_init bus registered ok
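
When comparing two long dmesg captures like the ones above, it can help to pull out just the ACPI failure lines. The following is a minimal sketch in Python, not an official tool: the pattern list only covers the three messages quoted in this mail, and the function names are made up for illustration.

```python
import re

# Patterns for the failure lines quoted above; illustrative, not exhaustive.
FAILURE_PATTERNS = [
    re.compile(r"ACPI: SCI \(IRQ(?P<irq>\d+)\) allocation failed"),
    re.compile(r"ACPI Exception: (?P<code>AE_\w+)"),
    re.compile(r"ACPI: Unable to start the ACPI Interpreter"),
]


def acpi_failures(dmesg_text):
    """Return the dmesg lines that match one of the known failure patterns."""
    hits = []
    for line in dmesg_text.splitlines():
        if any(p.search(line) for p in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


def sci_irq_that_failed(dmesg_text):
    """Return the IRQ number from 'SCI (IRQn) allocation failed', or None."""
    m = FAILURE_PATTERNS[0].search(dmesg_text)
    return int(m.group("irq")) if m else None
```

Feeding it the saved output of "dmesg" from each boot should list the SCI allocation failure for the Xen boot and nothing for the bare-metal boot.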


Corresponding to this error, the /proc/interrupts tables also differ:

With XEN:
*********
           CPU0       CPU1
  1:        426          0  xen-pirq-ioapic-edge  i8042
  3:          0          0  xen-pirq-ioapic-edge  uhci_hcd:usb1
  4:          2          0  xen-pirq-ioapic-edge  serial
  8:          2          0  xen-pirq-ioapic-edge  rtc0
 12:          0          0  xen-pirq-ioapic-edge  eth0
 14:       4319          0  xen-pirq-ioapic-edge  ide0
 15:         42          0  xen-pirq-ioapic-edge  ide1
411:          0          0   xen-dyn-event     xenbus
412:          0        703   xen-dyn-ipi       callfuncsingle1
413:          0          0   xen-dyn-virq      debug1
414:          0          0   xen-dyn-ipi       callfunc1
415:          0      45622   xen-dyn-ipi       resched1
416:          0        311   xen-dyn-ipi       spinlock1
417:          0     153289   xen-dyn-virq      timer1
418:        550          0   xen-dyn-ipi       callfuncsingle0
419:          0          0   xen-dyn-virq      debug0
420:          0          0   xen-dyn-ipi       callfunc0
421:      18071          0   xen-dyn-ipi       resched0
422:        661          0   xen-dyn-ipi       spinlock0
423:     277476          0   xen-dyn-virq      timer0
NMI:          0          0   Non-maskable interrupts
LOC:          0          0   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:      18071      45622   Rescheduling interrupts
CAL:        550        703   Function call interrupts
TLB:          0          0   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        132        132   Machine check polls
ERR:          0
MIS:          0


Without XEN:
************
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
  1:       2567       4239   IO-APIC-edge      i8042
  6:          3          0   IO-APIC-edge      floppy
  8:          1          1   IO-APIC-edge      rtc0
 14:      28604      27089   IO-APIC-edge      ide0
 15:          0          0   IO-APIC-edge      ide1
 18:       1942       1978   IO-APIC-fasteoi   eth0
 20:          0          0   IO-APIC-fasteoi   acpi
NMI:          0          0   Non-maskable interrupts
LOC:    1097380    1052641   Local timer interrupts
SPU:          0          0   Spurious interrupts
CNT:          0          0   Performance counter interrupts
PND:          0          0   Performance pending work
RES:     105211     107135   Rescheduling interrupts
CAL:         16         20   Function call interrupts
TLB:       4542       4509   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        289        289   Machine check polls
ERR:          0
MIS:          0
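
The two tables above can also be compared mechanically. Below is a rough Python sketch that parses saved copies of /proc/interrupts and reports devices whose IRQ number differs between two boots. The helper names are hypothetical, and the parser assumes the column layout shown in this mail (a CPU header row, then "IRQ: counts chip device").

```python
def parse_interrupts(text):
    """Map device name -> (irq, total count, chip type) from /proc/interrupts text."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip summary rows (NMI, LOC, ERR, ...) and rows without a device name.
        if not label.strip().isdigit() or len(fields) < ncpus + 2:
            continue
        counts = [int(c) for c in fields[:ncpus]]
        chip = fields[ncpus]
        # Shared IRQs list several devices separated by commas.
        for dev in " ".join(fields[ncpus + 1:]).split(","):
            table[dev.strip()] = (int(label), sum(counts), chip)
    return table


def compare(a, b):
    """Yield (device, entry_a, entry_b) for devices in both tables whose IRQ differs."""
    for dev in sorted(set(a) & set(b)):
        if a[dev][0] != b[dev][0]:
            yield dev, a[dev], b[dev]
```

Applied to the tables in this mail, the entry to look at is eth0: it sits on IRQ 12 with a count of zero under Xen, but on IRQ 18 with non-zero counts on bare metal.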


Searching the Internet, I came across several messages (e.g.
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg26601.html)
mentioning that on motherboards with the PIIX4 chipset the SCI interrupt is
hardwired to IRQ 9. However, on my system it is assigned IRQ 20 on bare
metal, and the attempt to set it to IRQ 20 fails on top of Xen (see the
dmesg extract above from the run on top of Xen -> ACPI: SCI (IRQ20)
allocation failed).
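
One way to check what the firmware itself reports for the SCI is to read the SCI_INT field from the raw FADT. The sketch below is Python and rests on assumptions: that the table is exported at /sys/firmware/acpi/tables/FACP on the running kernel (usually readable by root only), and that SCI_INT is the 16-bit little-endian field at byte offset 46, as laid out in the ACPI specification. The function names are hypothetical. The kernel's own copy of the value may differ from the raw table if an interrupt source override exists for the SCI.

```python
import struct

SCI_INT_OFFSET = 46  # 16-bit little-endian SCI_INT field in the FADT


def fadt_sci_interrupt(fadt_bytes):
    """Return the SCI interrupt number stored in a raw FADT ('FACP') table."""
    if fadt_bytes[:4] != b"FACP":
        raise ValueError("not a FADT: signature is %r" % fadt_bytes[:4])
    if len(fadt_bytes) < SCI_INT_OFFSET + 2:
        raise ValueError("table too short")
    (sci,) = struct.unpack_from("<H", fadt_bytes, SCI_INT_OFFSET)
    return sci


def read_fadt(path="/sys/firmware/acpi/tables/FACP"):
    # Assumed sysfs location; adjust if the running kernel exports it elsewhere.
    with open(path, "rb") as f:
        return f.read()
```

Calling fadt_sci_interrupt(read_fadt()) as root on both boots would show whether the firmware value is 9 or 20, independent of what the kernel later maps it to.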

As I started wondering whether it would work with IRQ 9, and having no
knowledge of ACPI and interrupt handling in the Kernel, I crudely patched
the code of <Kernel-DIR>/drivers/acpi/osl.c in the following manner:

osl.c:391
*********
acpi_status
acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,
                                  void *context)
{
        unsigned int irq;

        acpi_irq_stats_init();

        /*
         * Ignore the GSI from the core, and use the value in our copy of the
         * FADT. It may not be the same if an interrupt source override exists
         * for the SCI.
         */
        gsi = acpi_gbl_FADT.sci_interrupt;
        if (acpi_gsi_to_irq(gsi, &irq) < 0) {
                printk(KERN_ERR PREFIX "SCI (ACPI GSI %d) not registered\n",
                       gsi);
                return AE_OK;
        }
+       irq = 9;
        acpi_irq_handler = handler;
        acpi_irq_context = context;
        if (request_irq(irq, acpi_irq, IRQF_SHARED, "acpi", acpi_irq)) {
                printk(KERN_ERR PREFIX "SCI (IRQ%d) allocation failed\n", irq);
                return AE_NOT_ACQUIRED;
        }
        acpi_irq_irq = irq;

        return AE_OK;
}


As you can see, I simply overwrote the IRQ number determined by
the system with IRQ 9, recompiled the Kernel and discovered(!) that
networking was now working, even under Xen (btw: it was still working
on bare metal).

Now, I don't know why it works with the SCI mapped to IRQ 20 on bare
metal while the SCI is supposed to be hardwired to IRQ 9, but the fact that
it works in both cases with IRQ 9 suggests to me that there is something
"wrong" or at least different when pv_ops Kernel 2.6.31.6 is run on top of
Xen. Perhaps someone could have a look at it at some point, because that's
where my knowledge stops...

Thanks & regards,
Marcial



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

