
[Xen-users] Oops in Dom0 kernel when eth link fails



Hi,

While running two Xen machines with kernel 2.6.18-2 (the standard Xen kernel
supplied by Debian unstable), I get the following oops in the Dom0 kernel when
the ethernet link changes from up to down:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02855ba
*pde = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ip_vs_wrr ip_vs xt_physdev netconsole iptable_filter 
ip_tables x_tables bridge netloop drbd button ac battery loop shpchp 
pci_hotplug pcspkr serial_core serio_raw psmouse evdev tsdev ext3 jbd mbcache 
dm_mirror dm_snapshot dm_mod ide_cd cdrom generic usbhid cciss piix scsi_mod 
uhci_hcd ide_core bnx2 usbcore thermal processor fan
CPU:    0
EIP:    0061:[<c02855ba>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-2-xen-686 #1)
EIP is at iret_exc+0x883/0xbe6
eax: 00000000   ebx: 00000000   ecx: 00000007   edx: c0ca0000
esi: c0ca0018   edi: c06d1890   ebp: 0000004c   esp: c0315d0c
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c0314000 task=c02c9660 task.ti=c0314000)
Stack: 0000004c 000001d8 c0ca0000 c0227f6d c0ca0000 c06d1878 000001d8 00000000
       00000000 00000000 00000018 c06d1878 c71038ac 00000001 0000004c 000005dc
       c52fd53c 0000025f c02079fc 000001d8 c0315e38 00000224 c76fee80 0000022c
Call Trace:
 [<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9
 [<c02079fc>] __alloc_skb+0x6c/0x70
 [<c02647a9>] icmp_glue_bits+0x1f/0x74
 [<c02496f8>] ip_append_data+0x5d1/0x942
 [<c026478a>] icmp_glue_bits+0x0/0x74
 [<c026467d>] icmp_push_reply+0x3d/0x14a
 [<c0243d86>] ip_route_output_flow+0x13/0x57
 [<c0264f6d>] icmp_send+0x2e7/0x350
 [<c012b60c>] run_posix_cpu_timers+0x1c/0x6bf
 [<c011495e>] rebalance_tick+0x116/0x2ae
 [<c0241b36>] ipv4_link_failure+0x14/0x3c
 [<c0262f1c>] arp_error_report+0x1c/0x24
 [<c0232c0d>] neigh_timer_handler+0x18e/0x24d
 [<c0232a7f>] neigh_timer_handler+0x0/0x24d
 [<c0121c28>] run_timer_softirq+0x101/0x15c
 [<c011de82>] __do_softirq+0x5e/0xc3
 [<c011df21>] do_softirq+0x3a/0x4a
 [<c01060c9>] do_IRQ+0x48/0x53
 [<c0206518>] evtchn_do_upcall+0x64/0x9b
 [<c01049d9>] hypervisor_callback+0x3d/0x48
 [<c01072c6>] raw_safe_halt+0x8c/0xaf
 [<c0102c63>] xen_idle+0x22/0x2e
 [<c0102d82>] cpu_idle+0x91/0xab
 [<c03196fe>] start_kernel+0x37a/0x381
Code: ff ff ff e9 a8 4f ef ff b8 f2 ff ff ff e9 c7 4f ef ff b8 f2 ff ff ff e9 
e7 4f ef ff 8b 3d 20 0b 36 c0 e9 ef 93 ef ff 8b 5c 24 20 <c7> 03 f2 ff ff ff 8b 
7c 24 14 8b 4c 24 18 31 c0 f3 aa e9 4b 0d
EIP: [<c02855ba>] iret_exc+0x883/0xbe6 SS:ESP 0069:c0315d0c
 <0>Kernel panic - not syncing: Fatal exception in interrupt
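
One way to read the call trace above: the frames are printed innermost first, so reversing them gives the apparent call order, in which the neighbour (ARP) timer fires, reports a link failure, and tries to send an ICMP error. The following is only a minimal sketch for extracting that order from a pasted oops; the names `FRAME_RE` and `parse_frames` are made up for illustration, and the sample is a subset of the trace. Note that a trace like this may also contain stale entries left on the stack, so the chain is a hint, not proof.

```python
import re

# Matches frame lines such as " [<c0264f6d>] icmp_send+0x2e7/0x350".
# (Hypothetical helper, not part of any kernel or Xen tool.)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(oops_text):
    """Return (address, symbol, offset) tuples in the order they were printed."""
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            offset = int(match.group(3), 16)
            frames.append((address, match.group(2), offset))
    return frames


# A few frames copied from the trace above.
SAMPLE = """\
 [<c0264f6d>] icmp_send+0x2e7/0x350
 [<c0241b36>] ipv4_link_failure+0x14/0x3c
 [<c0262f1c>] arp_error_report+0x1c/0x24
 [<c0232c0d>] neigh_timer_handler+0x18e/0x24d
 [<c0121c28>] run_timer_softirq+0x101/0x15c
"""

if __name__ == "__main__":
    # Print outermost caller first, innermost last.
    for address, symbol, offset in reversed(parse_frames(SAMPLE)):
        print(f"{address:08x}  {symbol} (+{offset:#x})")
```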

Some details about the setup: The machines are linked directly by an ethernet
crossover cable on eth1. eth0 on both machines connects to the LAN, where
clients connect to a virtual IP address managed by heartbeat. Each machine runs
one DomU providing services. Data replication is done with drbd over the eth1
link. This is what happens:

- Both machines are running fine, one DomU per physical machine, load balanced.
- One of the machines has a simulated failure (poweroff -f).
- The second machine takes over all DomUs. Then, seconds later, the above oops
occurs and the second machine is also down. Not quite as intended :)

My guess is that this has to do with the eth1 ethernet link going down (with a
crossover cable, powering off the peer drops the link), but I could be wrong.
The network driver used is bnx2, the network card is a 'Broadcom NetXtreme II
BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz'. I have tried to reproduce it on a
non-Xen kernel, but couldn't. Also, someone suggested I disable tx checksumming
in both DomUs, but that made no difference.
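
To check whether eth1 really loses carrier before the oops, the link state could be logged with timestamps on the surviving machine. This is only a sketch, assuming the Dom0 kernel exposes the `carrier` attribute under /sys/class/net; the function names `read_carrier` and `watch` and the polling interval are made up for illustration.

```python
import time
from pathlib import Path


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the link is up, False if down, None if unreadable.

    Assumes a sysfs 'carrier' file per interface; reading it can fail
    (for example when the interface is administratively down).
    """
    try:
        value = (Path(sysfs_root) / iface / "carrier").read_text().strip()
    except OSError:
        return None
    return value == "1"


def watch(iface, interval=0.5, sysfs_root="/sys/class/net"):
    """Poll the carrier state forever and print a line on every change."""
    last = read_carrier(iface, sysfs_root)
    print(f"{time.strftime('%H:%M:%S')} {iface} carrier={last}")
    while True:
        time.sleep(interval)
        current = read_carrier(iface, sysfs_root)
        if current != last:
            print(f"{time.strftime('%H:%M:%S')} {iface} carrier={current}")
            last = current


if __name__ == "__main__":
    # eth1 is the crossover link in the setup described above.
    watch("eth1")
```

Sending the output to a serial or network console would keep the last lines visible after the panic.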

Below is some output of xm info and xm dmesg.

Xm info:
host                   : kalium
release                : 2.6.18-2-xen-686
version                : #1 SMP Thu Nov 9 00:21:32 UTC 2006
machine                : i686
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 1
cores_per_socket       : 2
threads_per_core       : 2
cpu_mhz                : 3200
hw_caps                : bfebfbff:20100000:00000000:00000180:0000e43d:00000000:00000001
total_memory           : 2047
free_memory            : 1379
xen_major              : 3
xen_minor              : 0
xen_extra              : .3-1
xen_caps               : xen-3.0-x86_32 hvm-3.0-x86_32
xen_pagesize           : 4096
platform_params        : virt_start=0xfc000000
xen_changeset          : Tue Oct 17 22:09:52 2006 +0100
cc_compiler            : gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
cc_compile_by          : ultrotter
cc_compile_domain      : debian.org
cc_compile_date        : Thu Nov  2 20:28:13 CET 2006
xend_config_format     : 2

Xm dmesg:

 Xen version 3.0.3-1 (Debian 3.0.3-0-2) (ultrotter@xxxxxxxxxx) (gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)) Thu Nov  2 20:28:13 CET 2006
 Latest ChangeSet: Tue Oct 17 22:09:52 2006 +0100

(XEN) Command line: /boot/xen-3.0.3-1-i386.gz dom0_mem=128Mb
(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009f400 (usable)
(XEN)  000000000009f400 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007ffc8000 (usable)
(XEN)  000000007ffc8000 - 000000007ffd0000 (ACPI data)
(XEN)  000000007ffd0000 - 0000000080000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed00000 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2047MB (2096540kB)
(XEN) Xen heap: 10MB (10408kB)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f4f80
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 HP                                    ) @ 0x000f4f00
(XEN) ACPI: XSDT (v001 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc8300
(XEN) ACPI: FADT (v003 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc8380
(XEN) ACPI: SPCR (v001 HP     SPCRRBSU 0x00000001 Ò 0x0000162e) @ 0x7ffc8100
(XEN) ACPI: MCFG (v001 HP     ProLiant 0x00000001  0x00000000) @ 0x7ffc8180
(XEN) ACPI: HPET (v001 HP     P58      0x00000002 Ò 0x0000162e) @ 0x7ffc81c0
(XEN) ACPI: SPMI (v005 HP     ProLiant 0x00000001 Ò 0x0000162e) @ 0x7ffc8200
(XEN) ACPI: MADT (v001 HP     00000083 0x00000002  0x00000000) @ 0x7ffc8240
(XEN) ACPI: DSDT (v001 HP         DSDT 0x00000001 INTL 0x20030228) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x10228201 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 3200.281 MHz processor.
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU0: Thermal monitoring enabled
(XEN) CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 1/2 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU1: Thermal monitoring enabled
(XEN) CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 2/1 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU2: Thermal monitoring enabled
(XEN) CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU3: Thermal monitoring enabled
(XEN) CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   03000000->04000000 (28672 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c038b874
(XEN)  Init. ramdisk: c038c000->c0eab200
(XEN)  Phys-Mach map: c0eac000->c0ecc000
(XEN)  Start info:    c0ecc000->c0ecc46c
(XEN)  Page tables:   c0ecd000->c0ed2000
(XEN)  Boot stack:    c0ed2000->c0ed3000
(XEN)  TOTAL:         c0000000->c1000000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Initrd len 0xb1f200, start at 0xc038c000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).

Any clues as to what's wrong here?

If more info is needed, please ask.
Thanks in advance.
Regards,
Ronald.



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

