
Re: [Xen-devel] xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table.c:583



On 14.06.2010 14:26, Arnd Hannemann wrote:
> Hi,
> 
> On 14.06.2010 12:57, Stefano Stabellini wrote:
>> On Mon, 14 Jun 2010, Arnd Hannemann wrote:
>>> Hi,
>>>
>>> we have regular but hard-to-reproduce kernel panics (see below) with the
>>> latest "xen/stable-2.6.32.x" git tree. Triggering one usually means starting
>>> domUs and waiting a day or two.
>>>
>>> Any idea, anyone?
>>>
>>
>> this CS from origin/xen/dom0/gntdev should fix your problem:
>>
>> sstabellini@kaball-desktop:~/xensource/linux-pvops-latest$ git show ad469f0da31bc16b945f9a06710b9d45434d0091
>> commit ad469f0da31bc16b945f9a06710b9d45434d0091
>> Author: Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
>> Date:   Wed Jun 9 12:34:02 2010 -0700
>>
>>     xen/gntdev: use spinlocks rather than rwsem for locking
>>     
>>     The mmu notifier mechanism calls its callbacks with an rcu lock,
>>     which disables preemption.  This means we cannot use any blocking
>>     synchronization for locking.
>>     
>>     Convert all the rwsems to plain spinlocks.  This requires that
>>     the memory allocation and copying to/from userspace be split
>>     from the actual data structure updates since they can't be done
>>     under spinlock.
>>     
>>     Signed-off-by: Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
>>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>>
> 
> Unfortunately, this patch does not seem to help. We get a very similar
> backtrace after one hour of stress testing with a script that starts and
> stops domUs in a loop.
> 
> Maybe the problem is the hypervisor itself?
> We are currently using 4.0.1-rc2-pre (we updated from 4.0.0 because of what
> we believed was the same problem, though we had no working netconsole back
> then).

FYI: I got lucky and reproduced the error within only 15 minutes, this time
with hypervisor version:

(XEN) Xen version 4.0.1-rc3-pre (samsel@xxxxxxxxxxxxx) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) Mon Jun 14 12:43:49 CEST 2010
(XEN) Latest ChangeSet: Fri Jun 11 14:04:36 2010 +0100 21203:3903d95733f7

Traceback below:

Jun 14 14:38:14 vmhost2 [  201.636188] ------------[ cut here ]------------
Jun 14 14:38:14 vmhost2 [  201.636272] kernel BUG at drivers/xen/grant-table.c:583!
Jun 14 14:38:14 vmhost2 [  201.636345] invalid opcode: 0000 [#1] SMP
Jun 14 14:38:14 vmhost2 [  201.636503] last sysfs file: /sys/devices/virtual/net/br0/bridge/topology_change_detected
Jun 14 14:38:14 vmhost2 [  201.636596] Modules linked in: netconsole raid0 md_mod rtc_cmos rtc_core rtc_lib thermal processor ipv6 thermal_sys hwmon button acpi_processor sr_mod pl2303 cdrom usbserial evdev
Jun 14 14:38:14 vmhost2 [  201.637553]
Jun 14 14:38:14 vmhost2 [  201.637619] Pid: 0, comm: swapper Not tainted (2.6.32.15-xen4.0.0-dom0-stefano #2) System Product Name
Jun 14 14:38:14 vmhost2 [  201.637715] EIP: 0061:[<c120f170>] EFLAGS: 00010282 CPU: 0
Jun 14 14:38:14 vmhost2 [  201.637792] EIP is at gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [  201.637864] EAX: ffffffea EBX: c153be84 ECX: 00000001 EDX: 00000000
Jun 14 14:38:14 vmhost2 [  201.637937] ESI: 00007ff0 EDI: 0000000f EBP: c290d120 ESP: c153be50
Jun 14 14:38:14 vmhost2 [  201.638022]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Jun 14 14:38:14 vmhost2 [  201.638096] Process swapper (pid: 0, ti=c153a000 task=c1543760 task.ti=c153a000)
Jun 14 14:38:14 vmhost2 [  201.638187] Stack:
Jun 14 14:38:14 vmhost2 [  201.638251]  00000000 00213e1c c28f20c0 0002c189 ec189000 ecd95944 0000000f ec189000
Jun 14 14:38:14 vmhost2 [  201.638634] <0> 00000000 eb406000 00000000 0000000f ece40000 13e1c001 00000000 0002c189
Jun 14 14:38:14 vmhost2 [  201.639115] <0> 00000000 c1627a8c c16277c8 c1627a8c 000068c4 c12200c1 00000000 ebce8000
Jun 14 14:38:14 vmhost2 [  201.639655] Call Trace:
Jun 14 14:38:14 vmhost2 [  201.639729]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:38:14 vmhost2 [  201.639805]  [<c135e4e0>] ? process_backlog+0x90/0xa0
Jun 14 14:38:14 vmhost2 [  201.639882]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:38:14 vmhost2 [  201.639956]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:38:14 vmhost2 [  201.640032]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:38:14 vmhost2 [  201.640108]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:38:14 vmhost2 [  201.640184]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:38:14 vmhost2 [  201.640261]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:38:14 vmhost2 [  201.640336]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:38:14 vmhost2 [  201.640411]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:38:14 vmhost2 [  201.640486]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:38:14 vmhost2 [  201.640560]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:38:14 vmhost2 [  201.640635]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:38:14 vmhost2 [  201.640710]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:38:14 vmhost2 [  201.640786]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
Jun 14 14:38:14 vmhost2 [  201.640857] Code: 8d 5c 24 34 c1 e0 0c 83 c8 01 89 44 24 34 8b 44 24 0c c7 44 24 40 00 00 00 00 89 44 24 3c e8 b8 1e df ff 85 c0 0f 84 2c ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 8b 54 24 04 8b 44 24 0c e8
Jun 14 14:38:14 vmhost2 [  201.643843] EIP: [<c120f170>] gnttab_copy_grant_page+0x1f0/0x260 SS:ESP 0069:c153be50
Jun 14 14:38:14 vmhost2 [  201.644028] ---[ end trace af6399fb7ba91a18 ]---
Jun 14 14:38:14 vmhost2 [  201.644098] Kernel panic - not syncing: Fatal exception in interrupt
Jun 14 14:38:14 vmhost2 [  201.644173] Pid: 0, comm: swapper Tainted: G      D    2.6.32.15-xen4.0.0-dom0-stefano #2
Jun 14 14:38:14 vmhost2 [  201.644265] Call Trace:
Jun 14 14:38:14 vmhost2 [  201.644336]  [<c141d3b5>] ? panic+0x42/0xe1
Jun 14 14:38:14 vmhost2 [  201.644408]  [<c100cc56>] ? oops_end+0x96/0xa0
Jun 14 14:38:14 vmhost2 [  201.644481]  [<c100a73f>] ? do_invalid_op+0x7f/0x90
Jun 14 14:38:14 vmhost2 [  201.644555]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [  201.644632]  [<c13de9b0>] ? br_nf_pre_routing_finish+0x0/0x310
Jun 14 14:38:14 vmhost2 [  201.644709]  [<c137ae82>] ? nf_hook_slow+0x62/0xe0
Jun 14 14:38:14 vmhost2 [  201.644784]  [<c10741e4>] ? __alloc_pages_nodemask+0xe4/0x5b0
Jun 14 14:38:14 vmhost2 [  201.644860]  [<c106271d>] ? handle_IRQ_event+0x5d/0xc0
Jun 14 14:38:14 vmhost2 [  201.644935]  [<c141faa6>] ? error_code+0x66/0x6c
Jun 14 14:38:14 vmhost2 [  201.645009]  [<c137007b>] ? dev_graft_qdisc+0x5b/0x70
Jun 14 14:38:14 vmhost2 [  201.645083]  [<c100a6c0>] ? do_invalid_op+0x0/0x90
Jun 14 14:38:14 vmhost2 [  201.645157]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:38:14 vmhost2 [  201.645234]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:38:14 vmhost2 [  201.645308]  [<c135e4e0>] ? process_backlog+0x90/0xa0
Jun 14 14:38:14 vmhost2 [  201.645382]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:38:14 vmhost2 [  201.645455]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:38:14 vmhost2 [  201.645529]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:38:14 vmhost2 [  201.645604]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:38:14 vmhost2 [  201.645677]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:38:14 vmhost2 [  201.645754]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:38:14 vmhost2 [  201.645830]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:38:14 vmhost2 [  201.645904]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:38:14 vmhost2 [  201.645989]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:38:14 vmhost2 [  201.646063]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:38:14 vmhost2 [  201.646139]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:38:14 vmhost2 [  201.646213]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:38:14 vmhost2 [  201.646288]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
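
A side note on reading the register dump in the trace: EAX holds ffffffea
at the point of the BUG. Read as a signed 32-bit value that is -22, which
corresponds to EINVAL on Linux. That EAX still carries the return value of
the call that failed is only an assumption here; the trace does not say so
explicitly. A minimal Python sketch of the conversion (the helper name
to_signed32 is made up for illustration):

```python
import errno


def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit register value as two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# EAX exactly as printed in the oops register dump.
eax = int("ffffffea", 16)
ret = to_signed32(eax)

# Negative return values in the kernel conventionally encode -errno;
# treating EAX as such a return value is an assumption, see above.
name = errno.errorcode.get(-ret, "unknown") if ret < 0 else "no error"
print(ret, name)
```

The same helper can be applied to any other register value from an i386
oops when checking whether it looks like a negative errno.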



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

