
Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg



Hi

Are you actually using the "xen/next" branch?  I recommend you use
xen/stable-2.6.32.x, since that's tracking all the other bugfixes going
into Linux 2.6.32.
I was using xen/next since some of the features I use were not
in xen/stable at the time. I built a new xen/stable-2.6.32.x yesterday,
which does seem to work fine, so I guess I can follow that branch
now.

To keep consistency with old recording data, and since I would like to
have all recordings in a single volume, I tried to use an NFS mount of
the recordings volume from the dom0 on all backends. This resulted in a
very unstable system, to the point where my most important slave backend
became unusable.
Unstable how?
The mythtv backends were not able to reliably record shows on an
NFS-mounted filesystem. The ivtv driver would complain about the
application not reading fast enough. This made the backends unusable.

That appears to mean that you're getting single packets which are larger
than 18 pages long (72k).  I'm not quite sure how that's possible, since
I thought the datagram limit is 64k..
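The arithmetic behind that estimate can be checked with a short sketch. It assumes 4 KiB pages (as on x86); the helper name pages_spanned is made up for illustration and is not taken from the netfront driver. It only shows how many pages a buffer touches depending on where it starts within a page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def pages_spanned(offset_in_page, length, page_size=PAGE_SIZE):
    """Number of pages touched by a buffer of `length` bytes that starts
    `offset_in_page` bytes into its first page (round-up division)."""
    return (offset_in_page + length + page_size - 1) // page_size


# 18 pages of 4 KiB each, expressed in KiB.
eighteen_pages_kib = 18 * PAGE_SIZE // 1024

# A 64 KiB payload, page aligned and then shifted by one byte.
aligned = pages_spanned(0, 64 * 1024)
unaligned = pages_spanned(1, 64 * 1024)

print(eighteen_pages_kib, aligned, unaligned)
```

So a full 64k payload needs 16 pages when it is page aligned and 17 when it is not, which is still below the 18 pages mentioned above.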

Are you using nfs over udp or tcp?  (I think tcp, from your stack trace.)

Does turning off tso/gso with ethtool make a difference?
OK, I tried this on the running system, and it did seem to improve
things, but I would still see some (other) messages.
After a reboot, with the new xen/stable-2.6.32.13.x based kernel
and with tso and gso switched off with ethtool, these messages are
now completely gone (the system has been up for about a day now).
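For anyone wanting to script the workaround, here is a minimal sketch of how tso and gso could be switched off. The interface name eth0 is an assumption (use whatever the netfront device is called in the domU), and the function names are made up for illustration. The ethtool call itself is the usual "-K <iface> tso off gso off" form and needs root.

```python
import subprocess


def build_offload_cmd(iface, features=("tso", "gso")):
    """Build the ethtool argument list that turns the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface, run=False):
    """Return the command; only execute it when run=True (needs root)."""
    cmd = build_offload_cmd(iface)
    if run:
        subprocess.run(cmd, check=True)
    return cmd


# "eth0" is an assumed interface name; by default nothing is executed.
print(" ".join(disable_offloads("eth0")))
```

The setting does not survive a reboot, so it would have to be reapplied from a boot script or the distribution's network configuration.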

I do notice something else though (might have been there before,
but now it is the only message in domU dmesg), just after starting
nfs during boot of the domU:

BUG: unable to handle kernel paging request at 00000002dcf32198
IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
PGD a777067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus
CPU 0
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit cx2341x v4l2_common videodev v4l1_compat xen_fbfront v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear
Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1
RIP: e030:[<ffffffff811cf09a>] [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
RSP: e02b:ffff88001cbd9e18  EFLAGS: 00010246
RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
FS:  00007fc142b6d720(0000) GS:ffff8800046e0000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task ffff88001ded2920)
Stack:
 0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
<0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
<0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
Call Trace:
 [<ffffffff811dd333>] local_cpus_show+0x44/0x57
 [<ffffffff81273574>] dev_attr_show+0x22/0x49
 [<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
 [<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
 [<ffffffff810da927>] vfs_read+0xa6/0x103
 [<ffffffff810daa3a>] sys_read+0x45/0x69
 [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89 e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49
RIP  [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
 RSP <ffff88001cbd9e18>
CR2: 00000002dcf32198
---[ end trace 5f520ed1e48e5394 ]---


During boot of dom0 I see the following when it is starting my domU (it seems to be more of a warning):
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
Call Trace:
 [<ffffffff8106a625>] __lock_acquire+0x431/0x459
 [<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
 [<ffffffff8106a6b1>] lock_acquire+0x64/0x81
 [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
 [<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
 [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
 [<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
 [<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
 [<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
 [<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
 [<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
 [<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
 [<ffffffff81257dc2>] misc_open+0x188/0x21e
 [<ffffffff810dd1f6>] chrdev_open+0x164/0x185
 [<ffffffff810dd092>] ? chrdev_open+0x0/0x185
 [<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
 [<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
 [<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
 [<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100eff2>] ? check_events+0x12/0x20
 [<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
 [<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
 [<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
 [<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
 [<ffffffff810d8a78>] sys_open+0x1b/0x1d
 [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b


Probably not related: I see the following message in my dom0 from time to time. If it appears at the 'wrong' moment, my system becomes completely unusable as soon as a process needs disk access.

ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata4.00: BMDMA stat 0x64
ata4.00: failed command: READ DMA
ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
ata4.01: configured for UDMA/133
ata4: EH complete

I am not sure whether this is related, though; it could just be a bad disk (it always seems to involve the same disk). I am going to replace the disk and see if that makes a difference.
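As a rough cross-check of the "bad disk" theory, the cmd and res lines of the ata4.00 log can be decoded with a small sketch. The field layout assumed here (command or status, feature or error, sector count, three LBA bytes, five high-order bytes, device) is my reading of the libata output and should be treated as an assumption, as should the use of the low nibble of the device byte for LBA bits 27..24. The helper names are made up for illustration.

```python
# Bit names are from the ATA status and error registers; only a subset
# is listed here.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of all listed bits that are set in value."""
    return [name for bit, name in names.items() if value & bit]


def parse_taskfile(text):
    """Split 'aa/bb:cc:dd:ee:ff/hh:hh:hh:hh:hh/gg' into its fields.

    Assumed layout: command-or-status / feature-or-error : nsect : lbal :
    lbam : lbah / five high-order bytes / device.
    """
    first, low, _hob, device = text.split("/")
    second, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    dev = int(device, 16)
    # 28-bit LBA: bits 27..24 are taken from the low nibble of the device byte.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {"first": int(first, 16), "second": second, "nsect": nsect, "lba": lba}


cmd = parse_taskfile("c8/00:08:99:13:5c/00:00:00:00:00/ef")
res = parse_taskfile("51/40:00:a0:13:5c/00:00:00:00:00/ef")

print(decode_bits(res["first"], STATUS_BITS))   # status register flags
print(decode_bits(res["second"], ERROR_BITS))   # error register flags
print(hex(cmd["lba"]), cmd["nsect"], hex(res["lba"]))
```

If that reading is right, the sector reported in res is the last of the 8 sectors the READ DMA asked for, and the only error flag set is UNC (uncorrectable data), which fits a bad spot on the disk rather than anything Xen related.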


Regards,
Mark


