Re: [Xen-devel] xennet: skb rides the rocket messages in domU dmesg
Hi,

> Are you actually using the "xen/next" branch? I recommend you use
> xen/stable-2.6.32.x, since that's tracking all the other bugfixes
> going into Linux 2.6.32.

I was using xen/next because some of the features I use were not in xen/stable at the time. I built a new xen/stable-2.6.32.x kernel yesterday, and it does seem to work fine, so I guess I can follow that branch now.

>> To keep consistency with old recording data, and since I would like
>> to have all recordings in a single volume, I tried to use an NFS
>> mount of the recordings volume from the dom0 to mount on all
>> backends. This resulted in a very unstable system, to the point where
>> my most important slave backend became unusable.
>
> Unstable how?

The MythTV backends could not reliably record shows to an NFS-mounted filesystem; the ivtv driver would complain that the application was not reading fast enough. This made the backends unusable.

> That appears to mean that you're getting single packets which are
> larger than 18 pages long (72k). I'm not quite sure how that's
> possible, since I thought the datagram limit is 64k. Are you using NFS
> over UDP or TCP? (I think TCP, from your stack trace.) Does turning off
> TSO/GSO with ethtool make a difference?

OK, I tried this on the running system, and it did seem to improve things, but I still saw some (other) messages. After a reboot with the new xen/stable-2.6.32.x-based kernel (2.6.32.13), with TSO and GSO switched off using ethtool, these messages are now completely gone (the system has been up for about a day now).
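The 18-page (72k) figure quoted above is plain page arithmetic. The following Python sketch, assuming 4 KiB pages as on x86-64, counts how many pages a contiguous buffer touches. It deliberately ignores how netfront (xen_netfront) actually splits an skb into a linear head and fragments, so it is only a sanity check, and the helper name pages_spanned is hypothetical.

```python
PAGE_SIZE = 4096  # assumed: 4 KiB pages, as on x86-64


def pages_spanned(offset: int, length: int) -> int:
    """Count the pages touched by `length` contiguous bytes starting at `offset`."""
    if length <= 0:
        return 0
    first_page = offset // PAGE_SIZE
    last_page = (offset + length - 1) // PAGE_SIZE
    return last_page - first_page + 1


LIMIT = 64 * 1024  # the 64k limit mentioned in the thread

print("18 pages =", 18 * PAGE_SIZE, "bytes")  # 73728 bytes, i.e. 72 KiB
print("64 KiB, page-aligned:", pages_spanned(0, LIMIT), "pages")
# Try every possible starting offset within a page to find the worst case.
worst = max(pages_spanned(off, LIMIT) for off in range(PAGE_SIZE))
print("64 KiB, worst-case alignment:", worst, "pages")
```

A contiguous 64 KiB payload touches 16 pages when page-aligned and at most 17 otherwise, so a packet needing more than 18 pages implies either more than 64 KiB of data or fragments that do not fill the pages they occupy; the sketch cannot tell which.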
I do notice something else, though (it might have been there before, but it is now the only message in the domU dmesg), just after NFS starts during boot of the domU:

BUG: unable to handle kernel paging request at 00000002dcf32198
IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
PGD a777067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus
CPU 0
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit cx2341x v4l2_common videodev v4l1_compat xen_fbfront v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear
Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1
RIP: e030:[<ffffffff811cf09a>] [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
RSP: e02b:ffff88001cbd9e18 EFLAGS: 00010246
RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
FS: 00007fc142b6d720(0000) GS:ffff8800046e0000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task ffff88001ded2920)
Stack:
0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
<0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
<0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
Call Trace:
[<ffffffff811dd333>] local_cpus_show+0x44/0x57
[<ffffffff81273574>] dev_attr_show+0x22/0x49
[<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
[<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
[<ffffffff810da927>] vfs_read+0xa6/0x103
[<ffffffff810daa3a>] sys_read+0x45/0x69
[<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89 e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49
RIP [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
RSP <ffff88001cbd9e18>
CR2: 00000002dcf32198
---[ end trace 5f520ed1e48e5394 ]---

During boot of the dom0, I see the following when it starts my domU (it seems to be more of a warning):

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
Call Trace:
[<ffffffff8106a625>] __lock_acquire+0x431/0x459
[<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
[<ffffffff8106a6b1>] lock_acquire+0x64/0x81
[<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
[<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
[<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
[<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
[<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
[<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
[<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
[<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
[<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
[<ffffffff81257dc2>] misc_open+0x188/0x21e
[<ffffffff810dd1f6>] chrdev_open+0x164/0x185
[<ffffffff810dd092>] ? chrdev_open+0x0/0x185
[<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
[<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
[<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
[<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8100eff2>] ? check_events+0x12/0x20
[<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
[<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
[<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
[<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
[<ffffffff810d8a78>] sys_open+0x1b/0x1d
[<ffffffff81011b02>] system_call_fastpath+0x16/0x1b

Probably unrelated: I see the following message in my dom0 from time to time, and if it appears at the 'wrong' moment, the system becomes completely unusable as soon as any process needs disk access:

ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata4.00: BMDMA stat 0x64
ata4.00: failed command: READ DMA
ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
ata4.01: configured for UDMA/133
ata4: EH complete

I'm not sure whether this is related, though; it could just be a bad disk (it always seems to involve the same disk). I'm going to replace the disk and see if that makes a difference.

Regards,
Mark

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
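The ata4 report in the message above can be decoded by hand to find the sector that failed, which makes it easy to check whether repeated errors hit the same spot on the disk. This Python sketch assumes libata prints each taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (in the "res" line the first two fields are the status and error registers), and that READ DMA (0xc8) uses 28-bit addressing, where the low four bits of the device register carry LBA bits 24-27. The helper names taskfile_fields and lba28 are hypothetical.

```python
SECTOR_SIZE = 512  # assumed 512-byte sectors, consistent with 8 sectors = "dma 4096"


def taskfile_fields(tf: str) -> list[int]:
    """Split a libata taskfile dump like 'c8/00:08:.../ef' into 12 byte values."""
    return [int(b, 16) for part in tf.split("/") for b in part.split(":")]


def lba28(tf: str) -> int:
    """Rebuild a 28-bit LBA from a taskfile dump (assumed libata field order)."""
    f = taskfile_fields(tf)
    # f[0]=command/status, f[1]=feature/error, f[2]=nsect,
    # f[3..5]=lbal/lbam/lbah, f[6..10]=hob_* registers, f[11]=device
    lbal, lbam, lbah, device = f[3], f[4], f[5], f[11]
    return ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal


cmd = "c8/00:08:99:13:5c/00:00:00:00:00/ef"  # READ DMA as issued
res = "51/40:00:a0:13:5c/00:00:00:00:00/ef"  # registers returned by the drive

start = lba28(cmd)
nsect = taskfile_fields(cmd)[2]
failed = lba28(res)

print(f"read: LBA {start}, {nsect} sectors ({nsect * SECTOR_SIZE} bytes)")
print(f"error reported at LBA {failed}")
print("inside request:", start <= failed < start + nsect)
```

With these values the read starts at LBA 257692569 and covers 8 sectors (4096 bytes, matching "dma 4096 in"), and the drive reports the uncorrectable error at LBA 257692576, the last sector of that request. If later errors report the same LBA, that points at a bad spot on the medium rather than at Xen.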