
Re: [Xen-devel] PV HVM Linux drivers fail



On Wed, 28 Jul 2010, Leo wrote:
> Hi,
> 
> I don't see anyone having similar issues when running a domU kernel in HVM.
> Hopefully it's just something very obvious that I missed.
> 
> I am using the 2.6.34-pvhvm-v6 branch from 
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.  I compiled the kernel
> with just about everything as modules, including platform-pci, xen-blkfront 
> and xen-netfront.
> 
> When I use the kernel parameter xen_emul_unplug=ignore, both the ata_piix and
> 8139cp drivers attach to the emulated devices and work just fine.
> 
> When I unplug either the NIC or the IDE disk, the loading order of the three
> modules makes a difference.  If I load platform-pci before either of the
> frontend drivers, I get a XENBUS timeout.  Loading the platform-pci module
> itself doesn't cause any error; the error appears when I load the frontend
> drivers.  The output below is from loading the xen-netfront driver
> (platform-pci was loaded at 58s, and I trimmed the wait time from 300s to
> 30s):
> 
> [  102.570177] Initialising Xen virtual ethernet driver.
> [  107.676025] XENBUS: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...
> [  132.677038] XENBUS: Timeout connecting to device: device/vif/0 (local state 1, remote state 2)
> 
> So I have to load the frontend drivers before platform-pci, and then no such
> error occurs.  Because of that, I cannot compile all three modules in, as
> they would always be waiting to connect.
> 

You are right about the module loading order: you need to load the PV
driver modules before platform-pci, otherwise it won't find any xenbus
devices.
But you should be able to compile in platform-pci, blkfront and netfront,
and everything should work fine.
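
The ordering above can be sketched as follows. This is only an
illustration: the module names are the ones mentioned in this thread,
load_order and load_all are hypothetical helper names, and actually
running modprobe needs root inside the guest.

```python
import subprocess

# Module names as mentioned in this thread (assumed to match the build).
FRONTENDS = ["xen-blkfront", "xen-netfront"]
PLATFORM = "platform-pci"


def load_order(frontends=FRONTENDS, platform=PLATFORM):
    """Return modprobe command lines with the frontends before platform-pci."""
    return [["modprobe", name] for name in list(frontends) + [platform]]


def load_all(dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in load_order():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # needs root inside the guest


if __name__ == "__main__":
    load_all(dry_run=True)
```

With dry_run=True nothing is loaded; the commands are only printed in
the order in which they would be run.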


> However, I get another error.  When I unplug the IDE disks, I get this:
> 
> <4>[   29.842072] ------------[ cut here ]------------
> <4>[   29.842785] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xac/0xc0()
> <4>[   29.843434] Hardware name: HVM domU
> <4>[   29.844070] sysfs: cannot create duplicate filename '/block/xvda'
> <4>[   29.844713] Modules linked in: 8139cp mii platform_pci xen_netfront xen_blkfront
> <4>[   29.847272] Pid: 20, comm: xenwatch Not tainted 2.6.34-pvhvm #10
> <4>[   29.847945] Call Trace:
> <4>[   29.856027]  [<ffffffff81146ebc>] ? sysfs_add_one+0xac/0xc0
> <4>[   29.856722]  [<ffffffff810424fa>] warn_slowpath_common+0x77/0x8f
> <4>[   29.857372]  [<ffffffff81042587>] warn_slowpath_fmt+0x64/0x66
> <4>[   29.858015]  [<ffffffff8106e6c8>] ? do_raw_spin_unlock+0x17/0x1b
> <4>[   29.858722]  [<ffffffff81146e32>] ? sysfs_add_one+0x22/0xc0
> <4>[   29.859358]  [<ffffffff812b3b94>] ? __mutex_lock_common+0x229/0x243
> <4>[   29.860034]  [<ffffffff810e433d>] ? __raw_local_irq_save+0x22/0x28
> <4>[   29.860815]  [<ffffffff81146d7a>] ? sysfs_pathname+0x37/0x3f
> <4>[   29.861461]  [<ffffffff81146d7a>] ? sysfs_pathname+0x37/0x3f
> <4>[   29.862126]  [<ffffffff81146ebc>] sysfs_add_one+0xac/0xc0
> <4>[   29.862768]  [<ffffffff811475f1>] create_dir+0x58/0x87
> <4>[   29.863393]  [<ffffffff810e433d>] ? __raw_local_irq_save+0x22/0x28
> <4>[   29.864092]  [<ffffffff81147658>] sysfs_create_dir+0x38/0x4f
> <4>[   29.864833]  [<ffffffff812b49f8>] ? _raw_spin_unlock+0x2d/0x38
> <4>[   29.865525]  [<ffffffff8119aa6d>] kobject_add_internal+0xdb/0x19b
> <4>[   29.866186]  [<ffffffff8119ac05>] kobject_add_varg+0x41/0x4e
> <4>[   29.866858]  [<ffffffff8119b0d0>] kobject_add+0x89/0x8b
> <4>[   29.867590]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.868280]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.868936]  [<ffffffff8119405c>] ? exact_lock+0x0/0x14
> <4>[   29.869574]  [<ffffffff810e71d8>] ? __kmalloc+0x106/0x118
> <4>[   29.870232]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.870866]  [<ffffffff8119a98a>] ? kobject_get+0x1a/0x22
> <4>[   29.871510]  [<ffffffff81209ea0>] ? get_device+0x14/0x1a
> <4>[   29.872209]  [<ffffffff8120a451>] device_add+0xdc/0x60c
> <4>[   29.872866]  [<ffffffff8120ee8e>] ? kobj_map+0x68/0x12a
> <4>[   29.873506]  [<ffffffff8114082b>] register_disk+0x3c/0x11d
> <4>[   29.874240]  [<ffffffff81194db7>] add_disk+0xb8/0x119
> <4>[   29.874922]  [<ffffffffa000125b>] backend_changed+0x441/0x45b [xen_blkfront]
> <4>[   29.875604]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.876245]  [<ffffffff811ec5ac>] otherend_changed+0x82/0x8b
> <4>[   29.876864]  [<ffffffff811eb7ea>] xenwatch_thread+0x118/0x152
> <4>[   29.877481]  [<ffffffff8105c560>] ? autoremove_wake_function+0x0/0x38
> <4>[   29.878134]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.878886]  [<ffffffff8105c35e>] kthread+0x69/0x71
> <4>[   29.879517]  [<ffffffff8100aa04>] kernel_thread_helper+0x4/0x10
> <4>[   29.880198]  [<ffffffff8105c2f5>] ? kthread+0x0/0x71
> <4>[   29.880923]  [<ffffffff8100aa00>] ? kernel_thread_helper+0x0/0x10
> <4>[   29.908017] ---[ end trace 3f99b54d0b8663be ]---
> <3>[   29.909904] kobject_add_internal failed for xvda with -EEXIST, don't try to register things with the same name in the same directory.
> <4>[   29.911103] Pid: 20, comm: xenwatch Tainted: G        W  2.6.34-pvhvm #10
> <4>[   29.916378] Call Trace:
> <4>[   29.917001]  [<ffffffff8119a861>] ? kobject_put+0x47/0x4b
> <4>[   29.917682]  [<ffffffff8119aaeb>] kobject_add_internal+0x159/0x19b
> <4>[   29.918389]  [<ffffffff8119ac05>] kobject_add_varg+0x41/0x4e
> <4>[   29.919124]  [<ffffffff8119b0d0>] kobject_add+0x89/0x8b
> <4>[   29.919834]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.920474]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.921191]  [<ffffffff8119405c>] ? exact_lock+0x0/0x14
> <4>[   29.921955]  [<ffffffff810e71d8>] ? __kmalloc+0x106/0x118
> <4>[   29.922690]  [<ffffffff812098b6>] ? kzalloc+0xf/0x11
> <4>[   29.923578]  [<ffffffff8119a98a>] ? kobject_get+0x1a/0x22
> <4>[   29.924324]  [<ffffffff81209ea0>] ? get_device+0x14/0x1a
> <4>[   29.925178]  [<ffffffff8120a451>] device_add+0xdc/0x60c
> <4>[   29.925873]  [<ffffffff8120ee8e>] ? kobj_map+0x68/0x12a
> <4>[   29.926567]  [<ffffffff8114082b>] register_disk+0x3c/0x11d
> <4>[   29.927195]  [<ffffffff81194db7>] add_disk+0xb8/0x119
> <4>[   29.927821]  [<ffffffffa000125b>] backend_changed+0x441/0x45b [xen_blkfront]
> <4>[   29.928566]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.929211]  [<ffffffff811ec5ac>] otherend_changed+0x82/0x8b
> <4>[   29.929855]  [<ffffffff811eb7ea>] xenwatch_thread+0x118/0x152
> <4>[   29.930490]  [<ffffffff8105c560>] ? autoremove_wake_function+0x0/0x38
> <4>[   29.931151]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.931787]  [<ffffffff8105c35e>] kthread+0x69/0x71
> <4>[   29.932550]  [<ffffffff8100aa04>] kernel_thread_helper+0x4/0x10
> <4>[   29.933201]  [<ffffffff8105c2f5>] ? kthread+0x0/0x71
> <4>[   29.933834]  [<ffffffff8100aa00>] ? kernel_thread_helper+0x0/0x10
> <5>[   29.966562] SCSI subsystem initialized
> <0>[   29.984037] ------------[ cut here ]------------
> <2>[   29.984721] kernel BUG at fs/sysfs/group.c:65!
> <0>[   29.985357] invalid opcode: 0000 [#1] PREEMPT SMP
> <0>[   29.987151] last sysfs file: /sys/class/firmware/timeout
> <4>[   29.987978] CPU 0
> <4>[   29.988014] Modules linked in: sd_mod scsi_mod ext3 jbd mbcache 8139cp mii platform_pci xen_netfront xen_blkfront
> <4>[   29.988014]
> <4>[   29.988014] Pid: 20, comm: xenwatch Tainted: G        W  2.6.34-pvhvm #10 /HVM domU
> <4>[   29.988014] RIP: 0010:[<ffffffff8114883a>]  [<ffffffff8114883a>] internal_create_group+0x2a/0x127
> <4>[   29.988014] RSP: 0018:ffff88001f97fd00  EFLAGS: 00010246
> <4>[   29.988014] RAX: 00000000ffffffef RBX: ffff88001e200870 RCX: ffffffff8120a927
> <4>[   29.988014] RDX: ffffffff81640280 RSI: 0000000000000000 RDI: ffff88001e200870
> <4>[   29.988014] RBP: ffff88001f97fd40 R08: ffff88001f97e000 R09: ffff880001a14c20
> <4>[   29.988014] R10: ffff88001f97fca0 R11: 0000000000000000 R12: ffff88001e0e88a8
> <4>[   29.988014] R13: 000000000000000f R14: ffff88001e200860 R15: ffffffff81640280
> <4>[   29.988014] FS:  0000000000000000(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000
> <4>[   29.988014] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> <4>[   29.988014] CR2: 00007f151c82200f CR3: 000000001e7bc000 CR4: 00000000000006b0
> <4>[   29.988014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[   29.988014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[   29.988014] Process xenwatch (pid: 20, threadinfo ffff88001f97e000, task ffff88001f975b80)
> <0>[   29.988014] Stack:
> <4>[   29.988014]  0000000000000000 0000000089700e79 ffff88001f97fd70 ffff88001e200800
> <4>[   29.988014] <0> ffff88001e0e88a8 000000000000000f ffff88001e200860 0000000000000000
> <4>[   29.988014] <0> ffff88001f97fd50 ffffffff81148958 ffff88001f97fd60 ffffffff810a39fe
> <0>[   29.988014] Call Trace:
> <4>[   29.988014]  [<ffffffff81148958>] sysfs_create_group+0xe/0x12
> <4>[   29.988014]  [<ffffffff810a39fe>] blk_trace_init_sysfs+0x14/0x16
> <4>[   29.988014]  [<ffffffff8118ff11>] blk_register_queue+0x42/0xc8
> <4>[   29.988014]  [<ffffffff81194dbf>] add_disk+0xc0/0x119
> <4>[   29.988014]  [<ffffffffa000125b>] backend_changed+0x441/0x45b [xen_blkfront]
> <4>[   29.988014]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.988014]  [<ffffffff811ec5ac>] otherend_changed+0x82/0x8b
> <4>[   29.988014]  [<ffffffff811eb7ea>] xenwatch_thread+0x118/0x152
> <4>[   29.988014]  [<ffffffff8105c560>] ? autoremove_wake_function+0x0/0x38
> <4>[   29.988014]  [<ffffffff811eb6d2>] ? xenwatch_thread+0x0/0x152
> <4>[   29.988014]  [<ffffffff8105c35e>] kthread+0x69/0x71
> <4>[   29.988014]  [<ffffffff8100aa04>] kernel_thread_helper+0x4/0x10
> <4>[   29.988014]  [<ffffffff8105c2f5>] ? kthread+0x0/0x71
> <4>[   29.988014]  [<ffffffff8100aa00>] ? kernel_thread_helper+0x0/0x10
> <0>[   29.988014] Code: c3 55 48 89 e5 41 57 49 89 d7 41 56 41 55 41 54 53 48 89 fb 48 83 ec 18 48 85 ff 89 75 c0 74 0b 85 f6 75 11 48 83 7f 30 00 75 04 <0f> 0b eb fe 83 7d c0 00 74 11 48 83 7b 30 00 41 bd ea ff ff ff
> <1>[   29.988014] RIP  [<ffffffff8114883a>] internal_create_group+0x2a/0x127
> <4>[   29.988014]  RSP <ffff88001f97fd00>
> <4>[   30.132020] ---[ end trace 3f99b54d0b8663bf ]---
> 
> Then when I do an fdisk on block device 202:0, I get this:
> 
> <4>[  223.840262] ------------[ cut here ]------------
> <4>[  223.841061] WARNING: at fs/fs-writeback.c:1105 __mark_inode_dirty+0xed/0x130()
> <4>[  223.842008] Hardware name: HVM domU
> <4>[  223.842585] Modules linked in: ata_piix libata sd_mod scsi_mod ext3 jbd mbcache 8139cp mii platform_pci xen_netfront xen_blkfront
> <4>[  223.847533] Pid: 875, comm: busybox Tainted: G      D W  2.6.34-pvhvm #10
> <4>[  223.848259] Call Trace:
> <4>[  223.848994]  [<ffffffff8110e33a>] ? __mark_inode_dirty+0xed/0x130
> <4>[  223.849761]  [<ffffffff810424fa>] warn_slowpath_common+0x77/0x8f
> <4>[  223.850651]  [<ffffffff81042521>] warn_slowpath_null+0xf/0x11
> <4>[  223.851509]  [<ffffffff8110e33a>] __mark_inode_dirty+0xed/0x130
> <4>[  223.852195]  [<ffffffff810b55ac>] ? sync_page_killable+0x0/0x3e
> <4>[  223.853449]  [<ffffffff811041cd>] mark_inode_dirty_sync+0xe/0x10
> <4>[  223.854260]  [<ffffffff81105554>] touch_atime+0xfb/0x120
> <4>[  223.855144]  [<ffffffff810b41a6>] file_accessed+0x17/0x19
> <4>[  223.855956]  [<ffffffff810b5ac4>] generic_file_aio_read+0x4da/0x53e
> <4>[  223.856686]  [<ffffffff810f27c7>] do_sync_read+0xc2/0x106
> <4>[  223.857307]  [<ffffffff810e433d>] ? __raw_local_irq_save+0x22/0x28
> <4>[  223.858023]  [<ffffffff8115aca4>] ? security_file_permission+0x11/0x13
> <4>[  223.858993]  [<ffffffff810f2e9d>] vfs_read+0xa8/0x102
> <4>[  223.859658]  [<ffffffff810f313f>] sys_read+0x47/0x6d
> <4>[  223.860379]  [<ffffffff81009c02>] system_call_fastpath+0x16/0x1b
> <4>[  223.861296] ---[ end trace 3f99b54d0b8663c0 ]---
> <3>[  223.861934] bdi-block not registered
> 
> Reading from the block device returns seemingly random data and the wrong
> disk size (here I am running fdisk on block device 202:0).  The virtual
> disk is 15 GiB.
> 
> Disk /dev/xvda: 4393 MB, 4393723904 bytes
> 255 heads, 63 sectors/track, 534 cylinders, total 8581492 sectors
> Units = sectors of 1 * 512 = 512 bytes
> 
> Disk /dev/xvda doesn't contain a valid partition table
> 
> 
> If I unplug the NICs, there is no error:
> 
> [   29.738321] Initialising Xen virtual ethernet driver.
> [   29.748844]   alloc irq_desc for 28 on node -1
> [   29.748847]   alloc kstat_irqs on node -1
> [   29.748866] xen-platform-pci 0000:00:03.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
> [   29.750216] Grant table initialized
> [   29.753307] vif vif-0: 2 parsing device/vif/0/mac
> 
> 8139cp, as expected, is not attached to eth0.  However, it seems xen-netfront
> cannot get the MAC address of eth0:
> 
> eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:00 
>           BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> 
> Is it something very obvious that I missed?
 
This is an actual bug. I have a fix for it in a new branch (which we are
currently trying to get merged into linux-next):

2.6.35-rc5-pvhvm-v6

The problem is caused by the way blkfront names devices whose major
number is different from XENVBD.
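
To illustrate how such a naming clash can arise, here is a minimal
sketch. It assumes the classic Xen virtual device encoding, in which the
device number is (major << 8) | minor, with major 202 for xvd and majors
3 and 22 for emulated IDE. The naive_name function is hypothetical: it
is not the blkfront code and not the fix, it only shows what happens
when a name is derived from the minor number alone. The choice of hda
and hdc as example devices is also an assumption, not something taken
from the report.

```python
# Classic Xen virtual block device numbers: (major << 8) | minor.
XENVBD_MAJOR = 202   # xvd devices
IDE0_MAJOR = 3       # hda, hdb
IDE1_MAJOR = 22      # hdc, hdd
PARTS_PER_DISK = 16  # xvd layout: 16 minors per disk


def split_vdev(vdev):
    """Split a legacy virtual device number into (major, minor)."""
    return vdev >> 8, vdev & 0xFF


def naive_name(vdev):
    """Hypothetical naming that looks only at the minor number."""
    _major, minor = split_vdev(vdev)
    return "xvd" + chr(ord("a") + minor // PARTS_PER_DISK)


hda = (IDE0_MAJOR << 8) | 0      # 768
hdc = (IDE1_MAJOR << 8) | 0      # 5632
xvda = (XENVBD_MAJOR << 8) | 0   # 51712

for label, vdev in (("hda", hda), ("hdc", hdc), ("xvda", xvda)):
    print(label, split_vdev(vdev), "->", naive_name(vdev))
```

All three devices end up with the same name here, because each has
minor 0 and the major number is ignored. A second device registered
under a name that already exists would be consistent with the
"cannot create duplicate filename '/block/xvda'" warning quoted above.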

