
Re: [Xen-users] Debugging frozen VM problem



On Fri, 2015-09-04 at 10:57 +0100, Keith Roberts wrote:
> Hi all.
> 
> I recently updated a box from openSUSE 12.3 to openSUSE 13.1 evergreen,


You might also find it beneficial to ask on an openSUSE list or forum.

> and after the OS upgrade the VMs ran for about 5-10 minutes, then they
> all started to freeze and lock up. So I had to roll back to 12.3 using a
> fresh set of VMs and a Clonezilla image of the 12.3 OS.
> 
> On openSUSE 12.3 I was using the xm toolstack to manage the PV VMs.
> 
> However this is deprecated in suse 13.1 so I opted to use the libvirt
> toolstack, which seems to have the best future.
> 
> I imported the VM domains into the libvirt toolstack with:
> 
> # virsh define vm-domain-file-1.xml
> 
> # virsh define vm-domain-file-nn.xml

Is libvirt using xend or libxl as the underlying toolstack in your
configuration?

If libvirtd is using libxl then you _must_ stop the xend daemon altogether,
since the two do not play nicely together (although IIRC the bugs that
causes do not look like the log messages you have here).
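
Off the top of my head, checking and stopping xend would be roughly as
follows (this assumes openSUSE 13.1 is running systemd and that the
service is simply called "xend" -- adjust to your setup):

# systemctl status xend      # is xend running at all?
# systemctl stop xend        # stop it for this boot
# systemctl disable xend     # and keep it from coming back on reboot

If xend is stopped and "virsh list" still shows and starts your domains,
then libvirtd is talking to libxl; the old xend-based driver cannot work
without xend running.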

If libvirt is using xend then you obviously need xend running, but I have
no idea how domains defined via libvirt will interact with ones defined
directly in xend.
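
A quick way to see whether the two toolstacks have ended up with
overlapping views of the same guests is to compare their listings (just a
sketch, nothing openSUSE specific):

# xm list            # what xend thinks it is managing
# virsh list --all   # what libvirt has defined/running

If the same guest shows up in both, or twice under slightly different
names, that is a strong hint the two definitions are fighting over the
same disks.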

> The only thing I can think of that might be causing this is that I did
> not remove the domains from Xend management first with:
> 
> ( delete  -  Remove a domain from Xend domain management.)
> # xm delete vm-domain-file-n.xml
> 
> before defining them for libvirt and virsh to manage using the "virsh
> define vm-domain-file-1.xml" command.
> 
> So is it possible that xend got confused with having two slightly
> different domain definitions for each VM - one current definition in Xend
> and another definition based on earlier domain dumps imported into the
> libvirt toolstack?
> 
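
I don't know whether xend would actually get confused by that, but if you
want to rule it out, cleaning up the old xend-managed definitions would
look something like the following. Note that "xm delete" takes the domain
*name*, not the original .xml file; the names below are placeholders:

# xm list                      # note the names xend knows about
# xm delete vm-domain-name-1   # drop each one from xend management
# virsh list --all             # libvirt's own definitions are untouched

That leaves libvirt as the only place the guests are defined.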
> ----------------------------------------------------------------------
> 
> Here's an example of /var/log/messages from the Dom-0 VM host server:
> 
> [ 3929.511206] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.520454] end_request: I/O error, dev tapdevh, sector 21018928
> [ 3929.529812] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.539240] end_request: I/O error, dev tapdevh, sector 21019016
> [ 3929.539250] end_request: I/O error, dev tapdevh, sector 21020040
> [ 3929.539272] end_request: I/O error, dev tapdevh, sector 21020128
> [ 3929.539290] end_request: I/O error, dev tapdevh, sector 21020216
> [ 3929.539307] end_request: I/O error, dev tapdevh, sector 21020304
> [ 3929.539325] end_request: I/O error, dev tapdevh, sector 21020392
> [ 3929.539346] end_request: I/O error, dev tapdevh, sector 21020480
> [ 3929.539365] end_request: I/O error, dev tapdevh, sector 21020568
> [ 3929.539387] end_request: I/O error, dev tapdevh, sector 21020656

These might not even be toolstack related; they come from tapdisk. Maybe
something in tapdisk broke with the upgrade? Or maybe the old and new
toolstacks choose different disk backends, and the new one has picked
tapdisk, which was always buggy but which you weren't exercising before so
never noticed?
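
For what it's worth, the disk backend is chosen per disk in the libvirt
domain XML, so you can see (and change) it with "virsh edit". Something
along these lines -- the exact contents will depend on what your converted
configs ended up with, so treat this purely as a sketch:

    <disk type='file' device='disk'>
      <driver name='tap' type='aio'/>   <!-- tapdisk; the "aio:" in your log -->
      <source file='/var/lib/xen/images/cpp-main/xvda'/>
      <target dev='xvda' bus='xen'/>
    </disk>

Switching the driver name from 'tap' to 'file' (or 'phy' if the backing
store is a block device) would take tapdisk out of the picture and at
least tell you whether the I/O errors follow the backend.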

> [ 3929.617328] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.617361] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.617393] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.617423] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.617456] blktap_device_fail_pending_requests: 252:7: failing pending 
> read of 11 pages
> [ 3929.617505] blktap_ring_vm_close: unmapping ring 7
> [ 3929.617611] blktap_ring_release: freeing device 7
> [ 3935.621510] blk_update_request: 187 callbacks suppressed
> [ 3935.625462] end_request: I/O error, dev tapdevh, sector 9008448
> [ 3935.633599] end_request: I/O error, dev tapdevh, sector 9008456
> [ 3935.639492] end_request: I/O error, dev tapdevh, sector 9008464
> [ 3935.639510] end_request: I/O error, dev tapdevh, sector 9008472
> [ 3935.639527] end_request: I/O error, dev tapdevh, sector 9008480
> [ 3935.639542] end_request: I/O error, dev tapdevh, sector 9008488
> [ 3935.639562] end_request: I/O error, dev tapdevh, sector 9008496
> [ 3935.639581] end_request: I/O error, dev tapdevh, sector 9008504
> [ 3935.639600] end_request: I/O error, dev tapdevh, sector 9008512
> [ 3935.640422] end_request: I/O error, dev tapdevh, sector 8978432
> [ 5007.131956] blk_update_request: 2 callbacks suppressed
> [ 5007.132007] end_request: I/O error, dev tapdevk, sector 16648320
> [ 5007.140023] end_request: I/O error, dev tapdevk, sector 2439096
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648408
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648496
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648584
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648672
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648760
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648848
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16648936
> [ 5007.144037] end_request: I/O error, dev tapdevk, sector 16649024
> [ 5007.181651] blktap_ring_vm_close: unmapping ring 10
> [ 5007.185530] blktap_ring_release: freeing device 10
> [ 5007.185929] br0: port 8(vif441.0) entered disabled state
> [ 5007.186101] blktap_device_destroy: destroy device 10 users 0
> [ 5007.196497] device vif441.0 left promiscuous mode
> [ 5007.196501] br0: port 8(vif441.0) entered disabled state
> [ 5016.447258] blktap_control_allocate_tap: allocated tap
> ffff88015b948000
> [ 5016.458638] blktap_ring_open: opening device blktap13
> [ 5016.462258] blktap_ring_open: opened device 13
> [ 5016.465702] blktap_ring_mmap: blktap: mapping pid is 16209
> [ 5016.469281] blktap_validate_params: aio:/var/lib/xen/images/cpp
> -main/xvda: capacity: 20971520, sector-size: 512
> [ 5016.473065] blktap_validate_params: aio:/var/lib/xen/images/cpp
> -main/xvda: capacity: 20971520, sector-size: 512
> [ 5016.476845] blktap_device_create: minor 13 sectors 20971520 sector
> -size 512
> [ 5016.481334] blktap_device_create: creation of 252:13: 0
> [ 5016.753935] device vif441.0 entered promiscuous mode
> [ 5016.760163] br0: port 8(vif441.0) entered forwarding state
> [ 5016.760793] br0: port 8(vif441.0) entered forwarding state
> [ 5018.234858] blkback: event-channel 9
> [ 5018.239301] blkback: protocol 1 (x86_64-abi)
> [ 5018.243653] blkback: ring-ref 8
> 
> ----------------------------------------------------------------------
> 
> What's the best way to set up a test server to try and replicate this
> issue and log what's happening, so I can work out what's causing it,
> please?
> 
> Kind Regards,
> 
> Keith Roberts

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
