
Re: [Xen-users] blktap and file-backed qcow: crashes and bad performance?



No, I think that we have a few weeks left in our league. Debbie plays
2-3 nights a week and is in 2 different leagues.

On Fri, 2006-08-11 at 16:59 +0200, Christoph Dwertmann wrote:
> Hi!
> 
> I'm running the latest Xen-unstable x86_64 on a Dell PowerEdge 1950
> (dual-CPU, dual-core Xeon) with 16GB RAM. I'm using file-backed sparse
> qcow images as root filesystems for the Xen guests. All qcow images
> are backed by the same base image file (a 32-bit Debian sid installation).
> The Xen disk config looks like this:
> 
> disk   = [ 'tap:qcow:/home/images/%s.%d.qcow,xvda1,w' % (vmname, vmid)]
> 
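> With vmname set to, say, "sid" and vmid 1 (illustrative values only),
> that line expands to something like:
> 
> disk   = [ 'tap:qcow:/home/images/sid.1.qcow,xvda1,w' ]
> 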
> Before creating each guest, I use the qcow-create tool to create its qcow file.
> 
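> A typical invocation looks roughly like this (the size is in MB, and the
> base-image path here is just a placeholder for my shared Debian sid image):
> 
> qcow-create 4096 /home/images/sid.1.qcow /home/images/debian-sid-base.img
> 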
> I use grub to boot Xen like this:
> root    (hd0,0)
> kernel /boot/xen-3.0-unstable.gz com2=57600,8n1 console=com2 dom0_mem=4097152 noreboot xenheap_megabytes=32
> module /boot/xen0-linux root=/dev/sda1 ro noapic console=tty0 xencons=ttyS1 console=ttyS1
> module /boot/xen0-linux-initrd
> 
> My goal is to run 100+ Xen guests, but this seems impossible. I
> observe several things:
> 
> - after creating a few Xen guests (and even after shutting them down),
> my process list is cluttered with "tapdisk" processes that put full
> load on all 8 virtual CPUs on the dom0. The system becomes unusable.
> Killing the tapdisk processes also apparently destroys the qcow
> images.
> 
> - I (randomly?) get the messages "Error: (28, 'No space left on
> device')" or "Error: Device 0 (vif) could not be connected. Hotplug
> scripts not working." or even "Error: (12, 'Cannot allocate memory')"
> on domU creation. There is plenty of disk space and RAM available at
> that time. This mostly happens when creating more than 80 guests.
> 
> - the dom0 will sooner or later crash with a message like this:
> 
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at fs/aio.c:511
> invalid opcode: 0000 [1] SMP
> CPU 0
> Modules linked in: ipt_MASQUERADE iptable_nat ip_nat ip_conntrack
> nfnetlink ip_tables x_tables bridge dm_snapshot dm_mirror dm_mod
> usbhid ide_cd sers
> Pid: 46, comm: kblockd/0 Not tainted 2.6.16.13-xen-kasuari-dom0 #1
> RIP: e030:[<ffffffff8018f8ee>] <ffffffff8018f8ee>{__aio_put_req+39}
> RSP: e02b:ffffffff803a89c8  EFLAGS: 00010086
> RAX: 00000000ffffffff RBX: ffff8800f43d7a80 RCX: 00000000f3bdc000
> RDX: 0000000000001458 RSI: ffff8800f43d7a80 RDI: ffff8800f62d1c80
> RBP: ffff8800f62d1c80 R08: 6db6db6db6db6db7 R09: ffff88000193d000
> R10: 0000000000000000 R11: ffffffff80153e48 R12: ffff8800f62d1ce8
> R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
> FS:  00002b9bf01bccb0(0000) GS:ffffffff80472000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> Process kblockd/0 (pid: 46, threadinfo ffff8800005e4000, task 
> ffff8800005c57e0)
> Stack: ffff8800f43d7a80 ffff8800f62d1c80 ffff8800f62d1ce8 ffffffff80190082
>        ffff880004e83d10 ffff8800f4db7400 0000000000000200 ffff8800f4db7714
>        ffff8800f4db7400 0000000000000001
> Call Trace: <IRQ> <ffffffff80190082>{aio_complete+297}
>        <ffffffff80195b0b>{finished_one_bio+159}
> <ffffffff80195be8>{dio_bio_complete+150}
>        <ffffffff80195d24>{dio_bio_end_aio+32}
> <ffffffff801cf1b7>{__end_that_request_first+328}
>        <ffffffff801d00ca>{blk_run_queue+50}
> <ffffffff8800524d>{:scsi_mod:scsi_end_request+40}
>        <ffffffff880054fe>{:scsi_mod:scsi_io_completion+525}
>        <ffffffff880741ce>{:sd_mod:sd_rw_intr+598}
> <ffffffff88005792>{:scsi_mod:scsi_device_unbusy+85}
>        <ffffffff801d1534>{blk_done_softirq+175}
> <ffffffff80132544>{__do_softirq+122}
>        <ffffffff8010bada>{call_softirq+30} <ffffffff8010d231>{do_softirq+73}
>        <ffffffff8010d626>{do_IRQ+65} <ffffffff8023bf5a>{evtchn_do_upcall+134}
>        <ffffffff801d8a66>{cfq_kick_queue+0}
> <ffffffff8010b60a>{do_hypervisor_callback+30} <EOI>
>        <ffffffff801d8a66>{cfq_kick_queue+0}
> <ffffffff8010722a>{hypercall_page+554}
>        <ffffffff8010722a>{hypercall_page+554} 
> <ffffffff801dac97>{kobject_get+18}
>        <ffffffff8023b7aa>{force_evtchn_callback+10}
> <ffffffff8800641d>{:scsi_mod:scsi_request_fn+935}
>        <ffffffff801d8adc>{cfq_kick_queue+118}
> <ffffffff8013d3e6>{run_workqueue+148}
>        <ffffffff8013db18>{worker_thread+0}
> <ffffffff80140abd>{keventd_create_kthread+0}
>        <ffffffff8013dc08>{worker_thread+240}
> <ffffffff80125cdb>{default_wake_function+0}
>        <ffffffff80140abd>{keventd_create_kthread+0}
> <ffffffff80140abd>{keventd_create_kthread+0}
>        <ffffffff80140d61>{kthread+212} <ffffffff8010b85e>{child_rip+8}
>        <ffffffff80140abd>{keventd_create_kthread+0}
> <ffffffff80140c8d>{kthread+0}
>        <ffffffff8010b856>{child_rip+0}
> 
> Code: 0f 0b 68 c3 9b 2f 80 c2 ff 01 85 c0 74 07 31 c0 e9 09 01 00
> RIP <ffffffff8018f8ee>{__aio_put_req+39} RSP <ffffffff803a89c8>
>  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>  (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> 
> Is it just my setup, or
> - does Xen not scale to 100+ guests at all?
> - does blktap not scale at all?
> - is blktap with qcow very unstable right now?
> 
> Thank you for any pointers,
> 

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

