
Re: [Xen-devel] phy disks and vifs timing out in DomU



> > Hm, which version of DomU were these? I wonder if this is related to
> > the 'feature-barrier' that is not supported with 3.0. Do you see
> > anything in the DomU about the disks or xen-blkfront? Can you run
> > the guests with 'initcall_debug loglevel=8 debug' to see if the
> > blkfront is actually running on those disks?
> I have attached the domU console output with these options set.
> 
> I have also spent a fair amount of time trying to narrow down the
> conditions that cause it, with lots of hardware switching and disk
> imaging. The conclusion I came to is that it's not hardware related,
> but that there's a subtle interaction with LVM causing the problem;
> I'm struggling to work out how to narrow it down any further than
> that.
> 
> I started with a setup that worked (Machine 1 with HDD 1, IDE) and a
> setup that didn't (Machine 2 with HDD 2, SATA). Machine 2 has an IDE
> port, so I unplugged HDD 2, put HDD 1 in Machine 2, and that setup
> worked, which excluded most of the hardware. Next I imaged HDD 3
> (SATA) from HDD 1 (IDE), unplugged HDD 1 and put HDD 3 in Machine 2,
> and that setup also worked, which excluded an IDE/SATA issue and gave
> me a disk I could safely play with. The disks are organised into two
> partitions: partition 1 is for Dom0, and partition 2 is an LVM volume
> group used for the DomUs. One LV (called Main) in this volume group
> is used by Dom0 to hold the DomU kernels, config information and
> other static data and executables; the rest of the VG is issued as
> LVs to the various DomUs as needed, with a fair amount of free space
> left in the VG. I took the Main LV from HDD 2 (the broken setup),
> imaged it onto HDD 3, booted against this image by judicious LV
> renaming, and the setup failed - great, I thought, it looks like a
> very subtle config issue. Next I created a third LV, this time imaged
> from the Main LV that worked, giving me three Main LVs (I called them
> Main-Works, Main-Broken and Main-Testing), and I simply used lvrename
> to select the one I wanted as active. However, now I couldn't get the
> setup to work with any of the three Main LVs, including the one that
> originally worked. Removing the LVs I had recently created and going
> back to the original Main LV, the setup started working again.
> 
> I'm going to try an up-to-date version of LVM (the one I'm using is a
> little out of date) and see if that makes any difference, but the
> version I have at the moment has worked without problems in the past.
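
For reference, I set the debug options suggested above via the kernel command 
line in the domU config file. A minimal sketch of the relevant lines (the 
kernel/initrd paths here are illustrative placeholders, not my actual setup):

  kernel  = "/mnt/main/boot/vmlinuz-xen"
  ramdisk = "/mnt/main/boot/initrd-xen.img"
  # Extra kernel command-line options passed to the guest, as suggested:
  extra   = "initcall_debug loglevel=8 debug"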

I've managed to isolate it a little more tightly, but it's very strange. I've 
also updated to the latest version of LVM, but it makes no difference.

I have a system with two partitions, the second of which is an LVM volume 
group. I have a VM with one vif, two tap:aio disks backed by files in an LV 
within that volume group, and two phy disks backed by LVs within the same 
volume group. I have managed to get to the situation where I can boot the 
physical machine and the VM starts correctly. If, however, I create a new LV 
of any size and with any name and restart the physical machine, the VM fails 
to start correctly: the two phy disks time out, the vif times out, and I get 
a kernel BUG about 90% of the time and a kernel oops the other 10%. If I 
remove the new LV and reboot the physical machine, the VM starts correctly 
again. There is nothing in my configuration that should cause the new LV to 
have any effect on the VM, but somehow it does.
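
To make the failing configuration concrete, the disk and vif lines of the 
domU config look roughly like this (the device names, LV names and file paths 
are placeholders, not my real ones):

  vif  = [ 'bridge=xenbr0' ]
  disk = [ 'tap:aio:/mnt/main/images/data1.img,xvdb,w',  # file-backed, on the Main LV
           'tap:aio:/mnt/main/images/data2.img,xvdc,w',
           'phy:/dev/vg0/domu-root,xvda1,w',             # these two time out
           'phy:/dev/vg0/domu-swap,xvda2,w' ]

Simply creating an unrelated LV in Dom0 (e.g. 'lvcreate -L 1G -n anything 
vg0') and rebooting is enough to make the two phy disks and the vif time out 
on the next VM start; removing that LV and rebooting again makes it work.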

Anthony.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

