
Re: [Xen-users] repeated DomU volume corruption


  • To: xen-users <xen-users@xxxxxxxxxxxxx>
  • From: Robert Rust <rjrbytes@xxxxxxxxx>
  • Date: Sat, 6 Dec 2014 06:28:59 -0600
  • Delivery-date: Sat, 06 Dec 2014 12:29:17 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

My apologies for failing to get my responses posted to the list on this ...

On Mon, Nov 24, 2014 at 9:09 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
On Sun, 2014-11-23 at 21:04 -0600, Robert Rust wrote:
> I'm having headaches with DomU volume corruption. It isn't tied to
> any particular DomU, and I have never had it occur in the Dom0 (which
> is on the same disk) nor did it happen in the nearly 1 year that I was
> running on bare metal using the same drive. Last fall I even replaced
> the drive to try and resolve the problem, since it has plagued me
> every time I've tried Xen. If it matters, the most recent error (this
> morning) was as follows (and it occurred in 2 of my DomUs) :
> EXT4-fs error (device xvda2): ext4_mb_generate_buddy:756: group 66,
> 7224 clusters in bitmap, 7216 in gd; block bitmap corrupt.
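
For anyone collecting these messages across several DomUs, here is a minimal sketch that pulls the fields out of an error line like the one quoted above. It assumes the message has been joined onto a single line with the "> " quoting removed. The function name and dictionary keys are hypothetical, and reading the two numbers as the free-cluster count derived from the block bitmap versus the count stored in the group descriptor is my understanding of this message, not something stated in the thread.

```python
import re

# Pattern for the ext4_mb_generate_buddy message quoted in this thread.
PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<device>\S+)\): "
    r"(?P<func>\w+):(?P<line>\d+): group (?P<group>\d+), "
    r"(?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


def parse_buddy_error(message):
    """Return the fields of a bitmap/gd mismatch message, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    bitmap = int(match["bitmap"])
    gd = int(match["gd"])
    return {
        "device": match["device"],
        "group": int(match["group"]),
        "bitmap": bitmap,
        "gd": gd,
        # How far the two counts disagree for this block group.
        "difference": bitmap - gd,
    }


message = (
    "EXT4-fs error (device xvda2): ext4_mb_generate_buddy:756: group 66, "
    "7224 clusters in bitmap, 7216 in gd; block bitmap corrupt."
)
info = parse_buddy_error(message)
print(info)
```

For the message above, the two counts for group 66 on xvda2 differ by 8 clusters.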

Are you manually configuring anything to do with barriers, either to
enable or disable them? I think with modern kernels this is all supposed
to Just Work. I ask because AFAIK ext4 (and journalling filesystems) are
pretty sensitive to having this be working right, i.e. things being on
the disk when they've been told they are.
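
One way to answer the barrier question from inside a DomU is to look at the mount options the kernel reports. The following is only a sketch: it assumes the text comes from /proc/mounts (or has the same whitespace-separated layout), and the function name is hypothetical. It only reports options that are listed explicitly; an ext4 mount with no barrier option listed is normally running with the default, which as far as I know is barriers enabled.

```python
def barrier_settings(mounts_text):
    """Map each ext4 mount point to its explicit barrier option, or None.

    None means no 'barrier'/'nobarrier' option appeared in the option
    list, i.e. the filesystem is using whatever its default is.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        explicit = [
            opt for opt in options
            if opt == "nobarrier" or opt.startswith("barrier")
        ]
        result[fields[1]] = explicit[0] if explicit else None
    return result


# Made-up sample input for illustration, not taken from the affected system.
sample = (
    "/dev/xvda2 / ext4 rw,relatime,nobarrier,data=ordered 0 0\n"
    "/dev/xvda3 /home ext4 rw,relatime,data=ordered 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
)
settings = barrier_settings(sample)
print(settings)
```

On a real guest, the input would be the contents of /proc/mounts, for example `barrier_settings(open("/proc/mounts").read())`.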

What is your storage backend configuration like in dom0? i.e. are your
guest disks on LVM volumes, raw files, qcow2, loopback mounts etc.

What does your guest cfg file look like?

Does the corruption correspond with anything interesting occurring? Like
rebooting the domU etc?

Have you run memtest on the system? (bit of a long shot...)

I'm using LVM volumes. I had corruption on 3 of them this past weekend. In two of the cases (very lightly used DomUs), it didn't show up until I was shutting down the DomU after installing package updates. For the third one, it showed up after rebooting (I noticed when my website failed to come back). I have not run memtest, but will certainly do so.

Sample cfg below for one of the affected DomUs:
kernel    = '/boot/vmlinuz-3.14.4-031404-generic'
extra     = 'elevator=noop'
ramdisk   = '/boot/initrd.img-3.14.4-031404-generic'
vcpus     = '2'
memory    = '512'
pvh       = '1'
root      = '/dev/xvda2 ro'
disk      = [
              'phy:/dev/midgard-vg/gondor-disk,xvda2,w',
              'phy:/dev/midgard-vg/gondor-swap,xvda1,w',
            ]
name      = 'gondor'
dhcp      = 'dhcp'
vif       = [ 'mac=00:16:3E:B1:9B:9C,bridge=xenbr0' ]
on_reboot = 'restart'
on_crash  = 'restart'
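
To make the disk lines in the cfg above easier to read, here is a rough sketch that splits one of those strings into its parts. It assumes the three-field "prefix:target,vdev,access" form used in this cfg and does not attempt to handle the full xl disk syntax. The DiskSpec type and parse_disk_spec function are hypothetical names made up for illustration.

```python
from typing import NamedTuple


class DiskSpec(NamedTuple):
    backend: str  # prefix before the first ':', e.g. 'phy' (empty if absent)
    target: str   # backing device in dom0, here an LVM logical volume
    vdev: str     # device name the guest sees
    access: str   # 'w' for read-write, 'r' for read-only


def parse_disk_spec(spec):
    """Split a 'prefix:target,vdev,access' disk string into named fields.

    Raises ValueError if the string does not have exactly three
    comma-separated fields.
    """
    target_part, vdev, access = [part.strip() for part in spec.split(",")]
    backend, sep, target = target_part.partition(":")
    if not sep:
        # No prefix present; the whole first field is the target.
        backend, target = "", target_part
    return DiskSpec(backend, target, vdev, access)


root_disk = parse_disk_spec("phy:/dev/midgard-vg/gondor-disk,xvda2,w")
swap_disk = parse_disk_spec("phy:/dev/midgard-vg/gondor-swap,xvda1,w")
print(root_disk)
print(swap_disk)
```

Read this way, xvda2 is the guest's root volume (matching root = '/dev/xvda2 ro'), backed by /dev/midgard-vg/gondor-disk in dom0, and it is the same device name that appears in the EXT4-fs error.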

> current setup:
> OS: Ubuntu server 14.04.1 64-bit
> kernel: 3.14.4-031404-generic

Is this in dom0, domU or both?

>   (for ATI radeon driver support I was experimenting with)

Is this the Free radeon driver? (rather than the proprietary/binary only
thing).

It's the binary driver. The driver actually worked fine other than the well-known performance degradation after reboots. I was trying to set up a PVHVM guest as a virtualized workstation for my kids, with a monitor and keyboard connected, but ran into problems on the sound front. Everything I tried resulted in either pure static or sound with lots of static. I'm currently exploring options for using a Raspberry Pi as a thin client.

-rjr

