Re: [Xen-devel] BUG: ext3 corruption in domU



On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
>> Konrad is on vacation this week, so it'll probably be next week before
>> this gets looked at by him.
>
> And I finally got to this email in my 'vacation-mbox'
>>
>> Ian.
>>
>> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
>> > I realize folks are pretty busy, but we're still interested in getting
>> > this problem solved, and I want to be sure it's not lost in the
>> > shuffle.
>> > Any chance of getting some attention for it?
>> >
>> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>> > >> (re-sending, first message seems to have gotten lost)
>> > >>
>> > >> I was referred here by Ian Campbell ijc@xxxxxxxxxxxxxx from 
>> > >> bugs.debian.org.
>> > >
>> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
>> > > some people who know about the block stuff to the CC.
>> > >
>> > > Guys, my suspicion is that the issue is that barriers issued by ext3
>> > > inside the guest aren't making it all the way down the
>> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
>> > > filesystem to eventually corrupt itself.
>> > >
>> > > The issue seems to relate to the use of dm-crypt since
>> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
>> > >
>> > > However there is no problem with the local dom0 ext3 root filesystem
>> > > which is also in the same lvm VG on the crypt device (i.e.
>> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
>> > > something is up at the blkfront->blkback link, such that the barriers
>> > > which blkback is injecting into the block subsystem either don't make it
>> > > to the dm-crypt layer or do not DTRT once they arrive.
>> > >
>> > > I'm not really sure how to proceed (or how to ask Anthony to
>> > > proceed) with verifying any part of that hypothesis though.
>> > >
>> > > ISTR issues with old vs new style barriers or barriers with no data in
>> > > them or something, could this be related to that? (or am I thinking of
>> > > DISCARD?)
>
> You are using two different kernel versions. The 2.6.32 domU is only using
> WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
> The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
> commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
> Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Date:   Mon Oct 10 00:42:22 2011 -0400
>
>     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
>
>
> which emulates the barrier request by draining all of the outstanding I/Os
> and then sending the WRITE_FLUSH.
>
> But it looks like you are hitting an issue here. Just to make sure
> that is the case, what happens if you use the _same_ kernel in both dom0 and
> domU? Does it work then?
>
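
For anyone following along, the drain-then-flush emulation Konrad describes
above can be sketched roughly as below. This is only a toy model under my own
assumptions, not the actual xen-blkback code: the class ToyBackend and all of
its method names are hypothetical, and "completion" is simulated with a list.
It only illustrates the ordering guarantee: every write submitted before the
barrier reaches the log before the flushing write does.

```python
from collections import deque


class ToyBackend:
    """Toy model of a block backend (hypothetical, not xen-blkback)."""

    def __init__(self):
        self.inflight = deque()  # writes submitted but not yet completed
        self.log = []            # order in which requests reach the "disk"

    def submit_write(self, name):
        # Queue an ordinary write; it completes at some later point.
        self.inflight.append(name)

    def drain(self):
        # Wait for (here: simulate) completion of every outstanding write.
        while self.inflight:
            self.log.append(("write", self.inflight.popleft()))

    def barrier_write(self, name):
        # Emulate an old-style barrier request:
        # 1. drain all outstanding I/O,
        # 2. then issue the write together with a flush.
        self.drain()
        self.log.append(("flush_write", name))


backend = ToyBackend()
backend.submit_write("a")
backend.submit_write("b")
backend.barrier_write("journal-commit")
backend.submit_write("c")
backend.drain()

for entry in backend.log:
    print(entry)
```

If the barrier were lost somewhere in the chain, nothing would force "a" and
"b" to land before "journal-commit", which is the kind of reordering that
could leave a journalling filesystem inconsistent after a crash.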

First, thank you so much for getting back to me, it's really appreciated.
At this point I've forgotten if I did this with Wheezy on Wheezy, and
what the result was.
I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
get back to you. I should be able to do that early next week.

>> > >
>> > > The issue was initially reported with Squeeze (Jeremy 2.6.32 tree) domU
>> > > on a Wheezy (mainline 3.2) dom0 but IIRC has also been repeated with
>> > > Wheezy on Wheezy now so this isn't cross version confusion about barrier
>> > > semantics AFAICT.
>> > >
>> > > Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

