
Re: [Xen-devel] [xen-unstable test] 13394: regressions - FAIL



On Fri, 2012-06-29 at 13:36 +0100, Stefano Stabellini wrote:
> On Fri, 29 Jun 2012, Ian Campbell wrote:
> > On Fri, 2012-06-29 at 13:29 +0100, Stefano Stabellini wrote:
> > > On Fri, 29 Jun 2012, Ian Campbell wrote:
> > > > On Fri, 2012-06-29 at 12:20 +0100, Ian Jackson wrote:
> > > > > xen.org writes ("[xen-unstable test] 13394: regressions - FAIL"):
> > > > > > Tests which did not succeed and are blocking,
> > > > > > including tests which could not be run:
> > > > > >  test-amd64-amd64-xl-qemuu-winxpsp3  9 guest-localmigrate fail REGR. vs. 13379
> > > > > 
> > > > > The logs show this:
> > > > > 
> > > > > >   libxl: error: libxl_dom.c:632:switch_logdirty_timeout: logdirty switch: wait for device model timed out
> > > > > 
> > > > > And in xenstore:
> > > > > 
> > > > >   /local/domain/0/device-model/5/logdirty/cmd = "enable"   (n0)
> > > > > 
> > > > > And in the source code:
> > > > > 
> > > > >   $ grep -R logdirty qemu-upstream-unstable.git/*
> > > > >   $
> > > > > 
> > > > > So the upstream qemu does not participate properly in the migration
> > > > > protocol.  And anyway this protocol seems to involve xenstore and I
> > > > > would have expected it to do something with QMP.  But there is no code
> > > > > in libxl to do this (and never has been) and no code in upstream qemu
> > > > > to do it either.
> > > > > 
> > > > > That means we'll get memory corruption in migrated guests with the new
> > > > > qemu: any time qemu writes to guest memory, it needs to trigger a
> > > > > logdirty update so that the write is properly transferred to the
> > > > > migration target domain.
> > > > > 
> > > > > With the old libxl we didn't notice this apart from random failures.
> > > > > With my new migration code, particularly
> > > > >    25542:1883e5c71a87
> > > > >    libxl: wait for qemu to acknowledge logdirty command
> > > > > this turns into a hard failure.
> > > > > 
> > > > > I will add this as an allowable test failure pending a proper fix.
> > > > 
> > > > Thanks for investigating. It does appear that this has always been
> > > > broken.
> > > > 
> > > > Do we think this is a blocker for 4.2?
> > > 
> > > I wouldn't consider it a blocker, given that upstream QEMU is not the
> > > default for HVM guests.
> > > 
> > > 
> > > > It certainly prevents us from suggesting that we support HVM migration
> > > > with the (non-default) upstream qemu.
> > > > 
> > > > If we can't fix this for 4.2 (e.g. because we need to get patches into
> > > > upstream qemu or because the libxl side is too involved) we should at a
> > > > minimum make libxl reject attempts to migrate such domains with an
> > > > appropriate error message.
> > > 
> > > > We do need to get patches in QEMU to fix this but we could backport them
> > > > in qemu-upstream-unstable (and ask for backports to the stable trees).
> > 
> > Can we do that in time for 4.2? It's pretty late in the day.
> > 
> > I think we need to consider either achieving this or adding the
> > appropriate error message as a blocker. Hopefully the former but falling
> > back to the latter if it comes to it.
> 
> I think we should add an appropriate error message as a blocker. We
> should also try to fix this on the QEMU side, but given that the QEMU
> 1.0 stable tree is pretty much unmaintained, we won't be able to backport
> the fix there, so we cannot be sure that a distro will end up with a
> QEMU with or without the fix.

Agreed. I'll add this to the 4.2 TODO list.

If we can get support into the mainline qemu then as a stretch goal we
can consider whether libxl can auto detect the availability of the
feature and react accordingly.
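
For reference, the handshake that times out in the log quoted above could be
sketched roughly as below. This is only an illustration, not the libxl code:
FakeStore is a hypothetical in-memory stand-in for xenstore, the "ret" key
used for the acknowledgement is an assumption (only the "cmd" key appears in
this thread), and switch_logdirty and acking_device_model are made-up names.

```python
import time


class FakeStore:
    """Hypothetical in-memory stand-in for xenstore (not a real binding)."""

    def __init__(self):
        self.data = {}

    def write(self, path, value):
        self.data[path] = value

    def read(self, path):
        return self.data.get(path)


def logdirty_path(domid, leaf):
    # The "cmd" leaf matches the path quoted in the thread; any other
    # leaf (such as "ret") is an assumption made for this sketch.
    return f"/local/domain/0/device-model/{domid}/logdirty/{leaf}"


def switch_logdirty(store, domid, enable, device_model=None,
                    timeout=0.2, poll=0.01):
    """Write the logdirty command, then wait for an acknowledgement.

    Returns True if the device model echoed the command back before the
    deadline, False if the wait timed out.
    """
    cmd = "enable" if enable else "disable"
    store.write(logdirty_path(domid, "cmd"), cmd)
    deadline = time.monotonic() + timeout
    while True:
        if device_model is not None:
            # Give the simulated device model a chance to react.
            device_model(store, domid)
        if store.read(logdirty_path(domid, "ret")) == cmd:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def acking_device_model(store, domid):
    """Simulated device model that acknowledges whatever was commanded."""
    cmd = store.read(logdirty_path(domid, "cmd"))
    if cmd is not None:
        store.write(logdirty_path(domid, "ret"), cmd)
```

A device model that never looks at the command key (passing
device_model=None here) leaves "cmd" set to "enable" and makes the call
return False, which corresponds to the hard failure described above. The
same kind of check is one conceivable basis for detecting whether the
feature is available.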

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

