Re: [Xen-devel] [xen-unstable test] 13394: regressions - FAIL



On Fri, 29 Jun 2012, Ian Campbell wrote:
> On Fri, 2012-06-29 at 13:29 +0100, Stefano Stabellini wrote:
> > On Fri, 29 Jun 2012, Ian Campbell wrote:
> > > On Fri, 2012-06-29 at 12:20 +0100, Ian Jackson wrote:
> > > > xen.org writes ("[xen-unstable test] 13394: regressions - FAIL"):
> > > > > Tests which did not succeed and are blocking,
> > > > > including tests which could not be run:
> > > > >  test-amd64-amd64-xl-qemuu-winxpsp3  9 guest-localmigrate fail REGR. 
> > > > > vs. 13379
> > > > 
> > > > The logs show this:
> > > > 
> > > > >   libxl: error: libxl_dom.c:632:switch_logdirty_timeout: logdirty switch: wait for device model timed out
> > > > 
> > > > And in xenstore:
> > > > 
> > > >   /local/domain/0/device-model/5/logdirty/cmd = "enable"   (n0)
> > > > 
> > > > And in the source code:
> > > > 
> > > >   $ grep -R logdirty qemu-upstream-unstable.git/*
> > > >   $
> > > > 
> > > > So the upstream qemu does not participate properly in the migration
> > > > protocol.  And anyway this protocol seems to involve xenstore and I
> > > > would have expected it to do something with QMP.  But there is no code
> > > > in libxl to do this (and never has been) and no code in upstream qemu
> > > > to do it either.
> > > > 
> > > > That means we'll get memory corruption in migrated guests with the new
> > > > qemu: any time qemu writes to guest memory, it needs to trigger a
> > > > logdirty update so that the write is properly transferred to the
> > > > migration target domain.
> > > > 
> > > > With the old libxl we didn't notice this apart from random failures.
> > > > With my new migration code, particularly
> > > >    25542:1883e5c71a87
> > > >    libxl: wait for qemu to acknowledge logdirty command
> > > > this turns into a hard failure.
> > > > 
> > > > I will add this as an allowable test failure pending a proper fix.
> > > 
> > > Thanks for investigating. It does appear that this has always been
> > > broken.
> > > 
> > > Do we think this is a blocker for 4.2?
> > 
> > I wouldn't consider it a blocker, given that upstream QEMU is not the
> > default for HVM guests.
> > 
> > 
> > > It certainly prevents us from suggesting that we support HVM migration
> > > with the (non-default) upstream qemu.
> > > 
> > > If we can't fix this for 4.2 (e.g. because we need to get patches into
> > > upstream qemu or because the libxl side is too involved) we should at a
> > > minimum make libxl reject attempts to migrate such domains with an
> > > appropriate error message.
> > 
> > We do need to get patches in QEMU to fix this but we could backport them in
> > qemu-upstream-unstable (and ask for backports to the stable trees).
> 
> Can we do that in time for 4.2? It's pretty late in the day.
> 
> I think we need to consider either achieving this or adding the
> appropriate error message as a blocker. Hopefully the former but falling
> back to the latter if it comes to it.

I think adding an appropriate error message should be the blocker. We
should also try to fix this on the QEMU side, but the QEMU 1.0 stable
tree is pretty much unmaintained, so we won't be able to backport the
fix there and cannot be sure whether a distro's QEMU will include it.
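
For the error-message fallback, the check could look roughly like the
sketch below. This is only an illustration: the enums and the function
name are made up for this mail and are not libxl's actual types or API,
and the wording of the message is a placeholder. The idea is simply to
refuse early, before the logdirty switch is attempted, when the guest
is HVM and its device model is the upstream qemu.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not libxl's real types. */
enum dm_version { DM_QEMU_TRADITIONAL, DM_QEMU_UPSTREAM };
enum guest_type { GUEST_PV, GUEST_HVM };

/*
 * Returns 0 if migration may proceed, or -1 after printing an error
 * if the domain must not be migrated.
 */
static int check_migratable(enum guest_type type, enum dm_version dm)
{
    if (type == GUEST_HVM && dm == DM_QEMU_UPSTREAM) {
        /* Upstream qemu never answers the logdirty command, so fail now
         * rather than time out (or corrupt guest memory) later. */
        fprintf(stderr, "cannot migrate HVM guest: upstream qemu does not "
                        "support logdirty tracking\n");
        return -1;
    }
    return 0;
}
```

With something like this in the migration path, the user would get an
immediate, explicit refusal instead of the "wait for device model timed
out" failure seen in the test logs.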

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

