[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL



On Mon, 2013-09-16 at 10:55 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monnà wrote:
> > On 15/09/13 14:50, Ian Campbell wrote:
> > > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> > >> flight 19308 xen-unstable real [real]
> > >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> > >>
> > >> Regressions :-(
> > >>
> > >> Tests which did not succeed and are blocking,
> > >> including tests which could not be run:
> > >>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. 
> > >> vs. 19208
> > > 
> > > These are due to /var/run/xen-hotplug/block getting leaked
> > > 
> 
> The error message in XenStore shows blkback tries to get hold of the
> block device 0:0 but there's no such device entry in system.
> 
> > >>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. 
> > >> vs. 19208
> > >>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. 
> > >> vs. 19208
> > > 
> > > These are:
> > >         libxl: error: libxl_device.c:894:device_backend_callback: unable
> > >         to add device with path /local/domain/0/backend/vbd/9/5632
> > >         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to 
> > > add disk devices
> > > 
> > > /var/log/xen/xenhotplug.log contains:
> > >         xenstore-read: couldn't read path backend/vbd/9/5632/node
> > > 
> > > For both of these I'm suspicious of:
> > > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
> > 
> > Hello,
> > 
> > I've tracked this down to libxl writing a wrong physical-device 
> > xenstore node when using regular files. When using block devices libxl 
> > can write the physical-device because it can be fetched without 
> > requiring the execution of the block script, but with regular files it 
> > is not true, we must first execute the block script in order to mount 
> > the regular file into a loop device and then fetch the physical-device 
> > from the loop device to which the image has been mounted. Following 
> > patch solves the issue for me.
> >
> 
> Yes, that's the in question I think. That code snippet was introduced in:
> 
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Date:   Tue Aug 7 14:26:29 2012 +0100
> 
>     libxl: write physical-device node if user did not supply a block script
>     
>     This reverts one of the intentional changes from 25733:353bc0801b11.
>     That change exposed an issue with the xl migration protocol, which
>     although safe triggers the hotplug scripts device sharing logic.
>     
>     For 4.2 we disable this logic by writing the physical-device xenstore
>     node ourselves if a user did not supply a script. If the user did
>     supply a script then we continue to rely on it to write the
>     physical-device node (not least because the script may create the
>     device and therefore it is not available before we run the script).
>     
>     This means that to support localhost migration a block hotplug script
>     needs to be robust against adding a device twice and should not
>     deactivate the device until it has been removed twice.
>     
>     This should be revisited for 4.3.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
>     Acked-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
>     Committed-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> 
> And in the commit message it says this behavior should be revisited.

Which never happened :-(

I don't remember exactly but I think the real fix is a reworking of the
sequencing of block device attach/detach vs the migration stop and copy
phase, not a simple tweak IIRC.

> Tracing back to 25733 
> (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
> 
> - libxl should not write the "physical-device" node. This is the
>   responsibility of the block script. Writing the "physical-device"
>   node in libxl basically completely short-cuts the standard block
>   hotplug script which uses "physical-device" to know if it has run
>   already or not.
> 
> That makes me believe the following fix is the correct thing to do in
> long term.
> 
> I have to admit that I cannot fully consume the commit message of 25733
> in one day so unless you (Ian) can confirm Roger's fix will not cause further
> regression otherwise I would suggest reverting my change at the moment.

Can you test some lifecycle operations, in particular localhost
migrations with both phy:// and file:// devices to see if it fixes it?
If not then we can revert.

Perhaps rather than removing that block entirely it should be
conditional on S_ISBLK?

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.