
Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL



On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
[...]
> > > I've tracked this down to libxl writing a wrong physical-device
> > > xenstore node when using regular files. With block devices, libxl
> > > can write physical-device itself because the value can be fetched
> > > without running the block script. With regular files that is not
> > > true: we must first run the block script to attach the regular file
> > > to a loop device, and then fetch physical-device from that loop
> > > device. The following patch solves the issue for me.
> > >
> > 
> > Yes, that's the code in question, I think. That code snippet was introduced in:
> > 
> > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > Author: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> > Date:   Tue Aug 7 14:26:29 2012 +0100
> > 
> >     libxl: write physical-device node if user did not supply a block script
> >     
> >     This reverts one of the intentional changes from 25733:353bc0801b11.
> >     That change exposed an issue with the xl migration protocol, which
> >     although safe triggers the hotplug scripts device sharing logic.
> >     
> >     For 4.2 we disable this logic by writing the physical-device xenstore
> >     node ourselves if a user did not supply a script. If the user did
> >     supply a script then we continue to rely on it to write the
> >     physical-device node (not least because the script may create the
> >     device and therefore it is not available before we run the script).
> >     
> >     This means that to support localhost migration a block hotplug script
> >     needs to be robust against adding a device twice and should not
> >     deactivate the device until it has been removed twice.
> >     
> >     This should be revisited for 4.3.
> >     
> >     Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> >     Acked-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> >     Committed-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> > 
> > And in the commit message it says this behavior should be revisited.
> 
> Which never happened :-(
> 
> I don't remember exactly but I think the real fix is a reworking of the
> sequencing of block device attach/detach vs the migration stop and copy
> phase, not a simple tweak IIRC.
> 
> > Tracing back to 25733 
> > (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> > things look more complicated. One interesting snippet in the commit
> > message is:
> > 
> > - libxl should not write the "physical-device" node. This is the
> >   responsibility of the block script. Writing the "physical-device"
> >   node in libxl basically completely short-cuts the standard block
> >   hotplug script which uses "physical-device" to know if it has run
> >   already or not.
> > 
> > That makes me believe the following fix is the correct thing to do in
> > the long term.
> > 
> > I have to admit that I cannot fully digest the commit message of 25733
> > in one day, so unless you (Ian) can confirm that Roger's fix will not
> > cause further regressions, I would suggest reverting my change for now.
> 
> Can you test some lifecycle operations, in particular localhost
> migrations with both phy:// and file:// devices to see if it fixes it?
> If not then we can revert.
> 

Unfortunately, with Roger's patch applied, local migration of a guest
with a raw-format file disk doesn't work:

xc: detail: Save exit of domid 69 with rc=0
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
which is mounted in a guest domain,
and so cannot be mounted now.
libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
Migration failed, resuming at sender.

> Perhaps rather than removing that block entirely it should be
> conditional on S_ISBLK?
> 

With that block made conditional on S_ISBLK, local migration of a guest
whose raw-format file disk is attached to a loop device still breaks
with the error above.
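
For reference, the kind of S_ISBLK check being discussed could look
roughly like the sketch below. This is not the libxl code: the helper
name backend_is_block_device is hypothetical, and the sketch only shows
how the backend path would be classified, not how physical-device is
written afterwards.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, for illustration only (not a libxl function).
 *
 * Returns true when `path` names a block device, i.e. a backend whose
 * device numbers are already known before any hotplug script runs.
 * Returns false for regular files, for every other file type, and
 * when stat() fails; in those cases the caller would leave the
 * physical-device node to the block script.
 */
bool backend_is_block_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;           /* missing or inaccessible path */

    return S_ISBLK(st.st_mode) != 0;
}
```

Under that assumption, a phy: backend such as an LVM volume would
return true, while a raw image file like /data/s0.raw would return
false and go through the block script.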

So for now please revert that change.

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
