[Xen-devel] Re: DomU not rebooting in 3.0.1

On Mon, Feb 27, 2006 at 02:06:14PM -0500, Sean Dague wrote:

> On Mon, Feb 27, 2006 at 03:33:17PM +0000, Ewan Mellor wrote:
> > On Sat, Feb 25, 2006 at 11:31:57AM -0500, Brian Hays wrote:
> > 
> > > In 3.0.1 it appears that reboots from within a domU result in the domU
> > > shutting down but not coming back up.
> > > 
> > > /var/log/xen-hotplug.log reports something similar to "xenstore-list:
> > > could not read path backend/vbd/40"
> > > 
> > > Attempts to manually start the domU up again via "xm create" fail until
> > > first running "xenstore-rm backend/vbd/40"
> > > 
> > > This has happened on several occasions within different domUs on
> > > different machines. Is there a known workaround? Is this already fixed
> > > in unstable?
> > 
> > This is being tracked as bug #514.  I think I have a fix, which should be in
> > today, all being well.
> Ewan, I didn't notice 514, but I submitted a related bug
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=549.  It appears that
> the big deciding factor in whether or not I see the race is whether the DomU
> that is rebooting runs on a fully independent CPU from Dom0. If it does, it
> breaks every time; if it doesn't (i.e. it is on the *same* physical CPU as
> Dom0), it works every time.

Sean, your #549 is not the same as #514, because your /var/log/xen-hotplug.log
shows neither the message "xenstore-list: could not read path
backend/vbd/40" nor "xenstore-list: could not read path /local/domain/1/vm".
One of these messages is expected if you are hitting bug #514.
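As a rough illustration, the triage rule above (either of those two messages in xen-hotplug.log points at bug #514) could be scripted as follows. This is a hypothetical helper, not part of Xen; the domain IDs in the messages (40 and 1 here) vary per guest, so the sketch assumes any number can appear in that position.

```python
import re

# Hypothetical helper, not shipped with Xen. The two signatures are the
# messages quoted in this thread; the domain ID is matched as any number.
BUG_514_PATTERNS = [
    re.compile(r"xenstore-list: could not read path backend/vbd/\d+"),
    re.compile(r"xenstore-list: could not read path /local/domain/\d+/vm"),
]


def looks_like_bug_514(log_text: str) -> bool:
    """Return True if any log line matches one of the bug #514 signatures."""
    return any(
        pattern.search(line)
        for line in log_text.splitlines()
        for pattern in BUG_514_PATTERNS
    )
```

Fed the contents of /var/log/xen-hotplug.log, a log containing only the "brctl: command not found" line would return False, which matches the conclusion that #549 is a different problem.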

Your xen-hotplug.log does show

/etc/xen/scripts/vif-bridge: line 41: brctl: command not found

Obviously, I would expect your vif to be broken with this message showing.  I
wouldn't expect your vbd to be broken, unless there's some cascading failure
that I don't know about.

Nothing jumps out at me otherwise from your logs, so you're going to have to
dig a bit more.  As a first suggestion, I'd run xenstore-ls with the guest up
but broken, and see if there are any error messages in the store.  Xend loses
the error messages when it reboots a domain, because there's no "wait for the
devices" phase in the reboot logic.


