
Re: [Xen-devel] [RFC 7/7] libxl: Wait for QEMU startup in stubdomain



On Fri, Feb 06, 2015 at 10:46:15AM -0500, Eric Shelton wrote:
> On Fri, Feb 6, 2015 at 9:59 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> > On Fri, Feb 06, 2015 at 08:56:40AM -0500, Eric Shelton wrote:
> >> On Fri, Feb 6, 2015 at 6:16 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> >>
> >> I simply used the code already present in the QEMU upstream code,
> >> which is writing to that particular path to indicate "running."  Since
> >> it is distinct from the path used by the QEMU instance running in
> >> Dom0, it works for my intended purpose: ensuring the device model is
> >> running before unpausing the HVM guest.  When you say it is "wrong,"
> >> is that just because you ultimately intend to rearchitect this and use
> >> something different?  If so, maybe the path I am using is "good
> >> enough" until that happens.  Otherwise, can you suggest a better path
> >> or mechanism?
> >>
> >
> > It is not "good enough". It just happens to be working.
> >
> > Currently the path is hardcoded "/local/domain/0/BLAH". It's wrong,
> > because the QEMU in stubdom is not running in 0. The correct prefix
> > should be "/local/domain/$stubdom_id".
> 
> OK; that definitely makes more sense - I recall the same idea crossing
> my mind when I first dug into this.  Although the revised protocol may
> go in a different direction, I will adopt this approach for now.
> 

Sorry for not having explained this more clearly in the first email. I can
keep you in the loop if you're interested.

> >> I noticed some discussion about this on xen-devel.  Unfortunately, I
> >> was unable to find anything that laid out specifically what the
> >> problems are - can you point me to a bug report or such?  The libxl
> >> startup code - with callbacks on top of callbacks, callbacks within
> >> callbacks, and callbacks stashed away in little places only to be
> >> called _much_ later - is really convoluted, I suspect particularly so
> >> for stubdom startup.  I am not surprised it got broken - who can
> >> remember how it works?
> >>
> >
> > It's not how libxl is coded. It's the startup protocol that is broken.
> > The breakage of stubdom in Xen 4.5 is a latent bug exposed by a new
> > feature.
> >
> > I guess I should just send a bug report saying "Device model startup
> > protocol is broken". But I don't have much to say at this point, because
> > thorough research for both qemu-trad and qemu-upstream is required to
> > produce a sensible report.
> 
> So, just where is the current protocol breaking down?  Is there a

The hard part is that there are already too many hardcoded /local/domain/0
paths; figuring out which ones need to be /local/domain/$stubdom_id might
take some time. On the other hand, we have a chance to start with a clean
slate with upstream QEMU, so it might not be as hard as I think.

> contemplated bandaid for 4.5.1?  I'm just trying to figure out what I
> might want to do differently.
> 

Paul posted a bandaid patch today. It will be backported to 4.5.1 when
it is time.

<1423236389-10908-1-git-send-email-paul.durrant@xxxxxxxxxx>

> > So prior to 4.5, when an emulation request is issued by a guest vcpu,
> > that request is put on a ring and the guest vcpu is paused. When a DM
> > shows up it processes that request and posts a response, then the guest
> > vcpu is unpaused. So there is an implicit dependency on Xen's behaviour
> > for the DM to work.
> >
> > In 4.5, a new feature called ioreq server is added. When Xen sees an
> > io request with no backing DM, it returns immediately. The guest sees
> > some weird value and crashes. That is, Xen's behaviour has changed and
> > a latent bug in stubdom's startup protocol is exposed.
> 
> So, is the approach that I took - waiting for the stubdom DM to finish
> initializing - a reasonable short-term solution?  I guess I am

The idea is certainly valid.
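The behavioural change described above can be modelled roughly like this.
This is a simplified illustration, not actual Xen code; the names and the
dummy value are assumptions:

```c
/* Toy model of the pre-4.5 vs 4.5 ioreq handling difference.
 * Pre-4.5: a guest vcpu issuing an emulation request stays paused until
 * a device model attaches and posts a response, so there is no path on
 * which the guest observes a read with no DM present.
 * 4.5 (ioreq server): if no ioreq server claims the request, Xen
 * completes it immediately with a dummy value, which the guest then
 * interprets as device state and may crash on. */
#include <stdbool.h>

#define IOREQ_NO_DM_VALUE 0xFFFFFFFFu  /* the "weird value" the guest sees */

unsigned int xen45_handle_mmio_read(bool dm_attached)
{
    if (!dm_attached)
        return IOREQ_NO_DM_VALUE;  /* returned immediately, no DM consulted */
    return 0x42;                   /* stand-in for the DM-supplied value */
}
```

Waiting for the stubdomain's device model to signal "running" before
unpausing the guest closes exactly this window.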

> wondering whether the fix you are contemplating is in libxl, the
> hypervisor, or both.
> 

It would mostly be in toolstack and QEMU.

Regardless of this protocol fix, there is still other work, such as frame
buffer plumbing, that remains to be done. So, as Stefano said, if you can
get that upstreamed in QEMU that would be great.

Wei.

> Thanks,
> Eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

