
Re: [Xen-devel] [PATCH 20 of 29 RFC] libxl: introduce libxl hotplug public API functions



2012/2/9 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> On Thu, 2012-02-09 at 16:18 +0000, Stefano Stabellini wrote:
>> On Thu, 9 Feb 2012, Ian Campbell wrote:
>> > On Thu, 2012-02-09 at 16:00 +0000, Stefano Stabellini wrote:
>> > > On Thu, 9 Feb 2012, Ian Campbell wrote:
>> > > > On Thu, 2012-02-09 at 15:32 +0000, Stefano Stabellini wrote:
>> > > > > On Thu, 9 Feb 2012, Ian Jackson wrote:
>> > > > > > Stefano Stabellini writes ("Re: [Xen-devel] [PATCH 20 of 29 RFC] 
>> > > > > > libxl: introduce libxl hotplug public API functions"):
>> > > > > > > - we can reuse the "state" based mechanism to establish a connection:
>> > > > > > > again not a great protocol, but very well known and understood.
>> > > > > >
>> > > > > > I don't think we have, in general, a good understanding of these
>> > > > > > "state" based protocols ...
>> > > > >
>> > > > > What?! We have netback, netfront, blkback, blkfront, pciback, pcifront,
>> > > > > kbdfront, fbfront, xenconsole, and these are only the ones in Linux!!
>> > > >
>> > > > And no one I know is able to describe, accurately, exactly what the
>> > > > state diagram for even one of those actually looks like or indeed should
>> > > > look like. It became quite evident in these threads about hotplug script
>> > > > handling etc that no one really knows for sure what (is supposed to)
>> > > > happens when.
>> > >
>> > > I thought that most of the thread was about the interface with the block
>> > > scripts, that is an entirely different matter and completely obscure.
>> > > If I am mistaken, please point me at the right email.
>> >
>> > We are talking about reusing the existing xenbus state machine schema
>> > for a new purpose. Ian J pointed out that these are not generally well
>> > understood, you replied that it was and cited some examples. I pointed
>> > out why these were not examples of why this stuff was well understood at
>> > all, in fact quite the opposite.
>>
>> Sorry but I don't understand how these examples are supposed to be
>> "quite the opposite".
>> I quite like the idea of being able to read a single source file of less
>> than 400 LOC to understand how a protocol works
>> (drivers/input/misc/xen-kbdfront.c).
>
> That is not a protocol specification, merely one implementation of it.
> What does the BSD driver do? Is it exactly the same as Linux? Should BSD
> driver authors be expected to reverse engineer the protocol from the
> Linux code? What/who arbitrates when the two behave differently?
>
>> In fact I don't think that understanding the protocol has been an issue
>> for the GSoC student that had to write a new one.
>
> Being able to reverse engineer something which works is not proof that
> these things are "well understood" in the general case.
>
>> I think we are under the influence of a "reinventing the wheel" virus.
>
> I think we are in danger of making the same mistakes again as have been
> made with the device protocols and this is what I want to avoid.
>
> Now, perhaps this style of state machine protocol is a reasonable design
> choice in this case, but since we are starting afresh here this specific
> new instance should be well documented _up_front_ not left in the "oh,
> just read the Linux code" state we have now for many of our devices
> which has led to multiple slightly divergent implementations of the
> same basic concept.

Yes, documentation about this protocol should go in together with the
protocol itself.

>
>> > > > Justin just posted a good description for blkif.h which included a state
>> > > > machine description. We need the same for pciif.h, netif.h etc etc.
>> > >
>> > > The state machine is the same for block and network.
>> >
>> > No, it's not. This is exactly what IanJ and I are talking about.
>>
>> Could you please elaborate?
>>
>> I am sure you know that the xenstore state machine is handled the same
>> way for all the backends in QEMU (see hw/xen_backend.c).
>> And the same thing is true for the frontends and the backends in Linux.
>
> A substantial proportion of the threads about this hotplug script stuff
> has been about the fact that no one is quite sure what really happens
> when for all implementations nor what the common semantics are.
>
> e.g. How do you ask a backend to shut down (do you set it to state 5?
> state 6? do you nuke the xenstore dir?). Neither is anyone sure when the
> correct point to call the hotplug scripts actually is, or even what
> actually happens with them right now across the different backend
> drivers or kernel types.


This is true: BSD, Linux and QEMU have slightly different
implementations of at least the backend protocol. BSD and qemu-xen
don't react when the backend "state" is set to 5 or "online" is set to
0. This gave me some headaches that could have been avoided if this
were properly documented/implemented.

> The actual state transitions which netback and blkback go through are
> not the same: The netback protocol uses InitWait, the blkback one does
> not or is it vice-versa? I can't remember and it isn't documented. Some
> Linux frontends handled the kexec reconnect sequencing differently, by
> disconnecting or reconnecting the actual underlying devices at subtly
> different times and/or handling the transition from Closing back to Init
> or InitWait differently.

In this implementation I wait for the backend to switch to state 2
(XenbusStateInitWait) before executing the hotplug scripts for both
vifs and vbds, and this seems to work on both Linux and NetBSD. Again,
since it is not documented anywhere, I am only guessing that this is
the correct way to do it; I can't be sure.

> And this is just for Linux talking to Linux.
>
> I know for sure that the Windows frontends follow a different state
> transition path to Linux (and that it has interacted badly with the
> kexec differences in the Linux backends discussed above). I bet BSD has
> some subtle differences in behaviour too.
>
> The fact is that none of our device state machine protocols are well
> documented (although blkif.h is about to be). If this stuff were well
> understood we would already have such documentation because it would be
> trivial to write -- but it is not. If you disagree then please document
> the netif state machine protocol in the form of a patch to netif.h.
>
> Ian.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

