[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] skeleton frontend/backend examples and a deadlock



On Wed, Nov 02, 2005 at 04:22:44PM +1100, Rusty Russell wrote:

> Here are example frontend and backend driver skeletons.

Rusty, your skeleton driver has suggested a number of new abstractions that we
could make to the driver layer, as I mentioned in my email just previous.
I've taken on board a lot of the code here, the structure, and so forth.  I
have taken many of the ideas into the Xenbus layer, where they can be shared
with other front/back split drivers, and there is plenty of renaming to
reflect that fact, but the general concepts are still there.

I wanted to explain what I've taken, where I've put it, and what I've changed,
so here goes.

> diff -r 20d1a79ebe31 linux-2.6-xen-sparse/drivers/xen/skeleton/skeleton_be.c
> --- /dev/null Wed Oct 26 15:59:13 2005
> +++ b/linux-2.6-xen-sparse/drivers/xen/skeleton/skeleton_be.c Wed Nov  2 
> 15:24:59 2005

> [Snip header and comments]

> +/* WAITING: our directory exists, fields aren't there yet.
> + * EXISTS: the tools/udev has written fields we need.
> + * READY: we have written our info into the store for the frontend to read.
> + *        We can only enter this state once frontend is not "connected", to
> + *        cover the case of module reload (frontend might not have noticed us
> + *        going away yet).
> + * CONNECTED: we have read frontend information from the store.  We create 
> the
> + *        "connected" node.
> + */
> +enum state
> +{
> +     WAITING,
> +     EXISTS,
> +     READY,
> +     CONNECTED,
> +};

The general concept of an explicit state machine at the driver level is a good
one, and I've taken that on board.  For visibility of the process, I've
actually added a state node to the store in the frontend and backend
directories.  This node can be watched by the other end, by Xend, or by other
diagnostic tools.

I've changed the states a little:

Initialising
InitWait      == WAITING
Initialised   == READY
Connected     == CONNECTED
Closing
Closed

As you can see, I've dropped the EXISTS state.  I prefer to think of this as
separate from the initialisation of the frontend/backend, because the
conversation is with udev / hotplug / whatever, rather than with the frontend.
Since I've moved this handling into the Xenbus layer, I've left handling
EXISTS or similar with the specific driver.

I've added an Initialising state, for clarity, since this is being written in
the store and we expect diagnostic tools to look for it, and a closedown
exchange, which seems sensible to me in general, and is necessary for
hotplugging block devices in particular.

> +/* Private information about this device */
> +struct skeleton_be_info
> +{
> +     /* xenbus device we belong to */
> +     struct xenbus_device *dev;

> +
> +     /* frontend path */
> +     char *frontend;
> +
> +     /* frontend id */
> +     int frontend_id;

These two have moved into xenbus_device, as otherend and otherend_id.  This
allows us to share code between the backend and the frontend as they handle
the connection to their peer.

> +
> +     /* Mapping for frontend page */
> +     struct vm_struct *vm;
> +     u16 shmem_handle;
> +
> +     /* grant table reference to page frontend offered */
> +     int ring_ref;
> +
> +     /* event channel to send interrupts to frontend */
> +     int evtchn;
> +     int fe_evtchn;

This arrangement is driver-specific.  For net, we use two channels, one for
tx and one for rx, but for block we only need the one.

> +
> +     /* Watch we place on frontend */
> +     struct xenbus_watch watch;

Also moved into xenbus_device.

> +
> +     /* Watch we place on ourselves. */
> +     struct xenbus_watch be_watch;

This stays where it is.  Drivers need not use this, although all the ones that
use hotplug do.  Commoning this out might be a useful task for the future, but
I'm not sure it's worth the effort.

> +     /* If we are fully connected to backend. */
> +     enum state state;
> +
> +     /* Device-specific (eg. net, block) stuff. */
> +     struct device_specific_info *info;
> +};
> +

> [Snip state() function, simple stringifier]

> +static inline int bind_event_channel(domid_t id, int evtchn)
> +{
> +     int err;
> +     evtchn_op_t op = {
> +             .cmd = EVTCHNOP_bind_interdomain,
> +             .u.bind_interdomain.remote_dom = id,
> +             .u.bind_interdomain.remote_port = evtchn };
> +     err = HYPERVISOR_event_channel_op(&op);
> +     if (err)
> +             return err;
> +     return op.u.bind_interdomain.local_port;
> +}

This I've not taken yet, because in the existing net and blk drivers this code
happens inside interface.c, and I haven't yet decided how I'd like to arrange
that.  With luck, this will end up looking very much like
xenbus_client.c:xenbus_alloc_evtchn, mentioned below.

> [Snip {setup,free}_device_specific_crap, stop_device_replies]

Of course, the device-specific stuff stays in the individual drivers!

> +/* Write the information out to the store for the frontend to read, and
> + * know we're ready. */
> +static int publish_info(struct skeleton_be_info *info)
> +{
> +     return xenbus_printf(NULL, info->dev->nodename, "stuff", "%u", 7);
> +}

The actual writing is driver-specific, of course.  One advantage of having an
explicit state node is that the semantics of "let the frontend know we're
ready" are not required, as immediately following this call you change to
state READY, and in my scheme this state change is visible explicitly.

> +/* Frontend gone/going away.  Clean up. */
> +static void skeleton_stop(struct skeleton_be_info *info)
> +{
> +     printk("%s: state %s\n", __func__, state(info));
> +     stop_device_replies(info->info);
> +
> +     xenbus_rm(NULL, info->dev->nodename, "stuff");
> +}

When using hotplug, it's necessary to push the rm into those scripts, because
you generally need the information in the store in order to decide how to do
your teardown.  Otherwise, this is fine.

> +static struct vm_struct *map_page(int ref, domid_t id, u16 *handle)
> +{
> +     struct gnttab_map_grant_ref op;
> +     struct vm_struct *vm;
> +
> +     vm = alloc_vm_area(PAGE_SIZE);
> +     if (!vm)
> +             return ERR_PTR(-ENOMEM);
> +
> +     op.host_addr = (unsigned long)vm->addr;
> +     op.flags     = GNTMAP_host_map;
> +     op.ref       = ref;
> +     op.dom       = id;
> +
> +     lock_vm_area(vm);
> +     BUG_ON(HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1));
> +     unlock_vm_area(vm);
> +
> +     if (op.handle < 0) {
> +             free_vm_area(vm);
> +             return ERR_PTR(op.handle);
> +     }
> +
> +     *handle = op.handle;
> +     return vm;
> +}
> +
> +static void unmap_page(struct vm_struct *vm, u16 handle)
> +{
> +     struct gnttab_unmap_grant_ref op;
> +
> +     printk("%s enter\n", __func__);
> +     op.host_addr    = (unsigned long)vm->addr;
> +     op.handle       = handle;
> +     op.dev_bus_addr = 0;
> +
> +     lock_vm_area(vm);
> +     BUG_ON(HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1));
> +     unlock_vm_area(vm);
> +     printk("%s exit\n", __func__);
> +}

This abstraction is not perfect, unfortunately, because the net driver uses
two adjacent pages, because it has it's two ring buffers.  It would be worth
making it possible for this abstraction to move into xenbus_client, but for
now it's not there.

> +/* FIXME: This can be called before skeleton_probe() finished. */
> +static void frontend_changed(struct xenbus_watch *watch,
> +                         const char **vec, unsigned int len)
> +{
> +     struct skeleton_be_info *info;
> +     struct xenbus_transaction *trans;
> +     int err;
> +
> +     info = container_of(watch, struct skeleton_be_info, watch);
> +
> +     /* If frontend has gone away, shut down. */
> +     if (!xenbus_exists(NULL, info->frontend, "")) {
> +             device_unregister(&info->dev->dev);
> +             return;
> +     }
> +
> +     switch (info->state) {
> +     case WAITING:
> +             break;
> +
> +     case EXISTS:
> +             if (xenbus_exists(NULL, info->frontend, "connected")) {
> +                     xenbus_dev_error(info->dev, -EBUSY,
> +                                      "frontend still connected");
> +                     return;
> +             }
> +
> +             err = publish_info(info);
> +             if (err) {
> +                     xenbus_dev_error(info->dev, err,
> +                                      "writing information");
> +                     return;
> +             }                       
> +             info->state = READY;
> +             /* fall thru */

This is publishing driver-specific stuff.

> +     case READY:
> +             /* Try to read frontend stuff. */
> +     again:
> +             trans = xenbus_transaction_start();
> +             if (IS_ERR(trans)) {
> +                     xenbus_dev_error(info->dev, PTR_ERR(trans),
> +                                      "starting transaction");
> +                     return;
> +             }
> +             err = xenbus_gather(NULL, info->frontend,
> +                                 "ring-reference", "%u", &info->ring_ref,
> +                                 "event-channel", "%u", &info->fe_evtchn,
> +                                 NULL);
> +             if (!err) {
> +                     err = xenbus_transaction_end(trans, 0);
> +                     if (err == -EAGAIN)
> +                             goto again;
> +             }
> +             if (err) {
> +                     xenbus_dev_error(info->dev, err, "reading from %s",
> +                                      info->frontend);
> +                     return;
> +             }
> +
> +             err = bind_event_channel(info->frontend_id, info->fe_evtchn);
> +             if (err < 0) {
> +                     xenbus_dev_error(info->dev, err,
> +                                      "binding event channel");
> +                     return;
> +             }
> +             info->fe_evtchn = err;
> +
> +             info->vm = map_page(info->ring_ref, info->frontend_id,
> +                                 &info->shmem_handle);
> +             if (IS_ERR(info->vm)) {
> +                     xenbus_dev_error(info->dev, PTR_ERR(info->vm),
> +                                      "mapping page");
> +                     return;
> +             }
> +             info->state = CONNECTED;

This is reading driver-specific connection details from the frontend.

> +             /* Clear any previous errors. */
> +             xenbus_dev_ok(info->dev);
> +             xenbus_printf(NULL, info->dev->nodename, "connected", "ok");
> +             break;
> +
> +     case CONNECTED:
> +             /* Did frontend driver shut down? */
> +             if (!xenbus_exists(NULL, info->frontend, "ring-reference")) {
> +                     xenbus_dev_error(info->dev, -ENOENT,
> +                                      "frontend disconnected");
> +                     xenbus_rm(NULL, info->dev->nodename, "connected");
> +                     unmap_page(info->vm, info->shmem_handle);
> +                     skeleton_stop(info);
> +                     info->state = EXISTS;

And this is pretty standard closedown stuff.

> +             }
> +     }
> +}

> +static int skeleton_watch_front(struct xenbus_device *dev,
> +                             struct skeleton_be_info *info)
> +{
> +     int err;
> +
> +     printk("%s: state %s\n", __func__, state(info));
> +
> +     /* We need frontend-id and path. */
> +     err = xenbus_gather(NULL, dev->nodename,
> +                         "frontend-id", "%i", &info->frontend_id,
> +                         "frontend", NULL, &info->frontend,
> +                         NULL);
> +     if (err) {
> +             xenbus_dev_error(dev, err, "reading frontend or frontend-id");
> +             goto out;
> +     }
> +
> +     info->watch.node = info->frontend;
> +     info->watch.callback = frontend_changed;
> +     err = register_xenbus_watch(&info->watch);
> +     if (err) {
> +             xenbus_dev_error(dev, err, "placing watch on %s",
> +                              info->frontend);
> +             goto free_frontend;
> +     }

This code has gone into xenbus_probe.c:talk_to_otherend.  This means that we
can treat the frontend/frontend_id handling in the backend with the same code
as the backend/backend_id handling in the frontend.

I've also changed this so that the watch is on frontend/state, not the whole
frontend directory.  This means that only state changes will trigger through
this code.

> +     /* frontend_changed called immediately: stuff might be there already.*/
> +     frontend_changed(&info->watch, NULL, 0);

This is not necessary, because the a watch is now implicitly fired by
xenstored when registered with it.

> +     return 0;
> +
> +free_frontend:
> +     kfree(info->frontend);
> +out:
> +     return err;
> +}
> +

> +static void self_changed(struct xenbus_watch *watch,
> +                      const char **vec, unsigned int len)
> +{
> +     struct skeleton_be_info *info;
> +     int err;
> +
> +     info = container_of(watch, struct skeleton_be_info, be_watch);
> +     printk("%s: state %s\n", __func__, state(info));
> +
> +     /* We only need this while we're waiting for config. */
> +     if (info->state != WAITING)
> +             return;
> +
> +     /* Not there yet?  Keep waiting. */
> +     info->info = setup_device_specific_crap(info->dev);
> +     if (!info->info)
> +             return;
> +
> +     info->state = EXISTS;
> +     err = skeleton_watch_front(info->dev, info);
> +     if (err) {
> +             free_device_specific_crap(info->info);
> +             return;
> +     }
> +}

Device-specific, and often required to wait for the hotplug scripts.

> +static int skeleton_probe(struct xenbus_device *dev,
> +                       const struct xenbus_device_id *id)
> +{
> +     int err;
> +     struct skeleton_be_info *info;
> +
> +     printk("skeleton_probe_be: %s\n", dev->nodename);
> +
> +     dev->data = info = kmalloc(sizeof(*info), GFP_KERNEL);
> +     if (!info) {
> +             xenbus_dev_error(dev, -ENOMEM, "allocating info structure");
> +             return -ENOMEM;
> +     }
> +     info->dev = dev;
> +     info->state = WAITING;
> +
> +     /* Try for config (might need to wait for udev). */
> +     info->be_watch.node = dev->nodename;
> +     info->be_watch.callback = self_changed;
> +     err = register_xenbus_watch(&info->be_watch);
> +     if (err) {
> +             xenbus_dev_error(dev, err, "placing watch on self %s",
> +                              dev->nodename);
> +             kfree(info);
> +             return err;
> +     }
> +     self_changed(&info->be_watch, NULL, 0);
> +     return 0;
> +}

This is driver-specific, so stays where it is.

> +static int skeleton_remove(struct xenbus_device *dev)
> +{
> +     struct skeleton_be_info *info = dev->data;
> +
> +     switch (info->state) {
> +     case CONNECTED:
> +             unmap_page(info->vm, info->shmem_handle);
> +             /* fall thru */
> +     case READY:
> +             skeleton_stop(info);
> +             /* Must remove this after other fields. */
> +             xenbus_rm(NULL, dev->nodename, "connected");
> +             /* fall thru */
> +     case EXISTS:
> +             unregister_xenbus_watch(&info->watch);
> +             free_device_specific_crap(info->info);
> +             /* fall thru */
> +     case WAITING:
> +             unregister_xenbus_watch(&info->be_watch);
> +     }
> +
> +     kfree(info);
> +
> +     return 0;
> +}
> +

> [Snip registration stuff, discussed below with the frontend driver]

> diff -r 20d1a79ebe31 linux-2.6-xen-sparse/drivers/xen/skeleton/skeleton_fe.c
> --- /dev/null Wed Oct 26 15:59:13 2005
> +++ b/linux-2.6-xen-sparse/drivers/xen/skeleton/skeleton_fe.c Wed Nov  2 
> 15:24:59 2005

> [Snip comments and includes]

> +/* Private information about this device */
> +struct skeleton_info
> +{
> +     /* xenbus device we belong to */
> +     struct xenbus_device *dev;
> +
> +     /* backend path */
> +     char *backend;
> +
> +     /* backend id */
> +     int backend_id;
> +
> +     /* page we offer to share */
> +     void *page;
> +
> +     /* grant table reference to page we offer to backend */
> +     int ring_ref;
> +
> +     /* event channel to send interrupts to backend */
> +     int evtchn;
> +
> +     /* Watch we place on backend */
> +     struct xenbus_watch watch;
> +
> +     /* If we are fully connected to backend. */
> +     enum state state;
> +
> +     /* Information given by the backend */
> +     int backend_stuff;
> +
> +     /* Device-specific (eg. net, block) stuff. */
> +     struct device_specific_info *info;
> +};

The treatment of this has been the same as the treatment of skeleton_be_info
-- the features related to the connection to the peer (the backend in this
case) have moved into xenbus_device, and everything else is driver-specific,
so stays.

> [Snip state() function, simple stringifier]

> +static inline int allocate_event_channel(domid_t id)
> +{
> +     int err;
> +     evtchn_op_t op = {
> +             .cmd = EVTCHNOP_alloc_unbound,
> +             .u.alloc_unbound.dom = DOMID_SELF,
> +             .u.alloc_unbound.remote_dom = id };
> +
> +     err = HYPERVISOR_event_channel_op(&op);
> +     if (err)
> +             return err;
> +     return op.u.alloc_unbound.port;
> +}

This has moved into xenbus_client.c:xenbus_alloc_evtchn.  I renamed it for
consistency with the command, to avoid confusion.

> +
> +static inline void free_event_channel(int evtchn)
> +{
> +     evtchn_op_t op = {
> +             .cmd = EVTCHNOP_close,
> +             .u.close.port = evtchn };
> +     HYPERVISOR_event_channel_op(&op);
> +}

There is no need for this in the real drivers, as they are all using
unbind_from_irqhandler, which handles closing the event channel implicitly.

> [ Snip {setup,free}_device_specific_crap, stop_device-requests ]

> +/* Write the information out to the store for the backend to read, and
> + * know we're ready. */
> +static int publish_info(struct skeleton_info *info)
> +{
> +     struct xenbus_transaction *trans;
> +     int err;
> +
> +     printk("%s: state %s\n", __func__, state(info));
> +     /* Transactions can fail spuriously, which means we loop. */
> +again:
> +     trans = xenbus_transaction_start();
> +     if (IS_ERR(trans))
> +             return PTR_ERR(trans);
> +
> +     err = xenbus_printf(trans, info->dev->nodename, "ring-reference", "%u",
> +                         info->ring_ref);
> +     if (!err)
> +             err = xenbus_printf(trans, info->dev->nodename,
> +                                 "event-channel", "%u", info->evtchn);
> +
> +     if (err) {
> +             xenbus_transaction_end(trans, 1);
> +             return err;
> +     }
> +     err = xenbus_transaction_end(trans, 0);
> +     if (err == -EAGAIN)
> +             goto again;
> +     return err;
> +}

This is driver-specific publication of the connection info.

> +
> +/* Backend gone/going away.  Clean up. */
> +static void skeleton_stop(struct skeleton_info *info)
> +{
> +     printk("%s: state %s\n", __func__, state(info));
> +     stop_device_requests(info->info);
> +
> +     /* FIXME: can't use transaction, it requires alloc. */
> +     xenbus_rm(NULL, info->dev->nodename, "ring-reference");
> +     xenbus_rm(NULL, info->dev->nodename, "event-channel");
> +}
> +

> [Snip backend_changed, which has been split in a similar way to
> frontend_changed.]

> +static int skeleton_resume(struct xenbus_device *dev)
> +{
> +     int err;
> +     struct skeleton_info *info = dev->data;
> +
> +     printk("%s: state %s\n", __func__, state(info));
> +
> +     /* We need backend-id and path. */
> +     err = xenbus_gather(NULL, dev->nodename,
> +                         "backend-id", "%i", &info->backend_id,
> +                         "backend", NULL, &info->backend,
> +                         NULL);
> +     if (err) {
> +             xenbus_dev_error(dev, err, "reading backend or backend-id");
> +             goto out;
> +     }

This has gone into xenbus_probe.c:read_from_otherend.

> +     /* We need to allocate a page and event channel. */
> +     info->page = (void *)__get_free_page(GFP_KERNEL);
> +     if (!info->page) {
> +             err = -ENOMEM;
> +             xenbus_dev_error(dev, err, "allocating shared page");
> +             goto free_backend;
> +     }
> +
> +     err = gnttab_grant_foreign_access(info->backend_id,
> +                                       virt_to_mfn(info->page), 0);
> +     if (err < 0) {
> +             xenbus_dev_error(dev, err, "granting page");
> +             goto free_page;
> +     }
> +     info->ring_ref = err;
> +
> +     err = allocate_event_channel(info->backend_id);
> +     if (err < 0) {
> +             xenbus_dev_error(dev, err, "allocating event channel");
> +             goto ungrant_page;
> +     }
> +     info->evtchn = err;
> +
> +     info->watch.node = info->backend;
> +     info->watch.callback = backend_changed;
> +     err = register_xenbus_watch(&info->watch);
> +     if (err) {
> +             xenbus_dev_error(dev, err, "placing watch on %s", 
> info->backend);
> +             goto free_event_channel;
> +     }
> +     /* backend_changed called immediately: stuff might be there already. */
> +     backend_changed(&info->watch, NULL, 0);
> +     return 0;
> +
> +free_event_channel:
> +     free_event_channel(info->evtchn);
> +ungrant_page:
> +     /* FIXME: Need infrastructure to handle otherside holding onto page. */
> +     gnttab_end_foreign_access(info->ring_ref, 0);
> +free_page:
> +     free_page((unsigned long)info->page);
> +free_backend:
> +     kfree(info->backend);
> +out:
> +     return err;
> +}

> [Snip skeleton_probe; device-specific]

> +/* Clean up: will re-init and connect to backend on resume. */
> +static int skeleton_suspend(struct xenbus_device *dev)
> +{
> +     struct skeleton_info *info = dev->data;
> +
> +     printk("%s: state %s\n", __func__, state(info));
> +     switch (info->state) {
> +     case CONNECTED:
> +             xenbus_rm(NULL, dev->nodename, "connected");
> +             info->state = READY;
> +             /* fall thru */
> +     case READY:
> +             skeleton_stop(info);
> +             info->state = EXISTS;
> +     case EXISTS:
> +             ;               /* Nothing to do */
> +     }
> +
> +     /* FIXME: Need infrastructure to handle otherside holding onto page. */
> +     gnttab_end_foreign_access(info->ring_ref, 0);
> +     free_page((unsigned long)info->page);
> +     kfree(info->backend);
> +     printk("%s:%i\n", __func__, __LINE__);
> +     unregister_xenbus_watch(&info->watch);
> +     printk("%s:%i\n", __func__, __LINE__);
> +     return 0;
> +}

> [Snip skeleton_remove and the driver registration details, both pretty
> trivial]

> +static int init(void)
> +{
> +     return xenbus_register_driver(&driver);
> +}
> +
> +static void fini(void)
> +{
> +     xenbus_unregister_driver(&driver);
> +}
> +
> +module_init(init);
> +module_exit(fini);

All the drivers has init functions before, of course, but I have added exit
functions to the frontend drivers.  The backend drivers I'm not sure about, as
the netback driver actually explicitly issues a BUG() if the exit function is
called.

> +MODULE_LICENSE("GPL");

I have added this to blkfront and netfront, except those files are both
BSD-licenced, so that's what I've added.  The backend drivers don't have a
licence stated explicitly, so I've been forced to leave those alone for now.

That's it.  As you can see, we've taken on board a lot of the abstractions
you've created here, and made them available to all drivers.  Thanks a lot for
all your work!

Ewan.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.