
[Xen-devel] Error recovery in Xen's paravirtualizing USB driver for Linux



USB Folks,

    I've been working on a USB device driver for Linux running
paravirtualized on the Xen hypervisor and I have a few questions about
the design of the error recovery...

    This 'USB split driver' has a 'front-end' in the Linux kernel
running in a guest domain of the hypervisor and a 'back-end' in the
Linux kernel running in a device driver domain (usually the special
privileged domain 0).

    The user configures Xen to map a USB port from the back-end domain
to the front-end domain and the split driver makes any USB device
attached to the back-end port available in the front-end domain.

    The split driver works roughly as follows:

o - The back-end uses usb_register to register itself as a driver
matching all USB IDs so it gets probed for every USB device that is
connected.  When a USB device is probed for a configured port, the
driver claims all the interfaces for the device.

o - The front-end uses usb_create_hcd to create an HCD with a single
port.

o - The front-end and the back-end communicate using the Xen
inter-domain communication primitives so the hub status in the front-end
reflects whether or not a device is plugged into the back-end port.

o - URBs received by the front-end HCD are translated into Xen
inter-domain-communication primitives and routed to the back-end domain.

o - The inter-domain communication primitives are translated back into
URBs in the back-end and submitted to the Linux USB stack for the device
attached to the back-end port.

    The split driver now works well enough to mount a filesystem on a
USB key in the front-end domain so it's getting there.

    I did have a problem with URBs getting reordered on their way
between the front-end and the back-end which led to miscompares where
the correct bulk data was written on the USB key but at the wrong LBA. I
fixed this by maintaining submission ordering in the URB queue from
front-end to back-end.
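
    To illustrate the ordering fix, here is a minimal, self-contained sketch
of a FIFO ring that hands requests to the back-end in exactly the order the
front-end submitted them.  It is not the driver's actual code: the names
(urb_request, urb_ring, ring_submit, ring_take), the ring size and the use of
an LBA as stand-in payload are all assumptions made for the example, and the
real inter-domain transport is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical capacity; must be a power of two */

/* Hypothetical request descriptor, not the driver's real wire format. */
struct urb_request {
    uint32_t seq; /* submission sequence number assigned by the front-end */
    uint32_t lba; /* stand-in for the URB payload */
};

struct urb_ring {
    struct urb_request slots[RING_SIZE];
    uint32_t prod;     /* free-running count of requests produced */
    uint32_t cons;     /* free-running count of requests consumed */
    uint32_t next_seq; /* next sequence number to hand out */
};

static void ring_init(struct urb_ring *r)
{
    assert(r != NULL);
    r->prod = 0;
    r->cons = 0;
    r->next_seq = 0;
}

/* Front-end side: queue a request. Returns 0 on success, -1 if full. */
static int ring_submit(struct urb_ring *r, uint32_t lba)
{
    struct urb_request *req;

    if (r->prod - r->cons == RING_SIZE)
        return -1;
    req = &r->slots[r->prod % RING_SIZE];
    req->seq = r->next_seq++;
    req->lba = lba;
    r->prod++;
    return 0;
}

/* Back-end side: take the oldest request. Returns 0, or -1 if empty. */
static int ring_take(struct urb_ring *r, struct urb_request *out)
{
    if (r->cons == r->prod)
        return -1;
    *out = r->slots[r->cons % RING_SIZE];
    r->cons++;
    return 0;
}
```

    Because the back-end only ever takes the oldest entry, it cannot submit
a later URB to the device ahead of an earlier one; the sequence number is
there only so that the ordering can be checked.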

    This issue made me realize that there is a similar problem during
error recovery if an URB fails while several are in flight.  Indeed, the
recent USB code has a big comment about the guarantees a driver gets on
an error: the URB queue stalls until the driver's completion handler
returns from completion of the failing URB and of any URBs unlinked
during completion of the failing URB.

    Right now, if there is an error on an URB in my USB split driver
back-end, the back-end completion handler returns before the completion
handler of the front-end USB driver client gets a chance to do any
unlinks, so the remaining URBs are not stalled and my driver is broken.
I'm trying to work out how to fix it.

    I'm thinking of doing the following:

    When an URB returns an error in the back-end, the back-end
completion handler will unlink all of the URBs in progress for the
endpoint of the failing URB and set a 'fail_urb' flag for that endpoint.
While the flag is set, the back-end fails back any URBs which arrive
subsequently for that endpoint, until the 'fail_urb' flag is reset.

    Whenever the front-end completion handler returns from completing an
URB back to the client and finds that an endpoint has gone idle, it will
set a flag to indicate that the next URB sent to the back-end for that
endpoint should clear the 'fail_urb' flag in the back-end before
starting.
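
    The proposal above can be sketched as a small state machine.  This is
only a model of the idea, not driver code: the structures and functions
(be_endpoint, fe_endpoint, be_on_error, be_submit, fe_submit, fe_complete)
are hypothetical names, unlinking is reduced to a counter, and -ECONNRESET
is used purely as a placeholder for the still undecided error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-endpoint state kept by the back-end. */
struct be_endpoint {
    bool fail_urb; /* set on error; later arrivals are failed back */
    int inflight;  /* URBs currently submitted to the real device */
};

/* Hypothetical per-endpoint state kept by the front-end. */
struct fe_endpoint {
    int outstanding;      /* URBs not yet completed back to the client */
    bool clear_fail_next; /* next URB sent should clear 'fail_urb' */
};

/* Back-end: an URB on this endpoint completed with an error.
 * Returns how many other in-progress URBs were unlinked. */
static int be_on_error(struct be_endpoint *ep)
{
    int unlinked;

    assert(ep->inflight > 0);
    ep->inflight--;          /* the failing URB itself has completed */
    unlinked = ep->inflight; /* stands in for unlinking each remaining URB */
    ep->inflight = 0;
    ep->fail_urb = true;
    return unlinked;
}

/* Back-end: a request arrives from the front-end. 'clear_fail' is the
 * flag carried by the request. Returns 0 if the URB was submitted, or a
 * negative errno if it was failed back. */
static int be_submit(struct be_endpoint *ep, bool clear_fail)
{
    if (clear_fail)
        ep->fail_urb = false;
    if (ep->fail_urb)
        return -ECONNRESET; /* placeholder; the error value is an open question */
    ep->inflight++;
    return 0;
}

/* Front-end: send an URB. Returns the flag to carry with the request. */
static bool fe_submit(struct fe_endpoint *ep)
{
    bool clear = ep->clear_fail_next;

    ep->clear_fail_next = false;
    ep->outstanding++;
    return clear;
}

/* Front-end: call after the client's completion handler has returned. */
static void fe_complete(struct fe_endpoint *ep)
{
    assert(ep->outstanding > 0);
    ep->outstanding--;
    if (ep->outstanding == 0)
        ep->clear_fail_next = true; /* endpoint has gone idle */
}
```

    In this model, an URB that was already on its way to the back-end when
the error occurred is failed back rather than executed, and the endpoint
only resumes once every outstanding URB has been completed to the client.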

    This strategy is similar to the concept of auto contingent
allegiance for SCSI.

    My questions are as follows:

1) I think it is impossible to maintain exactly the same guarantee that
the Linux USB stack provides across the Xen inter-domain interface
because I don't think it is desirable to attempt the synchronisation of
the two kernels in different domains required to nest the call of the
front end completion within the call of the back end completion.  Is
this correct?

2) I believe the above strategy to be more conservative than the Linux
guarantee and sufficient to avoid miscompares due to re-ordering as a
result of URBs failing whilst there are multiple URBs in flight.  Is
this correct?

3) I think the above strategy is compatible with the suggested recovery
mechanism for Linux USB drivers which is to unlink outstanding URBs in
the URB completion handler.  Is the above strategy likely to work?

4) I'm wondering what error to fail URBs with when 'fail_urb' is set.
I'm guessing either -ECONNRESET which is the same as if the URB was
unlinked or perhaps -EAGAIN.  What would be the correct value?

5) Is there a more USBish way to solve this problem that will fit better
with the USB infrastructure?

6) Any other ideas?

Thanks for your help!

Harry Butterworth


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
