[Xen-devel] Error recovery in Xen's paravirtualizing USB driver for Linux
USB Folks,

I've been working on a USB device driver for Linux running paravirtualized on the Xen hypervisor and I have a few questions about the design of the error recovery.

This 'USB split driver' has a 'front-end' in the Linux kernel running in a guest domain of the hypervisor and a 'back-end' in the Linux kernel running in a device driver domain (usually the special privileged domain 0). The user configures Xen to map a USB port from the back-end domain to the front-end domain, and the split driver makes any USB device attached to the back-end port available in the front-end domain.

The split driver works roughly as follows:

o - The back-end uses usb_register to register itself as a driver matching all USB IDs, so it gets probed for every USB device that is connected. When a USB device is probed for a configured port, the driver claims all the interfaces for the device.

o - The front-end uses usb_create_hcd to create an HCD with a single port.

o - The front-end and the back-end communicate using the Xen inter-domain communication primitives, so the hub status in the front-end reflects whether or not a device is plugged into the back-end port.

o - URBs received by the front-end HCD are translated into Xen inter-domain communication primitives and routed to the back-end domain.

o - The inter-domain communication primitives are translated back into URBs in the back-end and submitted to the Linux USB stack for the device attached to the back-end port.

The split driver now works well enough to mount a filesystem on a USB key in the front-end domain, so it's getting there.

I did have a problem with URBs getting reordered on their way between the front-end and the back-end, which led to miscompares where the correct bulk data was written on the USB key but at the wrong LBA. I fixed this by maintaining submission ordering in the URB queue from front-end to back-end.
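One way to picture the ordering fix described above is the following user-space Python model. It is only a sketch under assumptions: the per-endpoint sequence number, the `UrbRequest` and `InOrderSubmitter` names, and the reorder buffer are all hypothetical illustrations and not necessarily how the split driver preserves ordering.

```python
from dataclasses import dataclass, field


@dataclass
class UrbRequest:
    seq: int    # hypothetical sequence number stamped by the front-end
    lba: int    # target block address carried by the request
    data: bytes


@dataclass
class InOrderSubmitter:
    """Back-end side: hand requests to the USB stack strictly in seq order."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)    # early arrivals, keyed by seq
    submitted: list = field(default_factory=list)  # stands in for URB submission

    def receive(self, req: UrbRequest) -> None:
        self.pending[req.seq] = req
        # Release every request that is now contiguous with what was
        # already submitted; later ones wait for their predecessors.
        while self.next_seq in self.pending:
            self.submitted.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def demo_ordering() -> list:
    """Deliver requests out of order and return the submission order."""
    backend = InOrderSubmitter()
    for seq in (1, 0, 3, 2):
        backend.receive(UrbRequest(seq=seq, lba=100 + seq, data=b""))
    return [req.seq for req in backend.submitted]
```

In this model a request that arrives early is simply held back, so the bulk data can never reach the device ahead of the command that tells it where to go.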
This issue made me realize there was a similar problem during error recovery if an URB fails when there are several in flight. Indeed, reading the recent USB code, there is a big comment about the guarantees a driver gets on an error: the URB queue stalls until the driver's completion handler returns from completion of the failing URB and any URBs unlinked during completion of the failing URB.

Right now, if there is an error on an URB in my USB split driver back-end, then the back-end completion handler returns before the front-end USB driver client's completion handler gets a chance to do any unlinks, so the URBs are not stalled and my driver is broken.

I'm trying to work out how to fix it. I'm thinking of doing the following:

When there is an error returned by an URB in the back-end, the completion handler of the back-end will unlink all of the URBs in progress for the endpoint of the failing URB and set a 'fail_urb' flag for the endpoint to fail back any URBs which arrive subsequently for that endpoint until the 'fail_urb' flag is reset.

Whenever the front-end completion handler returns from completing an URB back to the client and finds that an endpoint has gone idle, it will set a flag to indicate that the next URB sent to the back-end for that endpoint should clear the 'fail_urb' flag in the back-end before starting.

This strategy is similar to the concept of auto contingent allegiance for SCSI.

My questions are as follows:

1) I think it is impossible to maintain exactly the same guarantee that the Linux USB stack provides across the Xen inter-domain interface because I don't think it is desirable to attempt the synchronisation of the two kernels in different domains required to nest the call of the front-end completion within the call of the back-end completion. Is this correct?
2) I believe the above strategy to be more conservative than the Linux guarantee and sufficient to avoid miscompares due to re-ordering as a result of URBs failing whilst there are multiple URBs in flight. Is this correct?

3) I think the above strategy is compatible with the suggested recovery mechanism for Linux USB drivers, which is to unlink outstanding URBs in the URB completion handler. Is the above strategy likely to work?

4) I'm wondering what error to fail URBs with when 'fail_urb' is set. I'm guessing either -ECONNRESET, which is the same as if the URB was unlinked, or perhaps -EAGAIN. What would be the correct value?

5) Is there a more USBish way to solve this problem that will fit better with the USB infrastructure?

6) Any other ideas?

Thanks for your help!

Harry Butterworth

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
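The 'fail_urb' recovery strategy proposed in this message can be modelled in user space to check the intended behaviour. The Python sketch below is an illustration under stated assumptions, not the driver: the `BackendEndpoint` and `FrontendEndpoint` classes and their methods are hypothetical, unlinking is modelled as an immediate completion (in the kernel it is asynchronous), and -ECONNRESET is used for failed-back URBs only as a placeholder, since question 4 leaves that choice open.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class BackendEndpoint:
    fail_urb: bool = False
    in_flight: list = field(default_factory=list)    # URB ids given to the USB stack
    completions: list = field(default_factory=list)  # (urb_id, status) for the front-end

    def submit(self, urb_id: int, clear_fail: bool = False) -> None:
        if clear_fail:
            self.fail_urb = False
        if self.fail_urb:
            # Fail back anything that arrives while the endpoint is in error.
            self.completions.append((urb_id, -errno.ECONNRESET))
            return
        self.in_flight.append(urb_id)

    def complete(self, urb_id: int, status: int) -> None:
        """Stands in for the back-end completion handler."""
        self.in_flight.remove(urb_id)
        self.completions.append((urb_id, status))
        if status != 0:
            self.fail_urb = True
            # Unlink everything still queued for this endpoint.
            unlinked, self.in_flight = self.in_flight, []
            for other in unlinked:
                self.completions.append((other, -errno.ECONNRESET))


@dataclass
class FrontendEndpoint:
    backend: BackendEndpoint
    outstanding: int = 0
    clear_on_next: bool = False
    delivered: int = 0  # how many back-end completions were already passed on

    def submit(self, urb_id: int) -> None:
        self.outstanding += 1
        clear, self.clear_on_next = self.clear_on_next, False
        self.backend.submit(urb_id, clear_fail=clear)

    def deliver_completions(self) -> list:
        """Complete URBs back to the client, then check for an idle endpoint."""
        results = self.backend.completions[self.delivered:]
        self.delivered = len(self.backend.completions)
        self.outstanding -= len(results)
        if results and self.outstanding == 0:
            self.clear_on_next = True  # next URB resets 'fail_urb' in the back-end
        return results


def demo_recovery():
    backend = BackendEndpoint()
    frontend = FrontendEndpoint(backend)
    for urb_id in (1, 2, 3):
        frontend.submit(urb_id)
    backend.complete(1, -errno.EPIPE)  # URB 1 fails with 2 and 3 in flight
    frontend.submit(4)                 # arrives before the client has seen the error
    first = frontend.deliver_completions()
    frontend.submit(5)                 # endpoint went idle, so this clears 'fail_urb'
    backend.complete(5, 0)
    second = frontend.deliver_completions()
    return first, second
```

In this model URBs 2, 3 and 4 are all failed back rather than reaching the device after the error, and URB 5 is the first to run once the front-end endpoint has gone idle.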