
[Xen-devel] Re: [linux-usb-devel] Error recovery in Xen's paravirtualizing USB driver for Linux



On Wed, 7 Dec 2005, harry wrote:

> USB Folks,
> 
>     I've been working on a USB device driver for Linux running
> paravirtualized on the Xen hypervisor and I have a few questions about
> the design of the error recovery...
> 
>     This 'USB split driver' has a 'front-end' in the Linux kernel
> running in a guest domain of the hypervisor and a 'back-end' in the
> Linux kernel running in a device driver domain (usually the special
> privileged domain 0).
> 
>     The user configures Xen to map a USB port from the back-end domain
> to the front-end domain and the split driver makes any USB device
> attached to the back-end port available in the front-end domain.
> 
>     The split driver works roughly as follows:
> 
> o - The back-end uses usb_register to register itself as a driver
> matching all USB IDs so it gets probed for every USB device that is
> connected.  When a USB device is probed for a configured port, the
> driver claims all the interfaces for the device.
> 
> o - The front-end uses usb_create_hcd to create an HCD with a single
> port.
> 
> o - The front-end and the back-end communicate using the Xen
> inter-domain communication primitives so the hub status in the front-end
> reflects whether or not a device is plugged into the back-end port.
> 
> o - URBs received by the front-end HCD are translated into Xen
> inter-domain-communication primitives and routed to the back-end domain.
> 
> o - The inter-domain communication primitives are translated back into
> URBs in the back-end and submitted to the Linux USB stack for the device
> attached to the back-end port.
> 
>     The split driver now works well enough to mount a filesystem on a
> USB key in the front-end domain so it's getting there.
> 
>     I did have a problem with URBs getting reordered on their way
> between the front-end and the back-end which led to miscompares where
> the correct bulk data was written on the USB key but at the wrong LBA. I
> fixed this by maintaining submission ordering in the URB queue from
> front-end to back-end.

Clearly this is necessary for any queue, not just queues of USB URBs.
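
One way to enforce that ordering can be sketched in plain userspace C.  
This is only an illustration, not the split driver's actual fix (which 
isn't shown here): it assumes each request carries a sequence number 
assigned at submission time, and the receiving side releases requests 
strictly in that order.  The names reorder_buf, rb_put, rb_pop and WINDOW 
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 8  /* hypothetical cap on requests in flight (power of two) */

struct reorder_buf {
    uint32_t next_seq;         /* next sequence number to release */
    bool     present[WINDOW];  /* is this slot filled? */
    int      payload[WINDOW];  /* stand-in for a translated URB */
};

static void rb_init(struct reorder_buf *rb)
{
    assert(rb != NULL);
    rb->next_seq = 0;
    for (size_t i = 0; i < WINDOW; i++)
        rb->present[i] = false;
}

/* Store a request that may have arrived out of order.
 * Returns false if it is stale, too far ahead, or a duplicate. */
static bool rb_put(struct reorder_buf *rb, uint32_t seq, int payload)
{
    /* Unsigned subtraction makes stale sequence numbers look huge. */
    if ((uint32_t)(seq - rb->next_seq) >= WINDOW)
        return false;
    size_t slot = seq % WINDOW;
    if (rb->present[slot])
        return false;
    rb->present[slot] = true;
    rb->payload[slot] = payload;
    return true;
}

/* Release the next request in submission order, if it has arrived. */
static bool rb_pop(struct reorder_buf *rb, int *payload)
{
    size_t slot = rb->next_seq % WINDOW;
    if (!rb->present[slot])
        return false;
    *payload = rb->payload[slot];
    rb->present[slot] = false;
    rb->next_seq++;
    return true;
}
```

If request 1 arrives before request 0, rb_pop() returns false until 
request 0 has been stored; after that the two are released as 0, then 1.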

>     This issue made me realize there was a similar problem during error
> recovery if an URB fails when there are several in flight and indeed,
> reading the recent USB code, there is a big comment about the guarantees
> a driver gets on an error: the URB queue stalls until the driver's
> completion handler returns from completion of the failing URB and any
> URBs unlinked during completion of the failing URB.
> 
>     Right now, if there is an error on an URB in my USB split driver
> back-end then the back-end completion handler returns before the
> front-end USB driver client's completion handler gets a chance to do any
> unlinks so the URBs are not stalled and my driver is broken.  I'm trying
> to work out how to fix it.
> 
>     I'm thinking of doing the following:
> 
>     When there is an error returned by an URB in the back-end, the
> completion handler of the back-end will unlink all of the URBs in
> progress for the endpoint of the failing URB and set a 'fail_urb' flag
> for the endpoint to fail back any URBs which arrive subsequently for
> that endpoint until the 'fail_urb' flag is reset.
> 
>     Whenever the front-end completion handler returns from completing an
> URB back to the client and finds that an endpoint has gone idle it will
> set a flag to indicate that the next URB sent to the back-end for that
> endpoint should clear the 'fail_urb' flag in the back end before
> starting.

Failing isn't the right approach.  The back-end should unlink all those 
URBs but keep them available, so that they can be resubmitted if 
necessary.  When the front-end learns about the error, it has the option 
of unlinking the URBs or not -- if it doesn't unlink them then the 
back-end should resubmit them.  Likewise for URBs received from the 
front-end before the flag is cleared; they should be kept on the queue so 
that they can be submitted when the front-end's completion routine returns.
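
To make the bookkeeping concrete, here is a minimal userspace model of 
that hold-and-resubmit scheme.  It is a sketch, not kernel code: it makes 
no calls into the USB stack, requests are plain integer ids, and every 
name in it (ep_model, ep_receive, ep_error, ep_cancel, ep_resume, 
EP_MAX_REQS) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EP_MAX_REQS 16  /* hypothetical per-endpoint limit */

enum req_state {
    REQ_SUBMITTED,  /* handed to the USB stack */
    REQ_HELD,       /* unlinked or never submitted, kept for resubmission */
    REQ_FAILED,     /* the request that reported the error */
    REQ_CANCELLED   /* the front-end chose to unlink it */
};

struct ep_model {
    int            id[EP_MAX_REQS];
    enum req_state state[EP_MAX_REQS];
    size_t         count;
    bool           halted;  /* set on error, cleared by the front-end */
};

static void ep_init(struct ep_model *ep)
{
    assert(ep != NULL);
    ep->count = 0;
    ep->halted = false;
}

static int ep_find(const struct ep_model *ep, int id)
{
    for (size_t i = 0; i < ep->count; i++)
        if (ep->id[i] == id)
            return (int)i;
    return -1;
}

/* A request arrives from the front-end; hold it while the endpoint is halted. */
static bool ep_receive(struct ep_model *ep, int id)
{
    if (ep->count == EP_MAX_REQS)
        return false;
    ep->id[ep->count] = id;
    ep->state[ep->count] = ep->halted ? REQ_HELD : REQ_SUBMITTED;
    ep->count++;
    return true;
}

/* Request `id` failed: halt the endpoint and pull back everything in flight. */
static void ep_error(struct ep_model *ep, int id)
{
    int idx = ep_find(ep, id);
    if (idx < 0)
        return;
    ep->state[idx] = REQ_FAILED;
    ep->halted = true;
    for (size_t i = 0; i < ep->count; i++)
        if (ep->state[i] == REQ_SUBMITTED)
            ep->state[i] = REQ_HELD;
}

/* The front-end unlinks a held request. */
static bool ep_cancel(struct ep_model *ep, int id)
{
    int idx = ep_find(ep, id);
    if (idx < 0 || ep->state[idx] != REQ_HELD)
        return false;
    ep->state[idx] = REQ_CANCELLED;
    return true;
}

/* The front-end clears the flag: resubmit what is still held, in the
 * original order.  Ids are written to out[] while there is room; the
 * return value is the total number resubmitted. */
static size_t ep_resume(struct ep_model *ep, int *out, size_t max)
{
    size_t n = 0;
    ep->halted = false;
    for (size_t i = 0; i < ep->count; i++) {
        if (ep->state[i] != REQ_HELD)
            continue;
        ep->state[i] = REQ_SUBMITTED;
        if (n < max)
            out[n] = ep->id[i];
        n++;
    }
    return n;
}
```

With requests 1, 2 and 3 in flight, an error on 1 puts 2 and 3 on hold; a 
request 4 arriving afterwards is held too.  If the front-end cancels 2 and 
then clears the flag, 3 and 4 are resubmitted in that order, and no error 
code ever had to be invented for them.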

>     This strategy is similar to the concept of auto contingent
> allegiance for SCSI.
> 
>     My questions are as follows:
> 
> 1) I think it is impossible to maintain exactly the same guarantee that
> the Linux USB stack provides across the Xen inter-domain interface
> because I don't think it is desirable to attempt the synchronisation of
> the two kernels in different domains required to nest the call of the
> front-end completion within the call of the back-end completion.  Is
> this correct?

That is not a USB question; it has more to do with the organization of 
Xen.  I don't know enough about Xen to answer fully.  But if this 
inter-kernel synchronization requires any kind of sleeping, then you can't 
use it in the context of a USB completion routine.

> 2) I believe the above strategy to be more conservative than the Linux
> guarantee and sufficient to avoid miscompares due to re-ordering as a
> result of URBs failing whilst there are multiple URBs in flight.  Is
> this correct?

Forget about miscompares; you should be concerned about re-ordering of 
URBs in any context.  The strategy you outlined, with the changes 
suggested above, should suffice to fulfill the Linux USB stack's guarantee 
for a virtual kernel.

> 3) I think the above strategy is compatible with the suggested recovery
> mechanism for Linux USB drivers which is to unlink outstanding URBs in
> the URB completion handler.  Is the above strategy likely to work?

Usually you do want to unlink all the URBs in a queue when one of them 
fails.  But there are times when you don't.  A good example is URBs 
submitted to endpoint 0.  They can arrive independently from separate 
origins (a driver, the USB core, or userspace via usbfs) and failure of 
one shouldn't affect the others.

> 4) I'm wondering what error to fail URBs with when 'fail_urb' is set.
> I'm guessing either -ECONNRESET which is the same as if the URB was
> unlinked or perhaps -EAGAIN.  What would be the correct value?

Do what I said: don't fail the URBs.  Just keep them handy until the 
front-end cancels them itself or clears the flag.  Then there's no need 
for an error code.

> 5) Is there a more USBish way to solve this problem that will fit better
> with the USB infrastructure?
> 
> 6) Any other ideas?

Suspend/resume is liable to cause trouble.  For instance, what happens to 
the various front-ends if the back-end decides to suspend a USB device?

Alan Stern




 

