[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V13 3/5] libxl: add pvusb API



On 04/02/16 14:39, Juergen Gross wrote:
> On 04/02/16 02:53, Chun Yan Liu wrote:
>>
>>
>>>>> On 2/3/2016 at 10:33 PM, in message <56B20FCC.3010308@xxxxxxxxxx>, George
>> Dunlap <george.dunlap@xxxxxxxxxx> wrote: 
>>> On 02/02/16 18:11, Ian Jackson wrote: 
>>>> George Dunlap writes ("Re: [Xen-devel] [PATCH V13 3/5] libxl: add pvusb  
>>> API"): 
>>>>> There are effectively four states a device can be in, from the 
>>>>> 'assignment' point of view: 
>>>>>
>>>>> 1. Assigned to the normal Linux device driver(s) for the USB device 
>>>>>
>>>>> 2. Assigned to no driver 
>>>>>
>>>>> 3. Assigned to usbback, but not yet assigned to any guest 
>>>>>
>>>>> 4. Assigned to a guest 
>>>>  
>>>> Thanks for your clear explanation (of which I will snip much). 
>>>>  
>>>>> Additionally, each USB "device" has one or more "interfaces".  To 
>>>>> assign a "device" to usbback in the sysfs case means assigning all of 
>>>>> the "interfaces".  The code seems to assume that different interfaces 
>>>>> from the same device can be assigned to different drivers. 
>>>>  
>>>> It is indeed the case that in principle a single USB device with 
>>>> multiple interfaces can be assigned to multiple different drivers. 
>>>>  
>>>>> Regarding Ian's comments: 
>>>>>
>>>>> Since "assigned to the guest" and "listed under the pvusb xenstore 
>>>>> node" are the same thing, is it even *possible* to (safely) unassign 
>>>>> the device from usbback before removing the xenstore nodes? 
>>>>  
>>>> It might be possible to remove some of the xenstore nodes but leave 
>>>> others present, so that usbback detaches, but enough information 
>>>> remains for libxl to know that Xen still `owns' the device. 
>>>>  
>>>> But, surely usbback needs to cope with the notion that the device 
>>>> might disappear.  USB devices can disappear at any time. 
>>>  
>>> That's true. But if the USB device has actually disappeared, the device 
>>> is in fact "safe" from usbback.  I think I might consider state 2 safe 
>>> to go to, but I definitely wouldn't consider state 1 safe with the 
>>> xenstore nodes still in place. 
>>>  
>>>>  
>>>>> It's not clear to me under what conditions 3->2 might fail, 
>>>>  
>>>> As an example, someone might press ^C on `xl usb-detach'. 
>>>  
>>> *grumble* 
>>>  
>>>>> or what could be done about it if it did fail. 
>>>>  
>>>> The most obvious reason for it failing is that something somewhere 
>>>> still held onto the device.  (For umounting filesystems, and detaching 
>>>> block devices, this happens a lot.)  So if the detach from usbback 
>>>> fails would probably be possible to simply retry it. 
>>>  
>>> I'm not sure what this means in the context of moving from 3 (assigned 
>>> to usbback) to 2 (unassigned).  usbback will definitely not use the 
>>> device to mount something in dom0; and I'm pretty sure has no visibility 
>>> as to whether it's being used as a filesystem in the domU. 
>>>  
>>> (Moving from 1 -> 2 of course would be subject to this sort of thing, 
>>> but that's a different issue.) 
>>>  
>>>>> Regarding 2->1, again it's not clear that there's anything libxl can 
>>>>> do.  Reloading the driver's module might reset it enough to pick up 
>>>>> the (now unassigned) USB devices again; other than that, rebooting is 
>>>>> probably the best option. 
>>>>  
>>>> I think re-attempting the bind may work.  USB devices can be flaky. 
>>>>  
>>>>> It's also not clear to me, if some functions succeed in being 
>>>>> reassigned and others fail, whether it's best to just try to assign as 
>>>>> many as we can, or better to go back and un-assign all the ones that 
>>>>> succeeded. 
>>>>  
>>>> Unless explicitly requested, I don't think the system should create 
>>>> situations some interfaces are assigned to host drivers and some to 
>>>> usbback. 
>>>  
>>> I was talking here about whether it would be better to have some 
>>> interfaces assigned to the original drivers and some interfaces left 
>>> unassigned, or to try to leave all interfaces unassigned if any of the 
>>> assignments fail. 
>>>  
>>> I agree, it would be better to try to avoid the possibility of having 
>>> some interfaces bound to usbback and some interfaces bound to the 
>>> original drivers. 
>>>  
>>>> And, I'm a fan of `crash-only software': in general, if a failure 
>>>> occurs, the situation should just be left as-is.  The intermediate 
>>>> state needs to be visible and rectifiable. 
>>>>  
>>>> This approach to software state machines avoids bugs where the system 
>>>> gets into `impossible' states, recovery from which is impossible 
>>>> because the designers didn't anticipate them. 
>>>>  
>>>> It would be tolerable if the recovery sometimes involves `lsusb' and 
>>>> echoing things into sysfs, but it would be better if not. 
>>>  
>>> Right -- so what about this.  When removing a USB device: 
>>>  
>>> * First modify the pvusb xenstore nodes such that 1) it's safe to 
>>> attempt removing the interfaces from usbback, but 2) they still show up 
>>> in usb-list.  (This may be a noop.) 
>>>  
>>> * Attempt to remove all interfaces from usbback; if any of these fail, 
>>> stop and report an error.  Possible recovery options: 
>>>  - Re-try the usb_remove command 
>>>  - Re-load usbback (obviously disruptive to other VMs) 
>>>  - Reboot the host 
>>>  - Manually try unbinding with sysfs 
>>>  
>>> * Remove the remaining pvusb xenstore nodes.  If this fails, stop and 
>>> report the error.  Possible recovery options: 
>>>  - Re-try the usb_remove command 
>>>  
>>> = REBIND OPTION 1: Do the best we can without extra commands 
>>>  
>>> * Attempt to re-bind the interfaces to the original drivers, as recorded 
>>> in the libxl usb xenstore nodes.  If any of these fail, report an error 
>>> but continue to try the rest of the interfaces, and remove the xenstore 
>>> nodes containing information about the original drivers.  Possible 
>>> recovery options for the user: 
>>>  - Reload the original drivers 
>>>  - Reboot the host 
>>>  - Manually try rebinding with sysfs 
>>>  
>>> = REBIND OPTION 2: Include a recovery command, usb-retry-rebind 
>>>  
>>> * Attempt to re-bind the interfaces to the original drivers, as recorded 
>>> in the libxl usb xenstore nodes.  If any of these fail, stop immediately 
>>> and report an error; do not remove the xenstore nodes containing the 
>>> original drivers of any interfaces that failed to rebind.  Possible 
>>> recovery options for the user: 
>>>  - Run 'xl usbdev-retry-rebind', which will just execute this step again 
>>>  - Unload and reload original host drivers 
>>>  - Reboot the host 
>>>  
>>> Both of these: 
>>>  
>>> 1. Will avoid the state of some interfaces bound to usbback, some 
>>> rebound to the original drivers 
>>>  
>>> 2. Give the user a convenient way to re-try unbinding from usbback it 
>>> failed 
>>>  
>>> 3. Give the user out-of-xl ways to attempt to recover the state other 
>>> than messing around with sysfs. 
>>>  
>>> Rebind option 2 will give the user a convenient way to retry rebinding 
>>> to the original driver via xl if that step failed. 
>>>  
>>> I'm inclined to suggest rebind option #1 for now, just to keep things 
>>> simple. 
>>>  
>>> Thoughts? 
>>>  
>>> Chunyan, would the first half of that (removing from usbback before 
>>> removing the pvusb xenstore nodes) actually work? 
>>
>> From testing, yes, it works. But I am not sure if it is safe though. Juergen
>> should be very clear about that according to usbback internal codes.
>>
>> Juergen, could you help to confirm that?
> 
> I think it should work.
> 
> TBH, I don't think the error handling of driver binding/unbinding should
> be made too intelligent. In the long run I hope to have a qemu-based
> pvUSB backend. And in this case driver binding/unbinding will be handled
> completely by qemu/libusb.
> 
> Whether we want to keep the kernel based pvUSB-backend support in xl
> at the end should be discussed later. I'll either add the qemu based
> variant or I'll replace the kernel variant with the qemu one as soon
> as the qemu backend has been accepted.

There's a difference between "making it intelligent" and "not making it
broken". :-)  Given that users can potentially cause a number of these
things to fail just by pressing Ctrl-C, we need to at least make sure
that we don't get into a completely wedged state that the user can't do
anything to fix.  That requires some careful thought.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.