[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] libxl / xen-pciback interaction
All, while trying to test a pciback fix for how SR-IOV VFs get placed in the guest PCI topology I noticed that my test VM would only ever see the 1st out of 3 VFs that I passed to it. As it turns out, the driver watches its own (backend) node, and upon first receiving notification it evaluates state found in xenstore to set up the backend device. Subsequently it switches the device to Initialised. After this switching, not further instances of the watch triggering would do anything. In all instances I observed the watch event getting processed when the "num_devs" node still reported 1. Trying to deal with this in libxl, by delaying the writing of the "num_devs" node, led to a fatal error ("num_devs" not being available to read) in the driver, causing the device to move to Closing state. Therefore I decided that the issue has to be addressed in the driver, resulting in a patch (reproduced below) that I'm not overly happy with. I think the present libxl behavior is wrong - it shouldn't trigger driver initialization without having fully populated the information the driver is supposed to consume for its device initialization. The only solution that I can think of, however, doesn't look very appealing either: Instead of putting all pieces of the data for one device in a transaction, make a single transaction cover all devices collectively. Are there opinions about where to address the issue, or suggestions as to better approaches than the ones shown / outlined? While doing this it also occurred to me as odd how "num_devs" is actually used: It's not really a "number of devices" indicator, but instead a "next device has this number" one: libxl reads the present value and increments it by one for every new device. Destroying (hot-unplugging) of devices doesn't have any effect on the value. If addition / removal of a device happens a number of times for a VM, quite a few leftover, no longer used entries would accumulate afaict. This isn't only consuming space in xenstore for no good reason, but also means pciback has to do an increasing amount of processing every time a reconfigure event happens. Jan xen-pciback: reconfigure also from backend watch handler When multiple PCI devices get assigned to a guest right at boot, libxl incrementally populates the backend tree. The writes for the first of the devices trigger the backend watch. In turn xen_pcibk_setup_backend() will set the XenBus state to Initialised, at which point no further reconfigures would happen unless a device got hotplugged. Arrange for reconfigure to also get triggered from the backend watch handler. Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx --- I will admit that this isn't entirely race-free (with the guest actually attaching in parallel), but from the looks of it such a race ought to be benign (not the least thanks to the mutex). Ideally the tool stack wouldn't write num_devs until all devices had their information populated. I tried doing so in libxl, but xen_pcibk_setup_backend() calling xenbus_dev_fatal() when not being able to read that node prohibits such an approach (or else libxl and driver changes would need to go into use in lock-step). I wonder why what is being watched isn't just the num_devs node. Right now the watch triggers quite frequently without anything relevant actually having changed (I suppose in at least some cases in response to writes by the backend itself). --- a/drivers/xen/xen-pciback/xenbus.c +++ b/drivers/xen/xen-pciback/xenbus.c @@ -359,7 +359,8 @@ out: return err; } -static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev) +static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev, + enum xenbus_state state) { int err = 0; int num_devs; @@ -374,8 +375,7 @@ static int xen_pcibk_reconfigure(struct mutex_lock(&pdev->dev_lock); /* Make sure we only reconfigure once */ - if (xenbus_read_driver_state(pdev->xdev->nodename) != - XenbusStateReconfiguring) + if (xenbus_read_driver_state(pdev->xdev->nodename) != state) goto out; err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, "num_devs", "%d", @@ -500,6 +500,9 @@ static int xen_pcibk_reconfigure(struct } } + if (state != XenbusStateReconfiguring) + goto out; + err = xenbus_switch_state(pdev->xdev, XenbusStateReconfigured); if (err) { xenbus_dev_fatal(pdev->xdev, err, @@ -525,7 +528,7 @@ static void xen_pcibk_frontend_changed(s break; case XenbusStateReconfiguring: - xen_pcibk_reconfigure(pdev); + xen_pcibk_reconfigure(pdev, XenbusStateReconfiguring); break; case XenbusStateConnected: @@ -664,6 +667,10 @@ static void xen_pcibk_be_watch(struct xe xen_pcibk_setup_backend(pdev); break; + case XenbusStateInitialised: + xen_pcibk_reconfigure(pdev, XenbusStateInitialised); + break; + default: break; }
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |