Re: [Xen-devel] XENBUS: Timeout connecting to device errors
On Mon, Dec 04, 2006 at 02:18:37PM -0500, Graham, Simon wrote:
> We've been noticing a lot of these errors when booting VMs since we
> moved to 3.0.3 - I've traced this to the hotplug scripts in Dom0 taking
> >10s to run to completion and specifically the vif-bridge script taking
> >=9s to plug the vif into the s/w bridge on occasion - was wondering if
> anyone has any insight into why it might take this long.
>
> I added some instrumentation to the scripts to log entry/exit from
> xen-backend.agent and also lock contention (attached at the end of this)
> and have the following observations:
>
> 1. Currently, the various script invocations are issued in parallel but
>    are serialized by a single global lock -- is it really necessary, for
>    example, to serialize vif and vbd hot plug processing in Dom0?

You need to serialise VBD hotplug if you are going to get the right result
when performing the sharing check. If you're using vif-nat, you need to
serialise the modifications to the DHCP configuration file. Other than
that, I don't think that there's a need to serialise events at startup.

On Bugzilla #515, Harry Butterworth notes that there is a race condition
in teardown, which is why he introduced the global lock. You could make
this cleverer, possibly, so that it doesn't affect startup times.

All that said, I believe that udev is supposed to serialise all events
anyway, so unless you're using hotplug rather than udev, I'd expect you to
see no lock contention whatsoever.

> 2. In most cases we've seen, this problem happens when the first VM is
>    started after re-installing a box. In the example below, the 'vif
>    online' processing started at 2:21:53 and did not finish until 2:22:04
>
> 3. Clearly a hard coded timeout of 10s is less than perfect -- is there
>    no better way of knowing when the hotplug processing is done?

We know precisely when hotplugging is done -- the scripts write an entry
into the store to tell us so.
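To make the distinction concrete: because completion is signalled by the store entry, the waiting side only has to decide how long to wait. The following is a minimal Python sketch of that wait-with-timeout pattern, not Xend's actual code. `wait_for_hotplug_status` and its `read_status` accessor are hypothetical names; `read_status` is assumed to return the script's status entry, or None if it has not been written yet. Polling is used here for simplicity; a real implementation would more likely register a store watch.

```python
import time
from typing import Callable, Optional


def wait_for_hotplug_status(
    read_status: Callable[[], Optional[str]],
    timeout: float = 10.0,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for a hotplug script to report completion via the store.

    read_status is a hypothetical accessor for the status entry the script
    writes; it returns None until the entry exists. clock and sleep are
    injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        status = read_status()
        if status is not None:
            # The script has written its entry: hotplugging is done.
            return status
        if clock() >= deadline:
            # Still no entry. At this point a slow script and a hung one
            # look identical, so the only option is to give up.
            raise TimeoutError(f"no hotplug status after {timeout} s")
        sleep(poll_interval)
```

The success path needs no guesswork. Only the `TimeoutError` branch depends on the chosen `timeout`, so a vif bringup that takes 9 s leaves almost no margin against a 10 s limit.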
It's knowing when they've locked up that's the hard bit. If you're seeing
vif bringup taking 9 seconds, then clearly the 10 second timeout is far
too short. There's no particular reason to keep the timeout short, so feel
free to lengthen it, with the obvious consequence that a script that has
genuinely locked up will take longer to detect. Bear in mind that Xend
will time out the whole device bringup phase after 100 seconds.

I'd want to root-cause the 9 second bringup as well, as I don't see why
it ought to take that long.

Cheers,

Ewan.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel