
Re: [Xen-devel] Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)

On Wed, Jul 29, 2015 at 12:52:37PM +0200, Roger Pau Monné wrote:
> On 29/07/15 at 11:03, Ian Campbell wrote:
> > On Tue, 2015-07-28 at 15:47 -0400, Konrad Rzeszutek Wilk wrote:
> >> Hey,
> >>
> >> I launch a bunch of guests at the same time or in parallel and 
> >> the scripts end up timing out with:
> > 
> > Are you sure you have cleaned out all the old udev .rules files? If any of
> > those are still present then you will get both sets competing to drive
> > things and they will conflict and cause this sort of breakage.

This is what I have in udev without the revert.

-bash-4.1# find /etc/udev/

> > 
> > Perhaps we should put back the hacks which nobble the udev case for another
> > release? i.e. the thing which writes the path (but unconditional in
> > xencommons) and the bit in the hotplug scripts which gates on it, but still
> > remove the .rules files. That's only delaying the inevitable though, since
> > upgrades to 4.7 will have the same issue.
> > 
> > Perhaps in the scripts themselves:
> > 
> > if [ -n "${UDEV_CALL}" ] ; then
> >     error "called through udev, please remove stale udev rules files"
> > fi
> > 
> > relying on the (stale) 4.5 rules file having the UDEV_CALL=1 in them.
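
Spelled out as a self-contained sketch, that guard could look roughly like
this. It assumes, as described above, that the stale 4.5 rules export
UDEV_CALL=1. The function name is made up for illustration, and the real
scripts would presumably use their own error helper rather than echo.

```shell
# Sketch only: refuse to run when invoked from a stale udev rule.
# Assumption: the old 4.5 rules file sets UDEV_CALL=1 in the environment.
# refuse_udev_call is a hypothetical name, not an existing helper.
refuse_udev_call() {
    if [ -n "${UDEV_CALL:-}" ]; then
        echo "called through udev, please remove stale udev rules files" >&2
        return 1
    fi
    return 0
}
```

A script would call it near the top, e.g. `refuse_udev_call || exit 1`.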

I don't exactly understand how the hotplug scripts are invoked via 'xl'.
With udev it was pretty clear and easy to me.

And the invocation of the scripts was driven by the backend - that is,
xen-blkback advertised when it was ready - and the udev rules then
triggered the scripts.

In my case I seem to have xen-blkback taking its time and coming up with
the disk much much later - and the block scripts have already run
and timed out.

We could add some code in xl (where it executes the hotplug scripts)
to monitor for kernel udev events from the backend - and only then
execute the hotplug scripts. But that is a bit of custom code to deal
with xen-blkback, xen-netback, xen-pciback, etc.

And that would still necessitate a udev rule to funnel the events to the
'xl devd' daemon, which would have to understand the udev format and such
(via a socket).

Note that I see this problem regardless of whether 'xl devd' is running.

> Another option would be to install an empty xen-backend.rules for the
> 4.6 release, and then remove it for 4.7.

Or trim down the udev rules?

> I've also been able to trigger this by using a similar loop. AFAICT the
> hotplug scripts are running correctly, the problem seems to be that the
> check_sharing function that's executed to check *every* loop device that
> points to the same file is scanning xenstore in order to find if the
> loop device is also used by another guest. When 20 guests are launched
> in parallel, the CPU consumption in Dom0 is quite high because of all
> the Qemu processes, and the xenstore daemon is basically starving to get
> some CPU time.
> IMHO, we should remove these checks and allow users to shoot themselves
> in the foot if they want to, and in fact that's what I did on FreeBSD.
> What I still don't understand is why this only triggers with 2ba368
> applied. My best guess is that you still have a stale xen-backend.rules
> file so you are actually calling the hotplug scripts twice, creating twice
> as many loop devices for each guest, which of course also slows things
> down even more.

Sadly no. No stale xen-backend.rules - this was a fresh install.
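
For anyone who wants to double-check their own box, something along these
lines should list any leftover copies. It is only a sketch: the default
directories are the usual udev rule locations, which is an assumption on my
part and not output from this machine, and the function name is made up.

```shell
# Sketch only: list leftover xen-backend rules files.
# Assumption: the default directories below are the usual udev rule
# locations; pass other directories as arguments to override them.
find_stale_xen_rules() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/udev/rules.d /lib/udev/rules.d /usr/lib/udev/rules.d
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name '*xen-backend*.rules'
    done
}
```

Running `find_stale_xen_rules` with no arguments and getting no output means
none of the default directories holds such a file.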

> Roger.
