
Re: [Xen-devel] [PATCH] Avoid triggering the softlockup BUG when offline for too long.



On Mon, Nov 27, 2006 at 10:21:54AM +0000, Keir Fraser wrote:
> 
> 
> 
> On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@xxxxxxxxxx> wrote:
> 
> > After being offline for a long time, the softlockup  watchdog triggers
> > a BUG() on our faces. This is expected, as in fact, we spent more than
> > a fixed 10*HZ amount of time without touching the watchdog.
> > 
> > However, by inspecting the contents of RUNSTATE_offline, we can gain
> > awareness of the fact, and do better than that. This patch fixes it.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@xxxxxxxxxx>
> 
> Would 'stolen' not be a good enough thing to test? Presumably this is really
> just dealing with xm pause/unpause (a single long offline) so this simpler
> fix would work just as well?

I thought about it, but I'm not 100% sure. My reasons for not using
stolen were basically:

* Conceptually (maybe not in practice), stolen could grow due to
runnable time only.
* Stolen time, like blocked time, does not have its corresponding
per-processor variable updated all at once, but in chunks that are
multiples of NS_PER_TICK. If we're out for too long, we could detect
stolen being too large multiple times, leading to far more calls to
the softlockup watchdog than we want.

Waiting for your comments on this,

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

