Re: [Xen-devel] should vcpu_pause()/vcpu_sleep_nosync() give up?
On Sep 6, 2006, at 12:02 PM, Keir Fraser wrote:

> On 6/9/06 2:18 pm, "Jimi Xenidis" <jimix@xxxxxxxxxxxxxx> wrote:
>
>> First off, I realize I have an SMP bug where my second processor is hung somewhere, I'm not sure where, but for the sake of this argument lets assume it has suffered an unrecoverable fault. My primary CPU is fine and is hung in vcpu_sleep_nosync() because the secondary will not clear its _VCPUF_running bit.
>
> ITYM vcpu_sleep_sync(). Hint is in the name. ;-) The nosync variant does not spin on the _running flag.

Correct.

>> While I have this error I would like to give up and try and recover from it. How long is long enuff? thoughts?
>
> Holy crap!

I find these things to be rather UN-holy :)

> Are you assuming that the offline CPU was not running anything other than the idle loop or guest code, and that you'll simply destroy the guest if one was running (since you can't really continue it)?

Not sure how far I'd go here, but right now, I'd be happy with one CPU not causing all CPUs (or the one servicing a xend command) to sit in an infinite loop, even if its my fault.

Given that this is a software bug, and there is always at least one :) these assumptions are likely not true and the CPU has gone down taking some locks with it. Hypervisors should increase the availability of the machine as a whole, PPC machines tend to have many HA features that when unhandled (mostly ECC) can cause a CPU to go down.

> However, being optimistic, I suppose a few 100ms would be plenty to know that something is probably up.

ok.. I'll work with that, thanks
-JX

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
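The idea discussed in this thread, giving up on the spin in vcpu_sleep_sync() after a bounded wait, can be illustrated with a small user-space sketch. This is not Xen code: `struct demo_vcpu`, `now_ns()` and `demo_sleep_sync_timeout()` are hypothetical names invented for the example, the atomic flag only stands in for the _VCPUF_running bit, and the clock is POSIX `clock_gettime()` rather than a hypervisor time source.

```c
#define _POSIX_C_SOURCE 199309L

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a vcpu; this is NOT Xen's struct vcpu. */
struct demo_vcpu {
    atomic_bool running;   /* plays the role of the _VCPUF_running bit */
};

/* Monotonic time in nanoseconds (assumes a POSIX system). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until v->running clears or timeout_ns nanoseconds have passed.
 * Returns true if the flag cleared, false if the wait was abandoned.
 * What to do after a false return (e.g. what happens to the guest)
 * is left entirely to the caller.
 */
static bool demo_sleep_sync_timeout(struct demo_vcpu *v, uint64_t timeout_ns)
{
    uint64_t deadline = now_ns() + timeout_ns;

    while (atomic_load_explicit(&v->running, memory_order_acquire)) {
        if (now_ns() >= deadline)
            return false;   /* the other CPU never cleared the flag */
    }
    return true;
}
```

One reading of "a few 100ms" would be a call such as `demo_sleep_sync_timeout(&v, 300ull * 1000000ull)`. As the thread points out, a timeout only stops the waiting CPU from spinning forever; it does nothing about locks the failed CPU may still hold.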