Re: [Xen-devel] Debugging a weird hardware fault.
Cc'ing some of the Xen ACPI/PM maintainers to see if they have an opinion on
this issue...

On 29/07/2011 08:10, "Keir Fraser" <keir.xen@xxxxxxxxx> wrote:

> On 28/07/2011 23:45, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> Initially, an SMI was what I was thinking, but the triple fault occurs
>> whether you start bringing down CPUs or not. While waiting 10 seconds in
>> the platform_op select statement, the fault still occurs when all CPUs are
>> still up, all IRQs still enabled and potentially domUs still up. (Also,
>> from studying the Xen 3.4 code, I believe that interrupts are still
>> actually up during time_suspend(), but are soon brought down by
>> lapic_suspend() later in device_power_down().)
>>
>> Conversely, in the hacked up case where I ditched most of the shared S3/S5
>> codepath and just hit the PM1A, the server correctly shut down and stayed
>> shut down, implying that the fault was caused by software (be it BIOS or
>> OS) rather than hardware. From what I understand of the ACPI spec (and I
>> claim very little knowledge), there are a multitude of hardware events
>> which could bring the server out of S5, appearing as a triple fault, which
>> would not be affected by whether you had hit the PM1A register.
>>
>> In this specific example, dom0's regular shutdown code had already brought
>> down the domUs (of which there were none because we never started any),
>> and we were running with 1 CPU only so no others were up. This opens up a
>> whole host of other possibilities which could be having an effect between
>> the XENPF_enter_acpi_sleep hypercall and Xen actually shutting itself
>> down.
>
> Well I expect dom0 has done some going-to-sleep work that has left the
> platform on borrowed time w.r.t. bashing SLP_EN into the PM1 control
> register and actually finalising the shutdown.
>
> For example, it will have executed the _GTS ACPI method if there is one.
> That is supposed to happen immediately before writing PM1.SLP_EN, with no
> intervening interrupt activity or I/O. Obviously things don't work out quite
> like that when running on Xen!
>
> This is an architectural limitation of how ACPI sleep is currently
> implemented for Xen. It may need some rethinking to do it really properly
> according to the spec, e.g. do a hypercall just to prepare Xen for
> shutdown, but return back to dom0 in some limited environment to actually
> have it do the final ACPI sleep work. Or have dom0 pass a pointer to a code
> block that Xen should simply jump to in order to get the sleep to happen
> (where that code block would basically be dom0's acpi_enter_sleep()
> function). There are a few, somewhat distasteful, options that are more
> respectful of the ACPI spec than we are right now.
>
> -- Keir
>
> _______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
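
For readers following the PM1.SLP_EN discussion above, here is a minimal sketch of how the value written to the PM1 control register is composed. The bit positions (SLP_TYP in bits 10-12, SLP_EN in bit 13) follow the ACPI specification's PM1 control register layout. The helper name pm1_sleep_value() and its parameters are hypothetical and are not taken from the Xen or Linux sources. The sketch only computes the value and does no port I/O.

```c
#include <stdint.h>

/* PM1 control register bit layout, per the ACPI specification. */
#define PM1_SLP_TYP_SHIFT 10
#define PM1_SLP_TYP_MASK  (0x7u << PM1_SLP_TYP_SHIFT)  /* bits 10-12 */
#define PM1_SLP_EN        (1u << 13)                   /* bit 13 */

/*
 * Hypothetical helper (not a Xen or Linux function): compute the value
 * to write to the PM1a control register to request a sleep state.
 *
 *   current - value last read from the register; unrelated bits are kept
 *   slp_typ - 3-bit platform-specific sleep type, as found in the
 *             relevant _Sx package (for example \_S5)
 */
static inline uint16_t pm1_sleep_value(uint16_t current, uint8_t slp_typ)
{
    /* Clear any stale sleep type and enable bit first. */
    uint16_t v = current & (uint16_t)~(PM1_SLP_TYP_MASK | PM1_SLP_EN);

    /* Insert the requested sleep type, then set SLP_EN to trigger it. */
    v |= (uint16_t)((slp_typ & 0x7u) << PM1_SLP_TYP_SHIFT);
    v |= (uint16_t)PM1_SLP_EN;
    return v;
}
```

The point of the thread is about when this write happens rather than what is written: per the discussion above, _GTS is meant to run immediately before the write, with no interrupt activity or I/O in between, which the hypercall path through Xen does not guarantee.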