Re: [Xen-devel] Debugging a weird hardware fault.
On 29/07/11 08:10, Keir Fraser wrote:
> On 28/07/2011 23:45, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> Initially, an SMI was what I was thinking, but the triple fault occurs
>> whether you start bringing down CPUs or not. While waiting 10 seconds in
>> the platform_op select statement, the fault still occurs when all CPUs are
>> still up, all IRQs still enabled and potentially domUs still up. (Also,
>> from studying the Xen 3.4 code, I believe that interrupts are still
>> actually up during time_suspend(), but are soon brought down by
>> lapic_suspend() later in device_power_down().)
>>
>> Conversely, in the hacked-up case where I ditched most of the shared S3/S5
>> codepath and just hit the PM1A, the server correctly shut down and stayed
>> shut down, implying that the fault was caused by software (be it BIOS or
>> OS) rather than hardware. From what I understand of the ACPI spec (and I
>> claim very little knowledge), there are a multitude of hardware events
>> which could bring the server out of S5, appearing as a triple fault, which
>> would not be affected by whether you had hit the PM1A register.
>>
>> In this specific example, dom0's regular shutdown code had already brought
>> down the domUs (of which there were none because we never started any),
>> and we were running with 1 CPU only so no others were up. This opens up a
>> whole host of other possibilities which could be having an effect between
>> the XENPF_enter_acpi_sleep hypercall and Xen actually shutting itself down.
> Well I expect dom0 has done some going-to-sleep work that has left the
> platform on borrowed time w.r.t. bashing SLP_EN into the PM1 control
> register and actually finalising the shutdown.
>
> For example, it will have executed the _GTS ACPI method if there is one.
> That is supposed to happen immediately before writing PM1.SLP_EN, with no
> intervening interrupt activity or I/O. Obviously things don't work out quite
> like that when running on Xen!
>
> This is an architectural limitation of how ACPI sleep is currently
> implemented for Xen. It may need some rethinking to do it really properly
> according to the spec. e.g., do a hypercall just to prepare Xen for
> shutdown, but return back to dom0 in some limited environment to actually
> have it do the final ACPI sleep work. Or have dom0 pass a pointer to a code
> block that Xen should simply jump at to get the sleep to happen (where that
> code block would basically be dom0's acpi_enter_sleep() function). There are
> a few, somewhat distasteful, options that are more respectful of the ACPI
> spec than we are right now.
>
> -- Keir

Just for information, this turned out to be a BIOS bug. It was setting a
6 second timer when executing _PTS, which triggered a system reset if
PM1{a,b} had not been written by the time the timer expired. As Xen does all
of its shutdown after the call to _PTS and before the write to PM1{a,b},
there is a significant time gap, which was falling foul of the timer in most
cases.

In this case, it seems likely that a BIOS fix can be done, as Supermicro do
provide a custom BIOS for the NetScaler box in question.

However, if anyone else comes across this issue, we did make a software
solution. You can replace /etc/init.d/halt (or the equivalent for your chosen
dom0 distro) with a script that kexec-reboots into a native kernel, which
listens for a special command line parameter and calls pm_power_off_prepare()
and pm_power_off() after the ACPI module has initialized[1].

This issue does, however, show that Xen itself is in breach of the ACPI spec,
which is a dangerous situation to be in given the fragility of ACPI at the
best of times. In due course, I will put my mind to solving the dom0-Xen ACPI
interaction problems if the question is still open.

~Andrew Cooper

[1] Yes this is a hack. Sorry.
It's the easiest solution without rewriting Xen.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
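
For anyone trying to picture the race described above, here is a toy model of it. This is only a sketch and not Xen, kernel or BIOS code: the 6 second timeout is the figure from the report, while the function name, the step names and every other duration are hypothetical values chosen for illustration.

```python
# Toy model of the _PTS -> PM1{a,b} race. Only the 6 second timeout comes
# from the report; all step names and durations below are made up.
BIOS_RESET_TIMEOUT_S = 6.0  # timer armed by the BIOS when _PTS executes


def shutdown_outcome(steps_after_pts, timeout_s=BIOS_RESET_TIMEOUT_S):
    """Return 'power-off' if PM1{a,b} is written before the timer expires,
    otherwise 'reset'.

    steps_after_pts is a list of (name, seconds) pairs covering all work
    done between the _PTS call and the PM1{a,b} write, inclusive.
    """
    elapsed = sum(seconds for _name, seconds in steps_after_pts)
    return "power-off" if elapsed < timeout_s else "reset"


# Hypothetical timings, not measurements.
native_path = [("write PM1{a,b}", 0.01)]
xen_path = [
    ("hypercall from dom0 into Xen", 0.1),
    ("Xen brings down CPUs and devices", 7.0),
    ("write PM1{a,b}", 0.01),
]

print(shutdown_outcome(native_path))  # power-off
print(shutdown_outcome(xen_path))     # reset
```

The point of the model is only that the outcome depends on the total time spent between _PTS and the PM1{a,b} write, which is why shortening that window (for example by powering off from a native kernel) avoids the reset.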