Re: [Xen-devel] Regression, host crash with 4.5rc1
>>> On 23.11.14 at 02:28, <sflist@xxxxxxxxx> wrote:
> With mwait-idle=0:
>
> (XEN) 'c' pressed -> printing ACPI Cx structures
> (XEN) ==cpu0==
> (XEN) active state: C0
> (XEN) max_cstate: C7
> (XEN) states:
> (XEN) C1: type[C1] latency[001] usage[00000000] method[ FFH] duration[0]
> (XEN) C2: type[C0] latency[000] usage[00000000] method[ NONE] duration[0]
> (XEN) C3: type[C3] latency[064] usage[00000000] method[ FFH] duration[0]
> (XEN) C4: type[C3] latency[096] usage[00000000] method[ FFH] duration[0]
> (XEN) *C0: usage[00000000] duration[46930624784]
> (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
> (XEN) CC3[0] CC6[0] CC7[0]
>[...]

Very interesting: the hypervisor has C-state information but never entered any of the states. That certainly explains the difference between using and not using the mwait-idle driver, but it puts us back to there being a more general issue with C-state use on this CPU model.

This is possibly related to C2 having entry method "NONE", but then again I can't see how such a state could get entered into the table in the first place: set_cx() bails when check_cx() returns an error, and hence the default case of its switch() should never be reached. Moreover, even if an array entry were set to "NONE", it should simply be ignored when looking for a state to enter. I'll probably need to put together a debugging patch to figure out what's going on here.

In any event, C2 being set to "NONE", with that information presumably coming from firmware, indicates that there's a problem with C2 on that CPU (note that the numbering doesn't really match what the document says; this is likely really C1E). Which gets us back to ...

> CPU information for one of the cores, 2.8 GHz is nominal, stepping is 2.
> Not sure how to translate that stepping number into Intel's format:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 44
> model name : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
> stepping : 2
>[...]
>> There are a couple of potentially relevant errata (BC36, BC38, BC54,
>> BC77, BC110).
>>
>> To exclude BC36, a boot log with "apic-verbosity=debug" and debug
>> key 'i' output would be necessary.
>
> Done, see the very end of the email.
>
>> BC38 should not affect us since we don't enter C states from ISRs.
>>
>> BC54 is probably irrelevant since we now know that your
>> system doesn't really hang hard.
>>
>> For BC77 it would be worth trying to disable turbo mode instead of
>> disabling the mwait-idle driver ("xenpm disable-turbo-mode" right
>> after boot).
>
> I looked up BC77 but as a result found this document[1], which seems to
> relate to the i7. Would this[2] not be the relevant document?
>
> [1]
> http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i7-900-ee-and-desktop-processor-series-32nm-spec-update.pdf
>
> [2]
> http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf

Indeed. I wasn't aware that the same family/model/stepping tuple can correspond to both Xeon and desktop CPUs.

> As promised, below is the apic-verbosity=debug log, with 'i'.

Thanks! I'm sorry, I misspelled the option: it's really "apic_verbosity=debug". The 'i' output does at least already confirm that there are no ExtINT entries among the IO-APIC RTEs.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel