Re: [Xen-devel] Regression, host crash with 4.5rc1
Hi Len, thanks for chiming in. I am a
Xen noob and generally clueless to the inner workings of this
power management stuff, so apologies in advance if I don't
understand what is asked. I am, however, happy to try whatever
you'd like me to in pursuing this issue.
On 03/02/2015 07:24 AM, Jan Beulich wrote:
On 27.02.15 at 18:50, <len.brown@xxxxxxxxx> wrote:
If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
Hopefully some of this will translate to Xen in one way or another.
Sadly not really - the kernel plays only a minor role (forwarding ACPI
data to the hypervisor) in C-state handling under Xen.
dmesg | grep idle
will tell us what idle driver is running (on Dom0 kernel)
and if it is intel_idle, it will also tell us the supported sub-states
(CPUID.MWAIT.EDX value)
root@g2:~# dmesg | grep idle
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 11.391708] intel_idle: MWAIT substates: 0x1120
[ 11.391711] intel_idle: v0.4 model 0x2C
[ 11.391712] intel_idle: lapic_timer_reliable_states 0xffffffff
[ 11.391780] intel_idle: intel_idle yielding to none
(This output is the same whether I've got max_cstate=2 set or not.)
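For reference, the "MWAIT substates" value above can be split into its fields by hand. The following is a minimal sketch, assuming the usual CPUID leaf 5 EDX layout (one 4-bit field per MWAIT C-state, lowest nibble first); the function name is made up for illustration, and the field labels follow the CPUID layout, so they need not match the C-state names the idle driver prints.

```python
def decode_mwait_substates(edx: int) -> dict:
    """Split a CPUID.05H:EDX value into per-C-state sub-state counts.

    Assumption: each 4-bit field holds the number of MWAIT sub-states
    for one C-state, starting with C0 in bits 3:0.
    """
    return {f"C{i}": (edx >> (4 * i)) & 0xF for i in range(8)}


# Value taken from the intel_idle line in the dmesg output above.
substates = decode_mwait_substates(0x1120)

for name, count in substates.items():
    if count:
        print(f"{name}: {count} MWAIT sub-state(s)")
```

Under that assumption, 0x1120 reads as two sub-states for the second field and one each for the third and fourth.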
Yeah, we call the driver mwait-idle in the hypervisor, and the log
would be accessible via "xl dmesg", but yes, that information is
available there too.
(XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551]
(XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907]
(XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148]
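To compare the states at a glance, the three lines above can be reduced to each state's share of the listed idle time. This is only a sketch: the regular expression is written against the line format shown here, the helper name is hypothetical, and the duration values are used as plain counts without assuming a unit.

```python
import re

# The three C-state lines from the "xl dmesg" output above.
SAMPLE = """
(XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551]
(XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907]
(XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148]
"""

PATTERN = re.compile(
    r"(C\d+):\s+type\[(\w+)\]\s+latency\[(\d+)\]\s+"
    r"usage\[(\d+)\]\s+method\[\s*(\w+)\]\s+duration\[(\d+)\]"
)


def parse_cstates(text: str) -> list:
    """Return one dict per C-state line found in the text."""
    states = []
    for name, ctype, latency, usage, method, duration in PATTERN.findall(text):
        states.append({
            "name": name,
            "type": ctype,
            "latency": int(latency),
            "usage": int(usage),
            "method": method,
            "duration": int(duration),
        })
    return states


states = parse_cstates(SAMPLE)
total = sum(s["duration"] for s in states)
# Share of the listed idle time only; time spent in C0 is not in these lines.
shares = {s["name"]: s["duration"] / total for s in states}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of listed idle time")
```

The deepest listed state accounts for most of the idle time in this sample.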
I'm hopeful that this information comes from the hardware's BIOS
and not some hypervisor tricking out Dom0 with a fake BIOS, yes?
In the case of mwait-idle (intel_idle on Linux) it would be built-in
knowledge of the driver. For acpi-cpuidle it would come from
actual firmware, not anything fake/virtual.
Next, hopefully the attached turbostat utility can be invoked on Dom0
and it can read the MSRs on at least 1 processor via the /dev/cpu interface.
Yes, that would be possible, provided it's not important what specific
CPU it gets executed on.
I've run it (with the "max_cstate=2" intact from Xen's boot line)
and the output is as follows, while running the problematic graphics
benchmark on my Win 7 VM:
root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 36804******** 2736 2800
0 64323******** 2560 2800
1 8244******** 3398 2800
2 125758******** 2760 2800
3 17811******** 3032 2800
4 735******** 2977 2800
5 3954******** 2656 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 47728******** 2804 2800
0 18007******** 3025 2800
1 69086******** 2634 2800
2 522******** 2713 2800
3 77486******** 2680 2800
4 58487******** 2932 2800
5 62777******** 3006 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 49031******** 2728 2800
0 78178******** 2681 2800
1 62045******** 2561 2800
2 9060******** 3110 2800
3 16619******** 3255 2800
4 720******** 2661 2800
5 127565******** 2763 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 65471******** 2700 2800
0 70582******** 2638 2800
1 2173******** 1954 2800
2 49981******** 2899 2800
3 78668******** 2682 2800
4 128293******** 2762 2800
5 63131******** 2566 2800
Not sure why the warning mentions the kernel version; this box is
running Debian's Linux 3.16 kernel.
With "max_cstate=2" removed from Xen's boot line, this is the
result:
root@g2:~/turbostat-test# ./turbostat
./turbostat: APERF or MPERF went backwards *
* Frequency results do not cover entire interval *
* fix this by running Linux-2.6.30 or later *
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 23507******** 2621 2800
0 27631******** 2552 2800
1 35945******** 2978 2800
2 24417******** 2472 2800
3 1001******** 2948 2800
4 24417******** 2472 2800
5 27631******** 2552 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 14114******** 2687 2800
0 529******** 2738 2800
1 60363******** 2750 2800
2 21290******** 2497 2800
3 1028******** 2934 2800
4 629******** 2943 2800
5 842******** 2937 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 15048******** 2714 2800
0 25703******** 2489 2800
1 36024******** 2975 2800
2 5248******** 2454 2800
3 5248******** 2454 2800
4 9138******** 2755 2800
5 8925******** 2751 2800
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 32859******** 2598 2800
0 23089******** 2526 2800
1 61730******** 2751 2800
2 26138******** 2492 2800
3 26138******** 2492 2800
4 30029******** 2574 2800
5 30029******** 2574 2800
It may tell us just the same thing I think we learned here:
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]
which I'm assuming are a dump of the MSR residency counters.
If so, it appears that this platform is not invoking C6 and PC6 at all,
and that the deepest state being used is actually CC3 and PC3.
I don't know if that is because you've booted the kernel with max_cstate=N
of some kind, or if this is default.
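The reading above (only CC3 and PC3 in use) can be checked mechanically by listing the counters that are non-zero. A small sketch, assuming the two-line format quoted here; the helper name is illustrative only.

```python
import re

# The two residency lines quoted above.
SAMPLE = """(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]"""


def parse_residency(text: str) -> dict:
    """Map each package (PCn) or core (CCn) counter name to its value."""
    pairs = re.findall(r"\b((?:PC|CC)\d+)\[(\d+)\]", text)
    return {name: int(value) for name, value in pairs}


counters = parse_residency(SAMPLE)
# Counters that never advanced suggest the state was not entered.
used = sorted(name for name, value in counters.items() if value > 0)
unused = sorted(name for name, value in counters.items() if value == 0)

print("non-zero:", used)
print("zero:    ", unused)
```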
Sadly I haven't been able to tell which original mail the quotes
above are from, and since I had Steve experiment with disabling
the deepest C-state permitted to be used, it may well be that
this output was from one of those experiments. Remember, we
already know that with use of C6 alone disabled things work for
him (Steve - please correct me if I'm misremembering).
AIUI, that is correct. My Xen boot line (which eliminates the dom0/U
hangs) includes "mwait-idle=1 max_cstate=2".
Guessing...
If no surprises in the debug stuff requested above, and
If the XEN debug stuff above is with c6 explicitly disabled...
Note that there are two kinds of c6 -- CC6 (core) and PC6 (package).
If this box supports both, the next thing to try will be to keep CC6
enabled, but to just disable PC6. This is done via an MSR that turbostat
dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.
I don't think the wrmsr tool can be used (unmodified) to reliably do
this on all CPUs in the system - we'd likely have to cook up a patch
to the hypervisor instead, or I'd have to hand my patch to msr-tools
to Steve so he could use the tool under Xen (albeit that would also
require him to use one of our forward ported kernels, as the
upstream one doesn't have a pCPU sysfs interface yet afaik).
I'm game for whatever.
Though if that MSR is locked by the BIOS, then BIOS SETUP option
may be the only way to disable the package C-state limit without
also disabling the associated core C-state.
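Whether the MSR is locked can be read off the register value itself. The sketch below rests on assumptions that should be checked against the Intel SDM for this CPU: address 0xE2 for MSR_NHM_SNB_PKG_CST_CFG_CTL, the package C-state limit in bits 2:0, and the lock in bit 15. The helper names are hypothetical. It leaves the limit field as a raw number, since the encoding differs between CPU models. Note also that reading /dev/cpu/N/msr from Dom0 under Xen gives no guarantee about which physical CPU is actually read.

```python
import os
import struct

# Assumed address and bit layout; verify against the Intel SDM for the CPU.
MSR_NHM_SNB_PKG_CST_CFG_CTL = 0xE2
PKG_CST_LIMIT_MASK = 0x7       # bits 2:0, package C-state limit (raw field)
CFG_LOCK_BIT = 1 << 15         # bit 15, set once the register is locked


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the Linux msr driver (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def decode_pkg_cst_cfg(value: int) -> dict:
    """Extract the raw limit field and the lock flag from a register value."""
    return {
        "limit_field": value & PKG_CST_LIMIT_MASK,
        "locked": bool(value & CFG_LOCK_BIT),
    }
```

With the msr module loaded, decode_pkg_cst_cfg(read_msr(0, MSR_NHM_SNB_PKG_CST_CFG_CTL)) would show whether the lock bit is set; if it is, a write to the limit field is not expected to take effect.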
Steve, could you check whether any such option exists (it's been
a while, so apologies if we had asked already)?
No problem. I've cruised through the BIOS options and this is what I
see that may apply:
If you'd like me to make any changes to those
settings, please let me know. For reference this is a Lenovo
ThinkStation D20 running a Xeon X5660.
Thanks!
Steve
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel