|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/MCE: allow overriding the CMCI threshold
On 2015/01/12 9:44, Jan Beulich wrote:
> We've had reports of systems where CMCIs would surface at a relatively
> high rate during certain periods of time, without them apparently
> causing subsequent more severe problems (see Xeon E7-8800/4800/2800
> specification clarification SC1). Give the admin a knob to lower the
> impact on the system logs.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
A small comment at the bottom, besides of that:
Acked-by: Christoph Egger <chegger@xxxxxxxxx>
>
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -242,6 +242,14 @@ the NMI watchdog is also enabled.
>
> If set, override Xen's default choice for the platform timer.
>
> +### cmci-threshold
> +> `= <integer>`
> +
> +> Default: `2`
> +
> +Specify the event count threshold for raising Corrected Machine Check
> +Interrupts. Specifying zero disables CMCI handling.
> +
> ### cmos-rtc-probe
> > `= <boolean>`
>
> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
> @@ -492,6 +492,9 @@ static int do_cmci_discover(int i)
> {
> unsigned msr = MSR_IA32_MCx_CTL2(i);
> u64 val;
> + unsigned int threshold, max_threshold;
> + static unsigned int cmci_threshold = 2;
> + integer_param("cmci-threshold", cmci_threshold);
>
> rdmsrl(msr, val);
> /* Some other CPU already owns this bank. */
> @@ -500,15 +503,28 @@ static int do_cmci_discover(int i)
> goto out;
> }
>
> - val &= ~CMCI_THRESHOLD_MASK;
> - wrmsrl(msr, val | CMCI_EN | CMCI_THRESHOLD);
> - rdmsrl(msr, val);
> + if ( cmci_threshold )
> + {
> + wrmsrl(msr, val | CMCI_EN | CMCI_THRESHOLD_MASK);
> + rdmsrl(msr, val);
> + }
>
> if (!(val & CMCI_EN)) {
> /* This bank does not support CMCI. Polling timer has to handle it.
> */
> mcabanks_set(i, __get_cpu_var(no_cmci_banks));
> + wrmsrl(msr, val & ~CMCI_THRESHOLD_MASK);
> return 0;
> }
> + max_threshold = MASK_EXTR(val, CMCI_THRESHOLD_MASK);
> + threshold = cmci_threshold;
> + if ( threshold > max_threshold )
> + {
> + mce_printk(MCE_QUIET,
> + "CMCI: threshold %#x too large for CPU%u bank %u, using
> %#x\n",
> + threshold, smp_processor_id(), i, max_threshold);
> + threshold = max_threshold;
> + }
> + wrmsrl(msr, (val & ~CMCI_THRESHOLD_MASK) | CMCI_EN | threshold);
> mcabanks_set(i, __get_cpu_var(mce_banks_owned));
> out:
> mcabanks_clear(i, __get_cpu_var(no_cmci_banks));
> --- a/xen/arch/x86/cpu/mcheck/x86_mca.h
> +++ b/xen/arch/x86/cpu/mcheck/x86_mca.h
> @@ -86,9 +86,6 @@
> /* Bitfield of MSR_K8_HWCR register */
> #define K8_HWCR_MCi_STATUS_WREN (1ULL << 18)
>
> -/*Intel Specific bitfield*/
> -#define CMCI_THRESHOLD 0x2
> -
> #define MCi_MISC_ADDRMOD_MASK (0x7UL << 6)
> #define MCi_MISC_PHYSMOD (0x2UL << 6)
I think these two are also Intel specific bitfields.
Please leave the comment for those.
Christoph
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |