Xen project Mailing List

Re: [Xen-devel] [PATCH v2 04/12] x86/mce: handle LMCE locally

To: "Haozhong Zhang" <haozhong.zhang@xxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Mon, 20 Mar 2017 08:24:49 -0600

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Mon, 20 Mar 2017 14:24:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 17.03.17 at 07:46, <haozhong.zhang@xxxxxxxxx> wrote: > @@ -52,8 +52,8 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar) > } > } > > -void mce_barrier(struct mce_softirq_barrier *bar) > +void mce_barrier(struct mce_softirq_barrier *bar, bool nowait) > { > - mce_barrier_enter(bar); > - mce_barrier_exit(bar); > + mce_barrier_enter(bar, nowait); > + mce_barrier_exit(bar, nowait); > } I don't think this currently unused function needs the extra parameter. Should a future user find this necessary, it can be done then. > --- a/xen/arch/x86/cpu/mcheck/mce.c > +++ b/xen/arch/x86/cpu/mcheck/mce.c > @@ -42,6 +42,13 @@ DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, > poll_bankmask); > DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, no_cmci_banks); > DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, mce_clear_banks); > > +/* > + * Flag to indicate that at least one non-local MCE's on this CPU have ... one non-local MCE on this CPU has ... > + * not been completed handled. It's set by mcheck_cmn_handler() and > + * cleared by mce_softirq(). > + */ > +DEFINE_PER_CPU(bool, mce_in_process); static Also please consider whether mce_in_progress might not be a better alternative name, and whether the non-local aspect shouldn't also be expressed by the name chosen. > @@ -1704,10 +1717,11 @@ static void mce_softirq(void) > { > int cpu = smp_processor_id(); > unsigned int workcpu; > + bool nowait = !this_cpu(mce_in_process); > > mce_printk(MCE_VERBOSE, "CPU%d enter softirq\n", cpu); > > - mce_barrier_enter(&mce_inside_bar); > + mce_barrier_enter(&mce_inside_bar, nowait); > > /* > * Everybody is here. Now let's see who gets to do the > @@ -1720,10 +1734,10 @@ static void mce_softirq(void) > > atomic_set(&severity_cpu, cpu); > > - mce_barrier_enter(&mce_severity_bar); > + mce_barrier_enter(&mce_severity_bar, nowait); > if (!mctelem_has_deferred(cpu)) > atomic_set(&severity_cpu, cpu); > - mce_barrier_exit(&mce_severity_bar); > + mce_barrier_exit(&mce_severity_bar, nowait); > > /* We choose severity_cpu for further processing */ > if (atomic_read(&severity_cpu) == cpu) { The logic here looks pretty suspicious even without your changes, but I think we should try hard to not make it worse. I think you need to avoid setting severity_cpu in the LMCE case (with, obviously, further resulting adjustments). And it also needs to be at least clarified (in the patch description) whether this_cpu(mce_in_process) changing (namely from 0 to 1) after you've latched it can't have any bad effect. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.