
Re: [Xen-devel] [PATCH v2 04/12] x86/mce: handle LMCE locally



>>> On 17.03.17 at 07:46, <haozhong.zhang@xxxxxxxxx> wrote:
> @@ -52,8 +52,8 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar)
>      }
>  }
>  
> -void mce_barrier(struct mce_softirq_barrier *bar)
> +void mce_barrier(struct mce_softirq_barrier *bar, bool nowait)
>  {
> -    mce_barrier_enter(bar);
> -    mce_barrier_exit(bar);
> +    mce_barrier_enter(bar, nowait);
> +    mce_barrier_exit(bar, nowait);
>  }

I don't think this currently unused function needs the extra parameter.
Should a future user find this necessary, it can be done then.
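
I.e. I'd simply keep it as is, passing false down (sketch only,
against the enter/exit signatures this patch introduces):

void mce_barrier(struct mce_softirq_barrier *bar)
{
    mce_barrier_enter(bar, false);
    mce_barrier_exit(bar, false);
}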

> --- a/xen/arch/x86/cpu/mcheck/mce.c
> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> @@ -42,6 +42,13 @@ DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, poll_bankmask);
>  DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, no_cmci_banks);
>  DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, mce_clear_banks);
>  
> +/*
> + * Flag to indicate that at least one non-local MCE's on this CPU have

... one non-local MCE on this CPU has ...

Also, "completed handled" further down should be "completely
handled".

> + * not been completed handled. It's set by mcheck_cmn_handler() and
> + * cleared by mce_softirq().
> + */
> +DEFINE_PER_CPU(bool, mce_in_process);

This looks like it can be static.

Also please consider whether mce_in_progress wouldn't be a
better name, and whether the non-local aspect shouldn't also
be expressed by the name chosen.
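
E.g. (sketch only; the name is merely a suggestion):

static DEFINE_PER_CPU(bool, nonlocal_mce_in_progress);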

> @@ -1704,10 +1717,11 @@ static void mce_softirq(void)
>  {
>      int cpu = smp_processor_id();
>      unsigned int workcpu;
> +    bool nowait = !this_cpu(mce_in_process);
>  
>      mce_printk(MCE_VERBOSE, "CPU%d enter softirq\n", cpu);
>  
> -    mce_barrier_enter(&mce_inside_bar);
> +    mce_barrier_enter(&mce_inside_bar, nowait);
>  
>      /*
>       * Everybody is here. Now let's see who gets to do the
> @@ -1720,10 +1734,10 @@ static void mce_softirq(void)
>  
>      atomic_set(&severity_cpu, cpu);
>  
> -    mce_barrier_enter(&mce_severity_bar);
> +    mce_barrier_enter(&mce_severity_bar, nowait);
>      if (!mctelem_has_deferred(cpu))
>          atomic_set(&severity_cpu, cpu);
> -    mce_barrier_exit(&mce_severity_bar);
> +    mce_barrier_exit(&mce_severity_bar, nowait);
>  
>      /* We choose severity_cpu for further processing */
>      if (atomic_read(&severity_cpu) == cpu) {

The logic here looks pretty suspicious even without your changes,
but I think we should try hard not to make it worse. I think you
need to avoid setting severity_cpu in the LMCE case (with,
obviously, further resulting adjustments). It also needs to be
clarified (at least in the patch description) why
this_cpu(mce_in_process) changing (namely from 0 to 1) after
you've latched it can't have any bad effect.
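
To illustrate the direction I have in mind (a rough sketch only,
re-using the names from this patch and the existing
mce_delayed_action callback; I haven't checked all details):

    if (nowait) {
        /* LMCE: process only our own deferred telemetry. */
        mctelem_process_deferred(cpu, mce_delayed_action);
    } else {
        atomic_set(&severity_cpu, cpu);

        mce_barrier_enter(&mce_severity_bar, false);
        if (!mctelem_has_deferred(cpu))
            atomic_set(&severity_cpu, cpu);
        mce_barrier_exit(&mce_severity_bar, false);

        /* severity_cpu election and processing as before. */
    }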

Jan

