
Re: [Xen-devel] [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers



>>> On 05.12.16 at 11:05, <andrew.cooper3@xxxxxxxxxx> wrote:
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>  
>      if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
>      {
> -        mb();
> +        smp_mb();
>          clflush((void *)&mwait_wakeup(cpu));
> -        mb();
> +        smp_mb();
>      }

Both need to stay as they are afaict: in order for the clflush() to do
what we want, we have to order it wrt earlier as well as later writes,
regardless of SMP-ness. Or wait - the SDM has changed in that
respect (a footnote now describes the earlier specified behavior).
AMD, otoh, continues to require MFENCE for ordering purposes.
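
To illustrate the concern (sketch only - these are not Xen's literal
definitions, but the shapes are right): mb() is a real CPU fence, while
a weakened smp_mb() could legitimately become a compiler-only barrier,
at which point the workaround silently loses its fencing:

    /* Illustrative definitions (sketch): */
    #define barrier() asm volatile ( "" ::: "memory" )        /* compiler fence only */
    #define mb()      asm volatile ( "mfence" ::: "memory" )  /* real CPU fence */

    /* The erratum workaround needs the CPU fence in all cases: */
    mb();                                    /* order earlier writes before the flush */
    clflush((void *)&mwait_wakeup(cpu));
    mb();                                    /* keep the flush ahead of MONITOR */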

> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data)
>                       /* Counter enable */
>                       value |= (1ULL << 51);
>                       mca_wrmsr(MSR_IA32_MCx_MISC(4), value);
> -                     wmb();
> +                     smp_wmb();
>               }
>       }
>  
> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
>                       value |= (1ULL << 51);
>                       wrmsrl(MSR_IA32_MCx_MISC(4), value);
>                       /* serialize */
> -                     wmb();
> +                     smp_wmb();
>                       printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n");
>               }
>       }

These will need confirming by AMD engineers.

> --- a/xen/arch/x86/cpu/mcheck/barrier.c
> +++ b/xen/arch/x86/cpu/mcheck/barrier.c
> @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar)
>  void mce_barrier_dec(struct mce_softirq_barrier *bar)
>  {
>      atomic_inc(&bar->outgen);
> -    wmb();
> +    smp_wmb();
>      atomic_dec(&bar->val);
>  }

This being x86-specific code - do we need a barrier here at all
between two LOCKed instructions? (The same question applies to two
more cases further down.)
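
For context, a sketch of what dropping it would look like (assuming,
as on x86, that atomic_inc()/atomic_dec() compile to LOCK-prefixed
RMWs, which are each fully ordered):

    void mce_barrier_dec(struct mce_softirq_barrier *bar)
    {
        atomic_inc(&bar->outgen);   /* "lock incl" - already a full barrier */
        /* Nothing to order here that the LOCKed ops don't already order;
         * at most barrier() to constrain the compiler. */
        atomic_dec(&bar->val);      /* "lock decl" - likewise fully ordered */
    }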

> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which)
>       }
>  
>       mctelem_processing_hold(tep);
> -     wmb();
> +     smp_wmb();
>       spin_unlock(&processing_lock);

Don't we already assume unlocks to act as write barriers?
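
That is (a sketch of the assumption, not of the actual spin_unlock()
implementation): if the unlock has release semantics, prior stores
cannot pass it, and the explicit write barrier is redundant:

    mctelem_processing_hold(tep);
    /* A release-style unlock already orders the store(s) above before
     * the lock is seen as free: */
    spin_unlock(&processing_lock);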

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void)
>      unsigned long t1, flags;
>  
>      t1 = pit0_ticks;
> -    mb();
> +    smp_mb();
>  
>      local_save_flags(flags);
>      local_irq_enable();
> @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void)
>       * might have cached one ExtINT interrupt.  Finally, at
>       * least one tick may be lost due to delays.
>       */
> -    mb();
> +    smp_mb();
>      if (pit0_ticks - t1 > 4)
>          return 1;

I think these two can be barrier().
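
If pit0_ticks is only advanced by the (local) timer interrupt, all
that's needed is to keep the compiler from caching the variable -
sketch:

    t1 = pit0_ticks;
    barrier();          /* compiler fence: don't keep pit0_ticks in a register */

    /* ... interrupts enabled, delay loop elided ... */

    barrier();          /* force a fresh re-read of pit0_ticks */
    if (pit0_ticks - t1 > 4)
        return 1;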

> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave)
>      for ( i = 1; i <= 5; i++ )
>      {
>          tsc_value = rdtsc_ordered();
> -        wmb();
> +        smp_wmb();
>          atomic_inc(&tsc_count);

Same question as above wrt the following LOCKed instruction.

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>              return;
>          }
>          udelay(1);
> -        rmb();
> +        smp_rmb();
>          code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>                                        IOMMU_EVENT_CODE_SHIFT);
>      }
> @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>              return;
>          }
>          udelay(1);
> -        rmb();
> +        smp_rmb();
>          code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>                                        IOMMU_PPR_LOG_CODE_SHIFT);
>      }

Don't these fall into the "reduced-cacheability memory mappings"
category? Or, if the mappings these reads go through are UC,
aren't the barriers unneeded altogether? In any event this would be
better done as a separate patch, Cc-ed to the AMD IOMMU maintainers.
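
To spell out the distinction (sketch; which mapping type actually
applies here is exactly the question): UC reads are already strongly
ordered on x86 and need no barrier, while WC reads would need a real
LFENCE, i.e. rmb() - the SMP flavor expresses neither:

        udelay(1);
        /* UC mapping: reads strongly ordered, no barrier needed at all.
         * WC ("reduced cacheability") mapping: a real LFENCE (rmb()) is
         * needed to order the re-read below. smp_rmb() fits neither case. */
        rmb();
        code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
                                      IOMMU_EVENT_CODE_SHIFT);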

Jan
