
Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode



On 26.09.2019 15:53, Chao Gao wrote:
> @@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx)
>  }
>  
>  /*
> - * The format is '[<integer>|scan]'. Both options are optional.
> + * The format is '[<integer>|scan, nmi=<bool>]'. Both options are optional.
>   * If the EFI has forced which of the multiboot payloads is to be used,
> - * no parsing will be attempted.
> + * only nmi=<bool> is parsed.
>   */
>  static int __init parse_ucode(const char *s)
>  {
> -    const char *q = NULL;
> +    const char *ss;
> +    int val, rc = 0;
>  
> -    if ( ucode_mod_forced ) /* Forced by EFI */
> -       return 0;
> +    do {
> +        ss = strchr(s, ',');
> +        if ( !ss )
> +            ss = strchr(s, '\0');
>  
> -    if ( !strncmp(s, "scan", 4) )
> -        ucode_scan = 1;
> -    else
> -        ucode_mod_idx = simple_strtol(s, &q, 0);
> +        if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
> +            ucode_in_nmi = val;
> +        else if ( !ucode_mod_forced ) /* Not forced by EFI */
> +        {
> +            const char *q = NULL;
> +
> +            if ( !strncmp(s, "scan", 4) )
> +            {
> +                ucode_scan = true;

I guess it would have resulted in more consistent code if you had
used parse_boolean() here, too.
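
E.g. (an untested sketch; the truncated tail of the hunk is left
as ...):

    if ( (val = parse_boolean("scan", s, ss)) >= 0 )
        ucode_scan = val;
    else
    {
        const char *q;

        ucode_mod_idx = simple_strtol(s, &q, 0);
        ...
    }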

> @@ -222,6 +246,8 @@ const struct microcode_ops *microcode_ops;
>  static DEFINE_SPINLOCK(microcode_mutex);
>  
>  DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
> +/* Store error code of the work done in NMI handler */
> +DEFINE_PER_CPU(int, loading_err);

static

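I.e. presumably just

    static DEFINE_PER_CPU(int, loading_err);
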
> @@ -356,42 +383,88 @@ static void set_state(unsigned int state)
>      smp_wmb();
>  }
>  
> -static int secondary_thread_fn(void)
> +static int secondary_nmi_work(void)
>  {
> -    unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
> +    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>  
> -    if ( !wait_for_state(LOADING_CALLIN) )
> -        return -EBUSY;
> +    return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
> +}
> +
> +static int primary_thread_work(const struct microcode_patch *patch)
> +{
> +    int ret;
>  
>      cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>  
> -    if ( !wait_for_state(LOADING_EXIT) )
> +    if ( !wait_for_state(LOADING_ENTER) )
>          return -EBUSY;
>  
> -    /* Copy update revision from the primary thread. */
> -    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev;
> +    ret = microcode_ops->apply_microcode(patch);
> +    if ( !ret )
> +        atomic_inc(&cpu_updated);
> +    atomic_inc(&cpu_out);
>  
> -    return 0;
> +    return ret;
>  }
>  
> -static int primary_thread_fn(const struct microcode_patch *patch)
> +static int primary_nmi_work(const struct microcode_patch *patch)
> +{
> +    return primary_thread_work(patch);
> +}

Why this wrapper? The function signatures are identical. I guess
you want to emphasize the environment the function is to be used
in, so perhaps fine despite the redundancy. At least there's no
address taken of this function, so the compiler can eliminate it.

> +static int secondary_thread_fn(void)
> +{
>      if ( !wait_for_state(LOADING_CALLIN) )
>          return -EBUSY;
>  
> -    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
> +    self_nmi();
>  
> -    if ( !wait_for_state(LOADING_ENTER) )
> +    /* Copy update revision from the primary thread. */
> +    this_cpu(cpu_sig).rev =
> +        per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;

_alternative_instructions() takes specific care not to rely on the
NMI arriving synchronously (if delivery is delayed, you'd potentially
copy a not-yet-updated CPU signature above). I think the same care
wants applying here, which I guess would be another

    wait_for_state(LOADING_EXIT);

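placed after the self_nmi(). In context that would be (sketch only,
untested):

    static int secondary_thread_fn(void)
    {
        if ( !wait_for_state(LOADING_CALLIN) )
            return -EBUSY;

        self_nmi();

        /* Don't assume the NMI was delivered synchronously. */
        if ( !wait_for_state(LOADING_EXIT) )
            return -EBUSY;

        /* Copy update revision from the primary thread. */
        this_cpu(cpu_sig).rev =
            per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;

        return this_cpu(loading_err);
    }
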
> +    return this_cpu(loading_err);
> +}
> +
> +static int primary_thread_fn(const struct microcode_patch *patch)
> +{
> +    if ( !wait_for_state(LOADING_CALLIN) )
>          return -EBUSY;
>  
> -    ret = microcode_ops->apply_microcode(patch);
> -    if ( !ret )
> -        atomic_inc(&cpu_updated);
> -    atomic_inc(&cpu_out);
> +    if ( ucode_in_nmi )
> +    {
> +        self_nmi();
> +        return this_cpu(loading_err);

Same here then, to protect against returning a not-yet-updated error
indicator.
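
I.e. something like (again just an untested sketch):

    if ( ucode_in_nmi )
    {
        self_nmi();

        /* Don't read loading_err before the NMI work has actually run. */
        if ( !wait_for_state(LOADING_EXIT) )
            return -EBUSY;

        return this_cpu(loading_err);
    }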

> @@ -420,14 +498,23 @@ static int control_thread_fn(const struct microcode_patch *patch)
>          return ret;
>      }
>  
> -    /* Let primary threads load the given ucode update */
> -    set_state(LOADING_ENTER);
> -
> +    /* Control thread loads ucode first while others are in NMI handler. */
>      ret = microcode_ops->apply_microcode(patch);
>      if ( !ret )
>          atomic_inc(&cpu_updated);
>      atomic_inc(&cpu_out);
>  
> +    if ( ret == -EIO )
> +    {
> +        printk(XENLOG_ERR
> +               "Late loading aborted: CPU%u failed to update ucode\n", cpu);
> +        set_state(LOADING_EXIT);
> +        return ret;
> +    }
> +
> +    /* Let primary threads load the given ucode update */
> +    set_state(LOADING_ENTER);

While the description goes to some lengths to explain this ordering of
updates, I still don't really see the point: How is it better for the
control CPU to have updated its ucode early and then hit an NMI before
the other CPUs have even started updating, than the inverse ordering,
where another CPU may hit an NMI before the control CPU has updated?

> @@ -456,6 +543,8 @@ static int control_thread_fn(const struct microcode_patch *patch)
>      /* Mark loading is done to unblock other threads */
>      set_state(LOADING_EXIT);
>  
> +    set_nmi_callback(saved_nmi_callback);
> +    nmi_patch = ZERO_BLOCK_PTR;

Another smp_wmb() between them, just to be on the safe side?
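
I.e. presumably

    set_nmi_callback(saved_nmi_callback);
    /* Ensure the callback restore is visible before the patch pointer
       gets invalidated. */
    smp_wmb();
    nmi_patch = ZERO_BLOCK_PTR;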

> @@ -515,6 +604,13 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>          return -EBUSY;
>      }
>  
> +    /*
> +     * CPUs except the first online CPU would send a fake (self) NMI to
> +     * rendezvous in NMI handler. But a fake NMI to nmi_cpu may trigger
> +     * unknown_nmi_error(). It ensures nmi_cpu won't receive a fake NMI.
> +     */
> +    ASSERT(cpumask_first(&cpu_online_map) == nmi_cpu);

Looking at this again, I don't think it should be ASSERT(). Instead
you want to return an (easy to recognize) error in this case - maybe
-EPERM or -ENOEXEC or -EXDEV. This is not the least to also be safe
in non-debug builds.
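
I.e. perhaps (the choice of error code being somewhat arbitrary)

    if ( cpumask_first(&cpu_online_map) != nmi_cpu )
        return -EPERM;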

Jan
