[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch v3 2/2] x86/microcode: Synchronize late microcode loading



>>> On 09.05.18 at 00:01, <chao.gao@xxxxxxxxx> wrote:
> @@ -30,15 +31,21 @@
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/stop_machine.h>
>  #include <xen/tasklet.h>
>  #include <xen/guest_access.h>
>  #include <xen/earlycpio.h>
> +#include <xen/watchdog.h>
>  
> +#include <asm/delay.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
>  #include <asm/setup.h>
>  #include <asm/microcode.h>
>  
> +/* By default, wait for 30000us */
> +#define MICROCODE_DEFAULT_TIMEOUT 30000

Please attach the unit (_US) to the name.

> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t 
> size)
>      return err;
>  }
>  
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, unsigned int timeout)
>  {
> -    struct microcode_info *info = _info;
> -    int error;
> +    unsigned int cpus = num_online_cpus();
>  
> -    BUG_ON(info->cpu != smp_processor_id());
> +    atomic_inc(cnt);
>  
> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
> -    if ( error )
> -        info->error = error;
> +    while ( atomic_read(cnt) != cpus )
> +    {
> +        if ( timeout <= 0 )

!timeout (or timeout == 0), now that it's of unsigned type.

> +static int do_microcode_update(void *_info)
> +{
> +    struct microcode_info *info = _info;
> +    unsigned int cpu = smp_processor_id();
> +    int ret;
> +
> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> +    if ( ret )
> +        return ret;
> +
> +    /*
> +     * Logical threads which set the first bit in cpu_sibling_mask can do
> +     * the update. Other sibling threads just await the completion of
> +     * microcode update.
> +     */
> +    if ( !cpumask_test_and_set_cpu(
> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
> +    /*
> +     * Increase the wait timeout to a safe value here since we're serializing
> +     * the microcode update and that could take a while on a large number of
> +     * CPUs. And that is fine as the *actual* timeout will be determined by
> +     * the last CPU finished updating and thus cut short
> +     */
> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> +                                       nr_phys_cpus) )

I remain unconvinced that this is a safe thing to do on a huge system with
guests running (even Dom0 alone would seem risky enough). I continue to
hope for comments from others, in particular Andrew, here. At the very
least I think you should taint the hypervisor when making it here.

> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * -HT siblings must be idle and not execute other code while the other
> +     *  sibling is loading microcode in order to avoid any negative
> +     *  interactions cause by the loading.
> +     *
> +     * -In addition, microcode update on the cores must be serialized until
> +     *  this requirement can be relaxed in the feature. Right now, this is

s/feature/future/

Also please add blanks after the hyphens.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.