[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 7/8] x86/microcode: Synchronize late microcode loading
>>> On 28.01.19 at 08:06, <chao.gao@xxxxxxxxx> wrote: > @@ -30,18 +31,25 @@ > #include <xen/smp.h> > #include <xen/softirq.h> > #include <xen/spinlock.h> > +#include <xen/stop_machine.h> > #include <xen/tasklet.h> > #include <xen/guest_access.h> > #include <xen/earlycpio.h> > +#include <xen/watchdog.h> > > +#include <asm/delay.h> > #include <asm/msr.h> > #include <asm/processor.h> > #include <asm/setup.h> > #include <asm/microcode.h> > > +/* By default, wait for 30000us */ > +#define MICROCODE_DEFAULT_TIMEOUT_US 30000 This value deserves some explanation as to how it was chosen. > static module_t __initdata ucode_mod; > static signed int __initdata ucode_mod_idx; > static bool_t __initdata ucode_mod_forced; > +static unsigned int nr_cores; Could you see about avoiding such a static? You have ... > +static int do_microcode_update(void *unused) ... an "unused" here after all, and it's the (indirect) caller of the function which calculates the value. > +{ > + unsigned int cpu = smp_processor_id(); > + int ret; > > - error = microcode_update_cpu(); > - if ( error ) > - info->error = error; > + ret = wait_for_cpus(&cpu_in, MICROCODE_DEFAULT_TIMEOUT_US); > + if ( ret ) > + return ret; > > - info->cpu = cpumask_next(info->cpu, &cpu_online_map); > - if ( info->cpu < nr_cpu_ids ) > - return continue_hypercall_on_cpu(info->cpu, do_microcode_update, > info); > + /* > + * Initiate an update on all processors which don't have an online > sibling > + * thread with a lower thread id. Other sibling threads just await the > + * completion of microcode update. > + */ > + if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ) > + ret = microcode_update_cpu(); > + /* > + * Increase the wait timeout to a safe value here since we're serializing > + * the microcode update and that could take a while on a large number of > + * CPUs. And that is fine as the *actual* timeout will be determined by > + * the last CPU finished updating and thus cut short > + */ > + if ( wait_for_cpus(&cpu_out, MICROCODE_DEFAULT_TIMEOUT_US * nr_cores) ) > + panic("Timeout when finishing updating microcode"); While I expect this to go away again in the next patch, I'd still like to see this improved, in particular in case the patch here goes in independently of the next one. After all on a system with 100 cores the timeout totals to a whopping 3 seconds. Generally the time needed to wait scales by the number of CPUs still in need of doing the update. And if a timeout is really to occur, it's perhaps because of one bad core or socket, not because nothing works at all. Hence it would seem both nice and possible to scale the "remaining time to wait" by the (known) number of remaining processors to respond. Additionally it would be confusing to see dozens of panics in case this actually triggers, because all CPUs would hit this at about the same time. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |