Xen project Mailing List

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Fri, 13 Apr 2018 09:49:17 -0600

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Ashok Raj <ashok.raj@xxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Fri, 13 Apr 2018 15:49:47 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 30.03.18 at 08:59, <chao.gao@xxxxxxxxx> wrote: > @@ -281,24 +287,52 @@ static int microcode_update_cpu(const void *buf, size_t > size) > return err; > } > > -static long do_microcode_update(void *_info) > +static int __wait_for_cpus(atomic_t *cnt, int timeout) No new double-underscore prefixed functions please. > { > - struct microcode_info *info = _info; > - int error; > + int cpus = num_online_cpus(); unsigned int > - BUG_ON(info->cpu != smp_processor_id()); > + atomic_inc(cnt); > > - error = microcode_update_cpu(info->buffer, info->buffer_size); > - if ( error ) > - info->error = error; > + while (atomic_read(cnt) != cpus) There are a number of style issues in the patch, mostly (like here) missing blanks. > + { > + if ( timeout <= 0 ) > + { > + printk("Timeout when waiting for CPUs calling in\n"); > + return -1; > + } > + udelay(1); > + timeout--; > + } > > - info->cpu = cpumask_next(info->cpu, &cpu_online_map); > - if ( info->cpu < nr_cpu_ids ) > - return continue_hypercall_on_cpu(info->cpu, do_microcode_update, > info); > + return 0; > +} > > - error = info->error; > - xfree(info); > - return error; > +static int do_microcode_update(void *_info) > +{ > + struct microcode_info *info = _info; > + int error, ret = 0; > + > + error = __wait_for_cpus(&info->cpu_in, USEC_PER_SEC); Why this long a timeout here? > + if ( error ) > + { > + ret = -EBUSY; > + return ret; > + } > + > + error = microcode_update_cpu(info->buffer, info->buffer_size); > + if ( error && !ret ) > + ret = error; > + /* > + * Increase the wait timeout to a safe value here since we're serializing > + * the microcode update and that could take a while on a large number of > + * CPUs. And that is fine as the *actual* timeout will be determined by > + * the last CPU finished updating and thus cut short > + */ > + error = __wait_for_cpus(&info->cpu_out, USEC_PER_SEC * > num_online_cpus()); And this one's even worse, in particular on huge systems. I'm afraid such a long period of time in stop-machine context is going to confuse most of the running domains (including Dom0). There's nothing inherently wrong with e.g. processing the updates on distinct cores (and even more so on distinct sockets) in parallel. Therefore revising the locking in microcode_update_cpu() might be a necessary prereq step. Or alternatively you may need to demand that no other running domains exist besides Dom0 (and hope the best for Dom0 itself). I also don't think there's any point invoking the operation on all HT threads on a core, but I realize stop_machine_run() isn't flexible enough to allow such. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.