
Re: [Xen-devel] [PATCH v2] x86/microcode: Synchronize late microcode loading



>>> On 25.04.18 at 13:46, <chao.gao@xxxxxxxxx> wrote:
> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t size)
>      return err;
>  }
>  
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, int timeout)

unsigned int

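For reference, a minimal user-space sketch of such a rendezvous counter with an unsigned timeout (C11 atomics stand in for Xen's atomic_t, a busy-spin placeholder for udelay(); the explicit `expect` parameter, the helper names, and the -EBUSY error value are assumptions of this sketch, not taken from the patch):

```c
#include <errno.h>
#include <stdatomic.h>

/* Stand-in for Xen's cpu_relax()/udelay(1) pair (assumption). */
static void relax_1us(void) { /* busy-spin placeholder */ }

/*
 * Each caller increments *cnt, then spins until `expect` callers have
 * arrived or `timeout` microseconds have elapsed.  An unsigned timeout,
 * as suggested above, rules out a negative countdown.
 */
static int wait_for_cpus(atomic_uint *cnt, unsigned int expect,
                         unsigned int timeout)
{
    atomic_fetch_add(cnt, 1);

    while ( atomic_load(cnt) != expect )
    {
        if ( !timeout )
            return -EBUSY;      /* rendezvous failed */
        timeout--;
        relax_1us();
    }

    return 0;
}
```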
> +static int do_microcode_update(void *_info)
> +{
> +    struct microcode_info *info = _info;
> +    unsigned int cpu = smp_processor_id();
> +    int ret;
> +
> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> +    if ( ret )
> +        return ret;
> +    /*
> +     * Logical threads which set the first bit in cpu_sibling_mask can do
> +     * the update. Other sibling threads just await the completion of
> +     * microcode update.
> +     */
> +    if ( cpumask_test_and_set_cpu(
> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);

Isn't the condition inverted (i.e. missing a ! )?
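To illustrate, a tiny user-space model of that selection (a 64-bit word standing in for cpumask_t, a hard-coded two-threads-per-core topology; all names here are invented for the sketch). With the leading ! in place, exactly one thread per core takes the update path:

```c
#include <stdbool.h>

/* Toy model: test-and-set of one bit in a mask (non-atomic here;
 * Xen's cpumask_test_and_set_cpu() is the real, atomic equivalent). */
static bool test_and_set_cpu(unsigned int cpu, unsigned long long *mask)
{
    bool was_set = (*mask >> cpu) & 1;

    *mask |= 1ULL << cpu;
    return was_set;
}

/* first_sibling[cpu]: lowest CPU number in cpu's core (made-up topology:
 * two HT threads per core, so CPUs {0,1} share core 0, {2,3} core 1). */
static const unsigned int first_sibling[4] = { 0, 0, 2, 2 };

/* Returns true iff `cpu` should perform the ucode update. */
static bool is_updater(unsigned int cpu, unsigned long long *cpus)
{
    return !test_and_set_cpu(first_sibling[cpu], cpus); /* note the ! */
}
```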

Also I take it that you've confirmed that loading ucode in parallel on multiple
cores of the same socket is not a problem? The comment in the last hunk
suggests otherwise.

> +    /*
> +     * Increase the wait timeout to a safe value here since we're serializing
> +     * the microcode update and that could take a while on a large number of
> +     * CPUs. And that is fine as the *actual* timeout will be determined by
> +     * the last CPU to finish updating, and thus cut short.
> +     */
> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> +                                       num_online_cpus()) )
> +        panic("Timeout when finishing updating microcode");

A 3s timeout (as an example for a system with 100 CPU threads) is still
absurdly high to me, but considering you panic() anyway if you hit the
timeout the question mainly is whether there's a slim chance for this to
complete a brief moment before the timeout expires. If all goes well,
you won't come close to even 1s, but as said before - there may be
guests running, and they may become utterly confused if they don't
get any time within a second or more.

With you no longer doing things sequentially I don't, however, see why
you need to scale the timeout by CPU count.

> +
> +    return ret;
>  }

You're losing this return value (once for every CPU making it into this
function).
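One common pattern for not losing those per-CPU results is to fold them into a shared field, keeping the first non-zero value (a sketch with C11 atomics; the field name echoes the `info->error` member the patch removes, and the helper is invented here):

```c
#include <stdatomic.h>

/*
 * Record the first non-zero per-CPU result in *error.  The CAS only
 * succeeds while *error is still 0, so later failures don't overwrite
 * the first one, and successes (ret == 0) are ignored entirely.
 */
static void record_result(atomic_int *error, int ret)
{
    int expected = 0;

    if ( ret )
        atomic_compare_exchange_strong(error, &expected, ret);
}
```

The caller would then return `atomic_load(&info->error)` from the hypercall path once all CPUs have rendezvoused on the way out.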

> @@ -318,26 +357,52 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>  
>      ret = copy_from_guest(info->buffer, buf, len);
>      if ( ret != 0 )
> -    {
> -        xfree(info);
> -        return ret;
> -    }
> +        goto free;
>  
>      info->buffer_size = len;
> -    info->error = 0;
> -    info->cpu = cpumask_first(&cpu_online_map);
> +
> +    /* cpu_online_map must not change during update */
> +    if ( !get_cpu_maps() )
> +    {
> +        ret = -EBUSY;
> +        goto free;
> +    }
>  
>      if ( microcode_ops->start_update )
>      {
>          ret = microcode_ops->start_update();
>          if ( ret != 0 )
> -        {
> -            xfree(info);
> -            return ret;
> -        }
> +            goto put;
>      }
>  
> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
> +    cpumask_empty(&info->cpus);

DYM cpumask_clear()?

> +    atomic_set(&info->cpu_in, 0);
> +    atomic_set(&info->cpu_out, 0);
> +
> +    /*
> +     * We intend to disable interrupts for a long time, which may
> +     * otherwise lead to a watchdog timeout.
> +     */
> +    watchdog_disable();
> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * -HT siblings must be idle and not execute other code while the other
> +     *  sibling is loading microcode in order to avoid any negative
> +     *  interactions caused by the loading.
> +     *
> +     * -In addition, microcode update on the cores must be serialized until
> +     *  this requirement can be relaxed in the future. Right now, this is
> +     *  conservative and good.

This is the comment I've referred to above.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
