[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch v3 2/2] x86/microcode: Synchronize late microcode loading

On Wed, May 09, 2018 at 06:01:33AM +0800, Gao, Chao wrote:
>This patch ports microcode improvement patches from linux kernel.
>Before you read any further: the early loading method is still the
>preferred one and you should always do that. The following patch is
>improving the late loading mechanism for long running jobs and cloud use
>Gather all cores and serialize the microcode update on them by doing it
>one-by-one to make the late update process as reliable as possible and
>avoid potential issues caused by the microcode update.
>This patch is also in accord with Andrew's suggestion,
>"Rendezvous all online cpus in an IPI to apply the patch, and keep the
>processors in until all have completed the patch.", in [1].
>Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
>Tested-by: Chao Gao <chao.gao@xxxxxxxxx>
>[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
>Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>
>Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
>Cc: Borislav Petkov <bp@xxxxxxx>
>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>Cc: Jan Beulich <jbeulich@xxxxxxxx>
>+static int do_microcode_update(void *_info)
>+    struct microcode_info *info = _info;
>+    unsigned int cpu = smp_processor_id();
>+    int ret;
>+    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>+    if ( ret )
>+        return ret;
>+    /*
>+     * Logical threads which set the first bit in cpu_sibling_mask can do
>+     * the update. Other sibling threads just await the completion of
>+     * microcode update.
>+     */
>+    if ( !cpumask_test_and_set_cpu(
>+                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>+        ret = microcode_update_cpu(info->buffer, info->buffer_size);


A critical issue I realized is that microcode_update_cpu() here contains
much things, for instance, parsing the microcode binary and loading it. It is
a bad idea to put so much things in stop_machine context, especially memory
allocation (I did observe one assertion triggered sometimes when doing this).
AFAIK, we have two solutions:
1. use a malloc variation which can be used in stop_machine context.
2. like linux kernel, we can separate the microcode parsing from loading,
and in stop_machine context only do microcode loading. When parsing microcode
binary, all valid ucodes are put into a list. At loading stage, going through
the list is definitely safe.

What's your opinion on these two solutions?


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.