
Re: [Xen-devel] [Patch v3 2/2] x86/microcode: Synchronize late microcode loading



On Wed, May 16, 2018 at 07:46:48AM -0600, Jan Beulich wrote:
>>>> On 16.05.18 at 15:25, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 16/05/18 14:10, Jan Beulich wrote:
>>>> +static int do_microcode_update(void *_info)
>>>> +{
>>>> +    struct microcode_info *info = _info;
>>>> +    unsigned int cpu = smp_processor_id();
>>>> +    int ret;
>>>> +
>>>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>>>> +    if ( ret )
>>>> +        return ret;
>>>> +
>>>> +    /*
>>>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>>>> +     * the update. Other sibling threads just await the completion of
>>>> +     * microcode update.
>>>> +     */
>>>> +    if ( !cpumask_test_and_set_cpu(
>>>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>>>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>>>> +    /*
>>>> +     * Increase the wait timeout to a safe value here since we're serializing
>>>> +     * the microcode update and that could take a while on a large number of
>>>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>>>> +     * the last CPU finished updating and thus cut short
>>>> +     */
>>>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>>>> +                                       nr_phys_cpus) )
>>> I remain unconvinced that this is a safe thing to do on a huge system with
>>> guests running (even Dom0 alone would seem risky enough). I continue to

I think there are other operations that may also endanger the security and
stability of the whole system, and we offer them with caveats. The same
applies here: three different methods can be used to update microcode, and
late loading isn't perfect at the moment. At least this patch provides a more
reliable way to update microcode at runtime on systems with not so many
cores. For a huge system, admins can assess the risk and choose the most
suitable method. They can avoid live updates entirely and mandate a reboot so
that the update is done early, since that is the most dependable method.
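
On the enlarged timeout in the hunk quoted above: the value passed for cpu_out
is only an upper bound, because the wait returns as soon as the last CPU has
checked in. Below is a minimal single-file model of that behaviour, not the
Xen implementation. rendezvous_wait() is a hypothetical stand-in for
wait_for_cpus(), and it counts loop iterations instead of reading a real
clock; both are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the patch's wait_for_cpus(); not Xen code.
 * 'arrived' counts CPUs that reached this point, 'total' is the number
 * expected, and 'max_spins' is an assumed spin budget used in place of
 * a real time-based timeout.
 */
static int rendezvous_wait(atomic_uint *arrived, unsigned int total,
                           unsigned long max_spins)
{
    assert(total > 0);

    /* Announce that this CPU has arrived. */
    atomic_fetch_add(arrived, 1);

    /*
     * Leave as soon as everyone is here. The budget only matters when
     * some CPU never shows up, so a generous budget costs nothing in
     * the normal case.
     */
    while ( atomic_load(arrived) < total )
    {
        if ( max_spins-- == 0 )
            return -1; /* timed out: at least one CPU is missing */
    }

    return 0;
}
```

With total set to 1 the call returns 0 without spinning, whatever budget is
passed. With total set to 2 and only one caller, it returns -1 once the
budget is used up.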

>>> hope for comments from others, in particular Andrew, here. At the very
>>> least I think you should taint the hypervisor when making it here.
>> 
>> I see nothing in this patch which prevents a deadlock against the time
>> calibration rendezvous.  I think it's fine to pause the time calibration
>> rendezvous while performing this update.
>
>If there's a problem here, wouldn't that be a general one with
>stop_machine()?

I agree with Jan. Such a problem wouldn't be specific to the stop_machine()
call here. Anyhow, I will look into the potential deadlock you mentioned.

>
>> Also, what is the purpose of serialising the updates while all pcpus are
>> in rendezvous?

microcode_mutex, which prevents the updates from being done in parallel, is
not introduced by this patch. At present, we want to keep this patch and the
update process simple. Could we just make it work first and work out
optimizations later?

>> Surely at that point the best option is to initiate an
>> update on all processors which don't have an online sibling thread with
>> a lower thread id.
>
>I've suggested that before.

I think Andrew's suggestion here is similar to the method this patch is
using.
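
To make the comparison concrete, here is a rough single-threaded sketch of
the two selection rules, not the Xen code. It uses plain 64-bit masks instead
of cpumask_t, and the helper names first_cpu(), claim_core() and
is_lowest_sibling() are made up for illustration. claim_core() models the
test-and-set on the first sibling's bit as done in the patch (the real
operation is atomic, this one is not). is_lowest_sibling() models the
"no online sibling with a lower thread id" rule.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest-numbered CPU set in a sibling mask; the mask must be non-zero. */
static unsigned int first_cpu(uint64_t mask)
{
    unsigned int cpu = 0;

    assert(mask != 0);
    while ( !(mask & 1) )
    {
        mask >>= 1;
        cpu++;
    }

    return cpu;
}

/*
 * Rule modelled on the patch: whichever thread of a core gets here first
 * claims the bit of the core's first sibling and performs the update.
 * Returns 1 if the caller should update, 0 if a sibling already claimed it.
 * Not atomic; the real code relies on an atomic test-and-set.
 */
static int claim_core(uint64_t siblings, uint64_t *claimed)
{
    uint64_t bit = UINT64_C(1) << first_cpu(siblings);
    int already = (*claimed & bit) != 0;

    *claimed |= bit;

    return !already;
}

/*
 * Rule modelled on the suggestion: a thread updates only if no online
 * sibling has a lower id, which needs no shared state.
 */
static int is_lowest_sibling(unsigned int cpu, uint64_t siblings)
{
    return first_cpu(siblings) == cpu;
}
```

Both rules pick exactly one thread per core. They differ in which thread
that is (first to arrive versus lowest id) and in whether shared state is
needed to decide.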

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

