
Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading



On Fri, Apr 13, 2018 at 09:49:17AM -0600, Jan Beulich wrote:
>>>> On 30.03.18 at 08:59, <chao.gao@xxxxxxxxx> wrote:
>> @@ -281,24 +287,52 @@ static int microcode_update_cpu(const void *buf, 
>> size_t size)
>>      return err;
>>  }
>>  
>> -static long do_microcode_update(void *_info)
>> +static int __wait_for_cpus(atomic_t *cnt, int timeout)
>
>No new double-underscore prefixed functions please.
>
>>  {
>> -    struct microcode_info *info = _info;
>> -    int error;
>> +    int cpus = num_online_cpus();
>
>unsigned int
>
>> -    BUG_ON(info->cpu != smp_processor_id());
>> +    atomic_inc(cnt);
>>  
>> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
>> -    if ( error )
>> -        info->error = error;
>> +    while (atomic_read(cnt) != cpus)
>
>There are a number of style issues in the patch, mostly (like here) missing
>blanks.

Will fix all the issues pointed out above.

>
>> +    {
>> +        if ( timeout <= 0 )
>> +        {
>> +            printk("Timeout when waiting for CPUs calling in\n");
>> +            return -1;
>> +        }
>> +        udelay(1);
>> +        timeout--;
>> +    }
>>  
>> -    info->cpu = cpumask_next(info->cpu, &cpu_online_map);
>> -    if ( info->cpu < nr_cpu_ids )
>> -        return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
>> info);
>> +    return 0;
>> +}
>>  
>> -    error = info->error;
>> -    xfree(info);
>> -    return error;
>> +static int do_microcode_update(void *_info)
>> +{
>> +    struct microcode_info *info = _info;
>> +    int error, ret = 0;
>> +
>> +    error = __wait_for_cpus(&info->cpu_in, USEC_PER_SEC);
>
>Why this long a timeout here?

I just used the same timeout as the corresponding patch on the Linux kernel
side. As we know, if the timeout is too small, the microcode update is likely
to fail even when the other CPUs have only disabled interrupts temporarily.

If you object to such a long timeout (Xen may need much less time to
rendezvous all CPUs than the Linux kernel does, since Xen has no device
drivers that might misbehave), how about using the default 30ms timeout that
live patching uses? If that still isn't good enough, we could make the
timeout an option supplied by callers.

>
>> +    if ( error )
>> +    {
>> +        ret = -EBUSY;
>> +        return ret;
>> +    }
>> +
>> +    error = microcode_update_cpu(info->buffer, info->buffer_size);
>> +    if ( error && !ret )
>> +        ret = error;
>> +    /*
>> +     * Increase the wait timeout to a safe value here since we're 
>> serializing
>> +     * the microcode update and that could take a while on a large number of
>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>> +     * the last CPU finished updating and thus cut short
>> +     */
>> +    error = __wait_for_cpus(&info->cpu_out, USEC_PER_SEC * 
>> num_online_cpus());
>
>And this one's even worse, in particular on huge systems. I'm afraid such a 
>long
>period of time in stop-machine context is going to confuse most of the running
>domains (including Dom0). There's nothing inherently wrong with e.g. processing
>the updates on distinct cores (and even more so on distinct sockets) in 
>parallel.

I cannot say for sure, but the original patch does want the microcode update
to be performed one CPU at a time.

>Therefore revising the locking in microcode_update_cpu() might be a necessary
>prereq step.

Do you mean changing it to a per-core or per-socket lock?


>Or alternatively you may need to demand that no other running
>domains exist besides Dom0 (and hope the best for Dom0 itself).
>
>I also don't think there's any point invoking the operation on all HT threads 
>on a
>core, but I realize stop_machine_run() isn't flexible enough to allow such.

Only one thread in a core does the actual update. The other threads only
check the microcode version, find it is already up to date, and then exit
the critical region. Since that check shouldn't take much time, I wonder
whether skipping the operation on the sibling HT threads would significantly
improve the overall time.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

