[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] x86/microcode: Synchronize late microcode loading

>>> On 16.04.18 at 08:20, <chao.gao@xxxxxxxxx> wrote:
> On Fri, Apr 13, 2018 at 09:49:17AM -0600, Jan Beulich wrote:
>>>>> On 30.03.18 at 08:59, <chao.gao@xxxxxxxxx> wrote:
>>> +static int do_microcode_update(void *_info)
>>> +{
>>> +    struct microcode_info *info = _info;
>>> +    int error, ret = 0;
>>> +
>>> +    error = __wait_for_cpus(&info->cpu_in, USEC_PER_SEC);
>>Why this long a timeout here?
> I just use the same timeout as the patch on linux kernel side. As we
> know if the timeout is too small, updating microcode may be likely to
> failed even if other CPUs disabled interrupt temporally.
> If you object to such a long timeout (for Xen may need much smaller
> time to rendezvous all CPUs compared to linux kernel because Xen doesn't
> have device drivers which may malfunction), how about just use the
> default timeout, 30ms, used by live patching? if it is also not
> good enough, then we make it an option which comes from callers.

Yes, 30ms is likely to be acceptable.

>>> +    if ( error )
>>> +    {
>>> +        ret = -EBUSY;
>>> +        return ret;
>>> +    }
>>> +
>>> +    error = microcode_update_cpu(info->buffer, info->buffer_size);
>>> +    if ( error && !ret )
>>> +        ret = error;
>>> +    /*
>>> +     * Increase the wait timeout to a safe value here since we're 
>>> serializing
>>> +     * the microcode update and that could take a while on a large number 
>>> of
>>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>>> +     * the last CPU finished updating and thus cut short
>>> +     */
>>> +    error = __wait_for_cpus(&info->cpu_out, USEC_PER_SEC * 
>>> num_online_cpus());
>>And this one's even worse, in particular on huge systems. I'm afraid such a 
>>period of time in stop-machine context is going to confuse most of the running
>>domains (including Dom0). There's nothing inherently wrong with e.g. 
>>the updates on distinct cores (and even more so on distinct sockets) in 
> I cannot say for sure. But the original patch does want updating
> microcode be performed one-by-one.

And they don't restrict the "when" aspect in any way? I.e. all sorts of
applications (and perhaps even KVM guests) may be running?

>>Therefore revising the locking in microcode_update_cpu() might be a necessary
>>prereq step.
> Do you mean changing it to a per-core or per-socket lock?

Either changing to such, or introducing a second, more fine grained lock.

>>Or alternatively you may need to demand that no other running
>>domains exist besides Dom0 (and hope the best for Dom0 itself).
>>I also don't think there's any point invoking the operation on all HT threads 
>>on a
>>core, but I realize stop_machine_run() isn't flexible enough to allow such.
> Only one thread in a core will do the actual update. Other threads only
> check the microcode version and find the version is already
> update-to-date, then exit the critical region. Considering the check may
> don't need much time, I wonder whether it can significantly benefit the
> overall time to invoking the operation on all HT threads on a core?  

With better parallelism this would be less of an issue. Plus, as said - the
infrastructure we have at present wouldn't allow anyway what I've
described above, and hand-rolling another "flavor" of the stop-machine
logic is quite certainly out of question. To answer your question though:
Taking as the example a 4-threads-per-core system with sufficiently
many cores, I'm afraid the overall time spent in handling the
"uninteresting" threads may become measurable. But of course, if actual
numbers said otherwise, I'd be fine.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.