[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode()



>>> On 11.06.19 at 10:53, <chao.gao@xxxxxxxxx> wrote:
> On Tue, Jun 11, 2019 at 01:08:36AM -0600, Jan Beulich wrote:
>>>>> On 11.06.19 at 05:32, <chao.gao@xxxxxxxxx> wrote:
>>> On Wed, Jun 05, 2019 at 06:37:27AM -0600, Jan Beulich wrote:
>>>>>>> On 27.05.19 at 10:31, <chao.gao@xxxxxxxxx> wrote:
>>>>> During late microcode update, apply_microcode() is invoked in
>>>>> cpu_request_microcode(). To make late microcode update more reliable,
>>>>> we want to put the apply_microcode() into stop_machine context. So
>>>>> we split out it from cpu_request_microcode(). As a consequence,
>>>>> apply_microcode() should be invoked explicitly in the common code.
>>>>> 
>>>>> Previously, apply_microcode() gets the microcode patch to be applied from
>>>>> the microcode cache. Now, the patch is passed as a function argument and
>>>>> a patch is cached for cpu-hotplug and cpu resuming, only after it has
>>>>> been loaded to a cpu without any error. As a consequence, the
>>>>> 'match_cpu' check in microcode_update_cache is removed, which otherwise
>>>>> would fail.
>>>>
>>>>The "only after it has been loaded to a cpu without any error" is a
>>>>problem, precisely for the case where ucode on the different cores
>>>>is not in sync initially. I would actually like to put up this question:
>>>>When a core has no ucode loaded at all yet and only strictly older
>>>>(than loaded on some other cores) ucode is found to be available,
>>>>whether then it wouldn't still be better to apply that ucode to
>>>>_at least_ the cores that have none loaded yet.
>>> 
>>> Yes, it is better for this special case. And I agree to support this case.
>>> 
>>> This in v7, a patch is loaded only if its revision is newer than that
>>> loaded to current CPU. And it is stored only if it has been loaded
>>> successfully. But, as you described, a broken bios might puts the system
>>> in an inconsistent state (multiple microcode revision in the system) and
>>> furthermore in this case, if no or an older microcode update is
>>> provided, early loading cannot get the system into sane state. So for
>>> both early and late microcode loading, we could face a situation that
>>> the patch to be loaded has equal or old revision than microcode of some
>>> CPUs.
>>> 
>>> Changes I plan to make in next version are:
>>> 1. For early microcode, a patch would be stored if it covers current CPU's
>>> signature. All CPUs would try to load from the cache.
>>> 2. For late microcode, a patch is loaded only if its revision is newer than
>>> *the patch cached*. And it is stored only if has been loaded without an
>>> "EIO" error.
>>> 3. Cache replacement remains the same.
>>
>>Why the difference between early and late loading?
> 
> Storing a patch without loading it is problematic. We need complex logics
> to restore the old patch if the current patch is proved to be broken.
> I really want to avoid going this way. So for late microcode, we still
> stick to the rule: storing a patch only after it has been loaded. For
> late loading, we can try to load a patch as long as the patch covers
> current cpu signature to avoid missing any possible update. But thanks
> to early loading, the oldest microcode revision on all online CPUs
> shouldn't be older than the cache. So as an optimization, we initiate an
> update system-wide only if the patch's revision is newer than the cache.

Well, you seem to be considering only one out of two possible cases.
What if no ucode was loaded early at all? There's no "thanks to early
loading" then.

> For early loading, to avoid discarding a potential useful patch, an
> exception is made to store the newest matching patch without loading it
> and all CPus try to load the patch. One problem is if a broken patch
> with very high revision is provided, any subsequent attempt of late
> loading would fail. It is unlikely to happen, so I plan to leave it
> aside. Otherwise, we can clean up the cache in microcode_init() if no
> cpu has loaded this patch (we need a global variable to track the status).

Yes, if a patch that claims to be newer than what any CPU has
already loaded fails to apply on every CPU, then yes, it surely
should be purged from the cache.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.