
Re: [Xen-devel] [PATCH] misc/xenmicrocode: Upload /lib/firmware/<some blob> to the hypervisor



On Thu, Jan 29, 2015, at 10:17, Borislav Petkov wrote:
> On Thu, Jan 29, 2015 at 09:36:49AM -0200, Henrique de Moraes Holschuh
> wrote:
> > But the fact that you cannot trust a system with mismatched microcode to
> > be stable is the hard truth: neither AMD nor Intel are really enforcing
> > that late microcode updates will be always safe in all conditions.
> 
> How do you know? Have you talked to anyone?

No, I have not.  It was a direct observation based on the fact that
there is a report of a system misbehaving because of mismatched AMD
microcode in this very thread.  Now, if someone from AMD did state that
they are enforcing that the processor should work with mismatched
microcode at all times, so that it won't happen again, I stand
corrected.

That said, I don't see how anyone could support stable operation in all
cases on a system with mismatched microcode (time window where not all
cores have been updated yet) without adding quite important constraints,
either to what you can do in a microcode update, or on what the system
can do while it has not yet finished updating all cores (probably both).

I recall some AMD microcode updates did add functionality, so you could
have issues if a task migrates from a more capable microcode (up-to-date
core) to a less capable microcode (non-up-to-date core), for example.

The same reasoning is valid for Intel.

> Can you imagine that someone might have done that and asked exactly that
> question and got an assurance that CPU vendors actually do try to make
> microcode updates self-contained?

If that happened, I'd like to see the list of assumptions made by the
hardware vendor side, though.  When is that assurance valid?  Are we
meeting all their expectations?

> So let me stop you right there with that "hard truth" drama. Just stick
> to the facts. If you don't have any facts, don't create them out of thin
> air. Ok?

See above.  I am not sure we'd even be asking for something that is
possible/desirable [as currently implemented in Linux].  Currently we
run the late microcode update with the system as-is, in full production
mode...

Maybe if we could ensure (in Linux) that there will be no task migration
between CPUs (including due to clone/fork/whatever) and no global system
state change (i.e. visible to tasks on other processors) that could
depend on microcode-provided capabilities until the late microcode
update system finished the microcode update run for all processors...
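A minimal user-space sketch of that idea, assuming each CPU is modeled as a POSIX thread and its microcode revision as an entry in a plain array. All names are hypothetical; this is not Linux or Xen code. Every CPU parks at a barrier, the updates are applied one at a time, and no CPU resumes until a second barrier confirms that all of them are done. No task can therefore observe a system with mismatched revisions:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, not Linux or Xen code: each simulated CPU is a
 * POSIX thread and microcode_rev[] stands in for its microcode revision. */
#define MAX_CPUS 64

static pthread_barrier_t enter_barrier, exit_barrier;
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static int microcode_rev[MAX_CPUS];
static bool saw_uniform[MAX_CPUS];
static int ncpus_total;
static int target_rev;

static void *cpu_worker(void *arg)
{
    int cpu = *(const int *)arg;

    /* 1. Rendezvous: models every CPU leaving normal work (no task
     *    migration, no global state changes) before any update starts. */
    pthread_barrier_wait(&enter_barrier);

    /* 2. Apply the update on this CPU, serialized with the others. */
    pthread_mutex_lock(&load_lock);
    microcode_rev[cpu] = target_rev;
    pthread_mutex_unlock(&load_lock);

    /* 3. No CPU resumes until every CPU has been updated. */
    pthread_barrier_wait(&exit_barrier);

    /* 4. After the exit barrier, check that all revisions match. */
    saw_uniform[cpu] = true;
    for (int i = 0; i < ncpus_total; i++)
        if (microcode_rev[i] != target_rev)
            saw_uniform[cpu] = false;
    return NULL;
}

/* Runs the simulated update on ncpus threads; returns true if every
 * CPU observed a uniform revision once it was allowed to resume. */
bool update_all_cpus(int ncpus, int new_rev)
{
    pthread_t threads[MAX_CPUS];
    int ids[MAX_CPUS];
    bool ok = true;

    if (ncpus < 1 || ncpus > MAX_CPUS)
        return false;
    ncpus_total = ncpus;
    target_rev = new_rev;
    pthread_barrier_init(&enter_barrier, NULL, (unsigned)ncpus);
    pthread_barrier_init(&exit_barrier, NULL, (unsigned)ncpus);

    for (int i = 0; i < ncpus; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, cpu_worker, &ids[i]);
    }
    for (int i = 0; i < ncpus; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < ncpus; i++)
        ok = ok && saw_uniform[i];

    pthread_barrier_destroy(&enter_barrier);
    pthread_barrier_destroy(&exit_barrier);
    return ok;
}
```

For example, `update_all_cpus(8, 0x42)` should return true and leave all eight entries of `microcode_rev` at `0x42`. The second barrier is the important part. Without it, a fast CPU could resume normal work while a slower one still runs the old microcode.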

> > For example, on Intel you must *never* have two CPUs attempt to update
> > the same "microcode store" at the same time, which requires that you
> 
> Interesting - this is the first time I hear about such restriction.
> Details?

The restriction is in the Intel SDM, section 9.11.6.3 (vol. 3A, page
9-35):

"9.11.6.3   Update in a System Supporting Intel Hyper-Threading
Technology:
Intel Hyper-Threading Technology has implications on the loading of the
microcode update. The update must be loaded for each core in a physical
processor. Thus, for a processor supporting Intel Hyper-Threading
Technology, only one logical processor per core is required to load the
microcode update. Each individual logical processor can independently
load the update. However, MP initialization must provide some mechanism
(e.g. a software semaphore) to force serialization of microcode update
loads and to prevent simultaneous load attempts to the same core."

I recall (from my code archeology efforts on the Intel microcode driver a
few months ago) that the serialization across CPUs during a microcode
update was added to the ancient Intel microcode driver more than a
decade ago because of that requirement.

I don't know what would happen should you violate that restriction,
though.  It could be something that modern Intel 64 processors simply
don't care about, just like that 16-byte alignment restriction.
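A minimal sketch of what the SDM text asks for, assuming a hypothetical topology of 4 cores with 2 Hyper-Threading siblings each. A single mutex plays the role of the "software semaphore", and the `core_rev[]` array stands in for each core's shared microcode store. All names are illustrative and not taken from any real driver. Every logical CPU may attempt the load, but loads are serialized and a sibling that finds its core already updated skips it:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical topology: 4 cores, 2 Hyper-Threading siblings per core.
 * core_rev[] stands in for each core's shared microcode store. */
#define NUM_CORES 4
#define THREADS_PER_CORE 2
#define NUM_LCPUS (NUM_CORES * THREADS_PER_CORE)

/* The "software semaphore": one lock serializes every load attempt. */
static pthread_mutex_t ucode_lock = PTHREAD_MUTEX_INITIALIZER;
static int core_rev[NUM_CORES];
static int loads_per_core[NUM_CORES];
static int target_rev;

static void *lcpu_worker(void *arg)
{
    int lcpu = *(const int *)arg;
    int core = lcpu / THREADS_PER_CORE;   /* siblings map to one core */

    pthread_mutex_lock(&ucode_lock);
    /* Only the first sibling to get here loads; the other one finds the
     * core already at the target revision and skips the load. */
    if (core_rev[core] < target_rev) {
        core_rev[core] = target_rev;      /* stand-in for the real load */
        loads_per_core[core]++;
    }
    pthread_mutex_unlock(&ucode_lock);
    return NULL;
}

/* Simulates every logical CPU attempting to load revision rev. */
void load_on_all_lcpus(int rev)
{
    pthread_t threads[NUM_LCPUS];
    int ids[NUM_LCPUS];

    target_rev = rev;
    for (int i = 0; i < NUM_LCPUS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, lcpu_worker, &ids[i]);
    }
    for (int i = 0; i < NUM_LCPUS; i++)
        pthread_join(threads[i], NULL);
}
```

After `load_on_all_lcpus(5)`, each core should report revision 5 with exactly one load, and no two load attempts ever overlap. A single global lock is the simplest way to meet the "no simultaneous load attempts to the same core" rule. A per-core lock would also satisfy it, but it would allow different cores to load concurrently.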

> And then there's the other group who can't afford to reboot long running
> machines for whatever reason. As long as we can support both, we should
> support both.

You know my position on this: I consider it a valid advanced use-case,
for those who really know what they're doing. But it shouldn't be the
default mode of operation for Linux distros, and one shouldn't assume it
is going to be safe without extra data (thus the "for those who really
know what they're doing").

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique de Moraes Holschuh <hmh@xxxxxxxxxx>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

