
Re: Thoughts on current Xen EDAC/MCE situation



On Tue, Jan 23, 2024 at 11:44:03AM +0100, Jan Beulich wrote:
> On 22.01.2024 21:53, Elliott Mitchell wrote:
> 
> > I find the present handling of MCE in Xen an odd choice.  Having Xen do
> > most of the handling of MCE events is a behavior matching a traditional
> > stand-alone hypervisor.  Yet Xen was originally pushing any task not
> > requiring hypervisor action onto Domain 0.
> 
> Not exactly. Xen in particular deals with all of CPU and all of memory.
> Dom0 may be unaware of the full amount of CPUs in the system, nor the
> full memory map (without resorting to interfaces specifically making
> that information available, but not to be used for Dom0 kernel's own
> acting as a kernel).

Why would this be an issue?

I would expect the handling to be roughly:  NMI -> Xen; Xen schedules a
Dom0 vCPU which is eligible to run on the pCPU onto the pCPU; Dom0
examines registers/MSRs, Dom0 then issues a hypercall to Xen telling
Xen how to resolve the issue (no action, fix memory contents, kill page).

Ideally there would be an idle Dom0 vCPU, but interrupting a busy vCPU
would be viable.  It would even be reasonable to ignore affinity and
grab any Dom0 vCPU.
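
A rough sketch of that flow, in Python purely for illustration.  Every
name below (Resolution, VCpu, pick_dom0_vcpu, dom0_decide) is made up for
this message and is not an existing Xen or Linux interface.  The
"contents_recoverable" flag is likewise an assumption standing in for
whatever Dom0's decoding would actually determine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Resolution(Enum):
    # The three outcomes named above; the enum itself is hypothetical.
    NO_ACTION = "no action"
    FIX_CONTENTS = "fix memory contents"
    KILL_PAGE = "kill page"


@dataclass(frozen=True)
class VCpu:
    vcpu_id: int
    idle: bool
    affinity: frozenset  # pCPUs this vCPU is allowed to run on


def pick_dom0_vcpu(vcpus: Sequence[VCpu], pcpu: int) -> Optional[VCpu]:
    """Choose the Dom0 vCPU to run on the pCPU that took the event.

    Preference order: idle vCPU with matching affinity, busy vCPU with
    matching affinity, then any vCPU at all (ignoring affinity).
    """
    eligible = [v for v in vcpus if pcpu in v.affinity]
    for pool in (eligible, list(vcpus)):
        idle = [v for v in pool if v.idle]
        if idle:
            return idle[0]
        if pool:
            return pool[0]  # interrupt a busy vCPU
    return None


def dom0_decide(uncorrectable: bool, contents_recoverable: bool) -> Resolution:
    """What Dom0 would hand back to Xen after examining registers/MSRs."""
    if not uncorrectable:
        return Resolution.NO_ACTION  # a CE only needs logging
    if contents_recoverable:
        return Resolution.FIX_CONTENTS
    return Resolution.KILL_PAGE
```

The point of the ordering in pick_dom0_vcpu is that affinity is treated as
a preference rather than a hard constraint, so some Dom0 vCPU is always
found as long as Dom0 has any vCPU at all.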

Dom0 has 2 purposes for the address.  First, to pass it back to Xen.
Second, to report it to a system administrator so they could restart the
system with that address marked as bad.  Dom0 wouldn't care whether the
address was directly accessible to it or not.

The proposed hypercall should report back what was affected by a UE
event.  A given site might have a policy that if $some_domain is hit by a
UE, everything is restarted.  Meanwhile Dom0 or Xen being the winner
could deserve urgent action.
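
To illustrate the kind of site policy I mean, here is a small Python
sketch.  The UeReport structure, the owner labels "xen" and "dom0", and
the action strings are all invented for this example; the real hypercall
would have to define what it actually reports.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UeReport:
    """Hypothetical result of the proposed hypercall for one UE event."""
    machine_addr: int  # address Dom0 logs and passes back to Xen
    owner: str         # "xen", "dom0", or the name of the domain hit


def site_action(report: UeReport, policy: Dict[str, str],
                default: str = "log only") -> str:
    """Map a UE report onto a site-specific action."""
    # Xen or Dom0 being hit is handled before any per-domain policy.
    if report.owner in ("xen", "dom0"):
        return "urgent: alert administrator"
    return policy.get(report.owner, default)


# Example site policy: a UE hitting "database" restarts everything.
policy = {"database": "restart everything"}
```

With that policy, a report whose owner is "database" yields "restart
everything", a report for any other guest yields "log only", and a report
for "xen" or "dom0" yields the urgent action regardless of the table.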


> > MCE seems a perfect match for sharing responsibility with Domain 0.
> > Domain 0 needs to know about any MCE event, this is where system
> > administrators will expect to find logs.  In fact, if the event is a
> > Correctable Error, then *only* Domain 0 needs to know.  For a CE, Xen
> > may need no action at all (an implementation could need help) and
> > the affected domain would need no action.  It is strictly for
> > Uncorrectable Errors that action besides logging is needed.
> > 
> > For a UE memory error, the best approach might be for Domain 0 to decode
> > the error.  Once Domain 0 determines it is UE, invoke a hypercall to pass
> > the GPFN to Xen.
> 
> What GPFN? Decoding can only possibly find machine addresses in what
> hardware supplies.

I may have chosen the wrong term here.

> > The key advantage of this approach is it makes MCE handling act very
> > similar to MCE handling without Xen.
> 
> While that's true, you're completely omitting all implications towards
> what it means to hand off most handling to Dom0. While it is perhaps
> possible to make Linux's chipset-specific EDAC drivers Xen PV aware,
> it might be yet harder to achieve the same in a PVH Dom0.

Much of it *doesn't* need to be Xen-aware.  There needs to be some
mechanism to allow Dom0 to access special MSRs, beyond that Xen would
only need to interpose between decoding and handling.

> >  Documentation about how MCEs are
> > reported/decoded would apply equally to Xen.  Another rather important
> > issue is it means less maintenance work to keep MCE handling working with
> > cutting-edge hardware.  I've noticed one vendor being sluggish about
> > getting patches into Linux and I fear similar issues may apply more
> > severely to Xen.
> 
> With all of your suggestions: Who do you think is going to do all of
> the work involved here (properly writing down a design, to take care
> of all known difficulties, and then actually implement everything)?
> We're already short on people, as you're very likely aware.

Right now I mostly want to know what general course of action is
planned/desired.

Several of the Linux x86 EDAC drivers have been adding a check for a
hypervisor and refusing to load if one is present.  The stated reason
is to get rid of a message.  The problem is that this check is being
scattered into several places, which will make paravirtualized handling
*much* harder.  As such, taking action to ensure the check lives in a
single location is kind of urgent now.

I kind of wonder if this is quietly being encouraged by a Redmond, WA
company to poison the well for other hypervisors...

(the OS wars are over, we're now into the hypervisor wars)


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





 

