
Re: Xens handling of MCE



On Thu, Sep 07, 2023 at 08:56:51AM +0200, Jan Beulich wrote:
> On 06.09.2023 23:38, Elliott Mitchell wrote:
> > On Thu, Aug 31, 2023 at 07:52:05PM +0000, Development wrote:
> >>
> >>     However, in 2009-02, "cegger" wrote MCA/MCE_in_Xen, a proposal for 
> >> having xen start checking the information
> >>     Xen started accessing the EDAC information (now called "MCE") at some 
> >> point after that, which blocks the linux kernel in dom0 from accessing it.
> >>     (I also found what appears to be related sides from a presentation 
> >> from 2012 at: 
> >> https://lkml.iu.edu/hypermail/linux/kernel/1206.3/01304/xen_vMCE_design_%28v0_2%29.pdf
> >>  )
> >>
> > 
> > I hadn't seen that before.  Clearly shows someone who had no idea what
> > they were doing designed it.  The author was thinking "virtualize 
> > everything!", whereas MCE is a perfect situation for paravirtualization.
> > Let Dom0 process MCE events (which allows use of Linux's more up to date
> > MCE drivers), then let Dom0 notify Xen if action is needed (a page was
> > corrupted, tell the affected domain).
> > 
> > There was a recent proposal to simply import Linux's rather more recent
> > MCE/EDAC source.  This hasn't happened yet.  For people using Xen this
> > has been a very concerning issue for some time.
> 
> I'm unaware of such a proposal; do you have a reference? EDAC drivers
> typically are vendor- or even chipset-specific aiui. At least the latter
> wouldn't make them a good fit to import into Xen. Along the lines of what
> you said earlier, they instead want to become Xen-aware (to deal with address
> translation as necessary). That'll also have better chances of things
> staying up-to-date.

I don't recall who wrote the message, though I think it was less than 6
months ago.  I read it as $person had been pondering the idea of simply
ripping out Xen's MCE implementation and replacing it with a minimally
adjusted copy of Linux's MCE implementation.

What you describe matches my thinking.  Even though the EDAC hardware is
fully attached to the processors now, it doesn't need to be virtualized
the way page tables are.  Instead EDAC should be handled like most
hardware devices and go through Domain 0.

The approach for Xen should also differ.  Instead of first telling the
OS, it might be better to immediately unmap the page, so that any later
access triggers a page fault, and then notify the OS that a page has
disappeared.  That is, mainly imitate how Linux handles MCE events for a
userspace process, rather than the usual paravirtualization.
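
To make the ordering concrete, here is a toy model of the flow in Python.
It is only a sketch under loud assumptions: `Domain`, `PageFault`,
`handle_memory_error`, `notify_page_lost` and the dictionary standing in
for the guest-to-machine mapping are all hypothetical names made up for
illustration.  None of them correspond to a real Xen or Linux interface.

```python
# Toy model only: every name here is hypothetical, not a Xen or Linux API.
from dataclasses import dataclass, field


class PageFault(Exception):
    """Raised when a guest touches a page that has been unmapped."""


@dataclass
class Domain:
    name: str
    # Guest frame number -> machine frame number (stand-in for a mapping table).
    p2m: dict = field(default_factory=dict)
    # Guest frames this domain has been told are gone.
    lost_pages: list = field(default_factory=list)

    def access(self, gfn):
        """Return the backing machine frame, or fault if it was unmapped."""
        if gfn not in self.p2m:
            raise PageFault(f"{self.name}: gfn {gfn:#x} is not mapped")
        return self.p2m[gfn]

    def notify_page_lost(self, gfn):
        """Record the (asynchronous) notice that a page has disappeared."""
        self.lost_pages.append(gfn)


def handle_memory_error(domains, bad_mfn):
    """Unmap the corrupted machine frame first, then tell its owner.

    Returns (domain, gfn) for the affected mapping, or None if no
    domain in the list maps the frame.
    """
    for dom in domains:
        for gfn, mfn in list(dom.p2m.items()):
            if mfn == bad_mfn:
                del dom.p2m[gfn]           # step 1: unmap immediately
                dom.notify_page_lost(gfn)  # step 2: notify afterwards
                return dom, gfn
    return None
```

The point of the ordering is that the guest can never read the corrupted
data: if it touches the page before it has processed the notification, it
simply takes a fault, much like a userspace process under Linux would.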

I'm not on sufficiently intimate terms with the drivers or hardware to
try this right now.  Yet the number of complaints about this is rather
substantial (okay, I'm aware since this is no small concern for me too).


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





 

