Re: Xens handling of MCE
On Thu, Sep 07, 2023 at 08:56:51AM +0200, Jan Beulich wrote:
> On 06.09.2023 23:38, Elliott Mitchell wrote:
> > On Thu, Aug 31, 2023 at 07:52:05PM +0000, Development wrote:
> >>
> >> However, in 2009-02, "cegger" wrote MCA/MCE_in_Xen, a proposal for
> >> having xen start checking the information
> >> Xen started accessing the EDAC information (now called "MCE") at some
> >> point after that, which blocks the linux kernel in dom0 from accessing it.
> >> (I also found what appears to be related sides from a presentation
> >> from 2012 at:
> >> https://lkml.iu.edu/hypermail/linux/kernel/1206.3/01304/xen_vMCE_design_%28v0_2%29.pdf
> >> )
> >>
> >
> > I hadn't seen that before. Clearly shows someone who had no idea what
> > they were doing designed it. The author was thinking "virtualize
> > everything!", whereas MCE is a perfect situation for paravirtualization.
> > Let Dom0 process MCE events (which allows use of Linux's more up to date
> > MCE drivers), then let Dom0 notify Xen if action is needed (a page was
> > corrupted, tell the effected domain).
> >
> > There was a recent proposal to simply import Linux's rather more recent
> > MCE/EDAC source. This hasn't happened yet. For people using Xen this
> > has been a very concerning issue for some time.
>
> I'm unaware of such a proposal; do you have a reference? EDAC drivers
> typically are vendor- or even chipset-specific aiui. At least the latter
> wouldn't make them a good fit to import into Xen. Along of what you say
> earlier, they instead want to become Xen-aware (to deal with address
> translation as necessary). That'll also have better chances of things
> staying up-to-date.

I don't recall who wrote the message, I think it was less than 6 months
ago though. I read it as $person had been pondering the idea of simply
ripping out Xen's MCE implementation and replacing it with a minimally
adjusted Linux MCE implementation.

What you describe matches my thinking.
Even though the EDAC hardware is fully attached to processors now, it
doesn't need virtualization similar to page tables. Instead EDAC should
be handled similarly to most hardware devices and go through Domain 0.

The approach for Xen should also differ. Instead of first telling the
OS, it might be better to immediately unmap the page and trigger a page
fault if it is accessed. Then notify the OS a page has disappeared.
Mainly imitate how Linux handles MCE events for a userspace process,
rather than the usual paravirtualization.

I'm not on sufficiently intimate terms with the drivers or hardware to
try this right now. Yet the number of complaints about this is rather
substantial (okay, I'm aware since this is no small concern for me too).

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445