
Re: [Xen-devel] RFC: MCA/MCE concept



On Wednesday 30 May 2007 17:03:55 Petersson, Mats wrote:
> [snip]
>
> > My feeling is that the hypervisor and dom0 own the hardware and as
> > such all hardware fault management should reside there.  So we should
> > never deliver any form of #MC to a domU, nor should a poll of MCA
> > state from a domU ever observe valid state (e.g., make the RDMSR
> > return 0).  So all handling, logging and diagnosis, as well as
> > hardware response actions (such as deploying an online spare
> > chip-select), are controlled in the hypervisor/dom0 combination.
> > That seems a consistent model - e.g., if a domU is migrated to
> > another system it should not carry the diagnosis state of the
> > original system across, since that belongs with the one domain that
> > cannot migrate.
>
> I agree entirely with this.
>
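
A minimal sketch of the "RDMSR returns 0" idea above, only to make the
model concrete. The MSR numbers are the architectural MCA registers; the
function names, the is_domu flag and the 32-bank limit are assumptions
made for illustration and are not existing Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural machine-check MSRs (global registers and first bank). */
#define MSR_IA32_MCG_CAP    0x179u
#define MSR_IA32_MCG_STATUS 0x17au
#define MSR_IA32_MCG_CTL    0x17bu
#define MSR_IA32_MC0_CTL    0x400u

/* Assumption for this sketch: cover up to 32 banks, 4 MSRs per bank. */
#define MCA_MAX_BANKS       32u

/* True if the MSR belongs to the machine-check architecture. */
static bool is_mca_msr(uint32_t msr)
{
    if (msr >= MSR_IA32_MCG_CAP && msr <= MSR_IA32_MCG_CTL)
        return true;
    return msr >= MSR_IA32_MC0_CTL &&
           msr < MSR_IA32_MC0_CTL + 4u * MCA_MAX_BANKS;
}

/*
 * Hypothetical RDMSR intercept: a domU never observes valid MCA state,
 * every other read passes the hardware value through unchanged.
 */
static uint64_t guest_rdmsr(bool is_domu, uint32_t msr, uint64_t hw_value)
{
    if (is_domu && is_mca_msr(msr))
        return 0;
    return hw_value;
}
```

With this shape, a domU reading MCG_STATUS or any bank register gets 0,
while dom0 and non-MCA reads are unaffected.
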
> > But that is not to say that (I think at a future phase) domU should
> > not participate in a higher-level fault management function, at the
> > direction of the hypervisor/dom0 combo.  For example, if/when we can
> > isolate an uncorrectable error to a single domU, we could forward
> > such an event to the affected domU if it has registered its
> > ability/interest in such events.  These won't be in the form of a
> > faked #MC or anything; instead they'd be some form of synchronous
> > trap experienced when the affected domU context next resumes on CPU.
> > The intelligent domU handler can then decide whether the domU must
> > panic, whether it could simply kill the affected process, etc.  Those
> > details are clearly sketchy, but the idea is to up-level the
> > communication to a domU to be more like "you're broken" rather than
> > "here's a machine-level hardware error for you to interpret and
> > decide what to do with".
>
> Yes, this makes much more sense than forwarding #MC, as the guest would
> have a hard time actually doing anything useful with it. As far as I
> know, most uncorrectable errors are near enough entirely fatal in most
> commercial non-Enterprise OSes anyway - e.g. in Windows XP or Server
> 2K3, it always ends in a blue-screen - which is hardly any better than
> the guest being "humanely euthanized" by Dom0.
>
> I take it this would be some sort of hypercall (available through the
> regular PV-driver interface for HVM guests) to say "Let me know if I'm
> broken - trap on vector X".

In short, guests with a PV MCA driver will see a dedicated event
(assuming the event mechanism is used for the notification),
and guests without a PV MCA driver will see a "General Protection Fault".
Is that right?
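
To make the question concrete, here is a minimal sketch of the delivery
decision as I read it. Everything in it is hypothetical: struct guest,
guest_register_mca() (standing in for the "trap on vector X" hypercall)
and mca_pick_delivery() are illustrative names, not existing Xen
interfaces, and the fault path for guests without a driver is only my
reading of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-guest state; not an existing Xen structure. */
struct guest {
    bool pv_mca_registered;  /* guest asked to be told "you're broken" */
    unsigned int mca_port;   /* event port/vector the guest chose */
};

enum mca_delivery {
    MCA_DELIVER_EVENT,  /* notify the registered PV MCA driver */
    MCA_DELIVER_FAULT   /* no driver: guest sees a generic fault */
};

/* Stand-in for the registration hypercall made by a PV MCA driver. */
static void guest_register_mca(struct guest *g, unsigned int port)
{
    assert(g != NULL);
    g->pv_mca_registered = true;
    g->mca_port = port;
}

/* Decide how an error isolated to this guest would be reported to it. */
static enum mca_delivery mca_pick_delivery(const struct guest *g)
{
    assert(g != NULL);
    return g->pv_mca_registered ? MCA_DELIVER_EVENT : MCA_DELIVER_FAULT;
}
```

A guest that never registers takes the MCA_DELIVER_FAULT path; once its
driver calls guest_register_mca(), the same error would arrive as an
event on the port it chose.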

> --
> Mats
>
> > Gavin
> >

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy
