[Xen-devel] RFC: MCA/MCE concept
Hello!

Currently, MCA/MCE support in Xen simply dumps the error and panics.

In the concept I propose here, there are two places where Xen has to react:

 I) Xen receives an MCE from the CPU
II) Xen receives instructions from Dom0 via a hypercall

The term "self-healing" below means using the most appropriate technique(s) to handle an error, such as MPR (http://www.opensparc.net/pubs/papers/MPR_DSN06.pdf), online-spare RAM, or killing/restarting the impacted processes, in order to prevent crashes of whole guests or of the whole machine.

case I) - Xen receives an MCE from the CPU

1) The Xen MCE handler figures out whether the error is a correctable error (CE) or an uncorrectable error (UE).

2a) error == CE: Xen notifies Dom0, for statistical purposes, if Dom0 has installed an MCA event handler.

2b) error == UE and the UE impacts Xen or Dom0: Xen does some self-healing. On success, Xen notifies Dom0 if Dom0 has installed an MCA event handler; on failure, Xen panics.

2c) error == UE and the UE impacts a DomU: If Dom0 has installed an MCA event handler, Xen notifies Dom0, and Dom0 tells Xen whether to also notify the DomU and/or performs some operations on the DomU (case II). If Dom0 has not installed an MCA event handler, Xen notifies the DomU directly (step 3).

3a) DomU is a PV guest:
    - if the DomU has installed an MCA event handler, it gets notified so that it can perform self-healing
    - if the DomU has not installed an MCA event handler, Xen notifies Dom0 so that Dom0 can perform some operations on the DomU (case II)
    - if neither the DomU nor Dom0 has installed an MCA event handler, Xen kills the DomU

3b) DomU is an HVM guest:
    - if the DomU features a PV driver, behave as in 3a)
    - if the DomU has enabled MCA/MCE via MSR, inject an MCE into the guest
    - if the DomU has not enabled MCA/MCE via MSR, notify Dom0 so that Dom0 can perform some operations on the DomU (case II)
    - if the DomU has not enabled MCA/MCE and Dom0 has not installed an MCA event handler, Xen kills the DomU

case II) - Xen receives instructions from Dom0 via a hypercall

There are different reasons why Xen should act on instructions from Dom0:
- Dom0 has seen so many CEs that UEs are very likely to happen, and wants Xen to act in order to "circumvent" them.
- Possible operations on a DomU:
  - save/restore the DomU
  - (live-)migrate the DomU to a different physical machine
  - etc.

Some details

MCE
When an MCE occurs, none of the processing described above should happen within the MCE handler itself, because if another MCE occurs while the CPU is still inside the MCE handler, the CPU enters the shutdown state. So the mail thread "NMI deferal on i386" may be related here.

Notifying guests
Above I talk about an "MCA event handler". What I actually mean is a way to inform the guest that something happened. I chose the term "MCA event handler" because I think the event mechanism fits this purpose best.

HVM guests without an "MCA PV driver" can enable/disable certain types of errors. They can even control whether they want to get an exception or do polling. I would prefer to always inject exceptions into the HVM guest. An HVM guest cannot prevent seeing these exceptions if they are always injected, but I do not know whether guests behave correctly when they expect to receive all or certain errors via polling.

Guests which already feature fault management to a certain level when running non-virtualized can easily reuse this capability to decode the error telemetry and handle the error in the virtualized case. Thus, forwarding/injecting the error into a guest only requires translating the physical/virtual address reported by the hardware into a guest-physical/guest-virtual address. The error code itself needs no translation/abstraction.

self-healing
IMO, only Xen should use hardware features such as online-spare RAM, which was introduced with AMD K8 RevF. These hardware features should never be visible to any DomU, in order to reduce complexity in Xen. Software-only techniques such as MPR are fine in all guests. Only Dom0 can tell Xen to do something using the hardware features.

Christoph

-- 
AMD Saxony, Dresden Germany
Operating System Research Center

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel