
Re: [Xen-devel] Woes of NMIs and MCEs, and possibly how to fix



>>> On 30.11.12 at 18:34, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> 1) Faults on the NMI path will re-enable NMIs before the handler
> returns, leading to reentrant behaviour.  We should audit the NMI path
> to try and remove any needless cases which might fault, but getting a
> fault-free path will be hard (and is not going to solve the reentrant
> behaviour itself).
> 
> 2) Faults on the MCE path will re-enable NMIs, as will the iret of the
> MCE itself if an MCE interrupts an NMI.

As the thread apparently converged on later - we just need to
exclude the potential for faults inside the NMI and MCE paths. The
only reason I could see this needing to change would be if we
intended to add an extensive NMI producer like native Linux's perf
subsystem.

> 3) SMM mode executing an iret will re-enable NMIs.  There is nothing we
> can do to prevent this, and as an SMI can interrupt NMIs and MCEs, no
> way to predict if/when it may happen.  The best we can do is accept that
> it might happen, and try to deal with the after effects.

I don't see us needing to deal with that in any way. SMM using IRET
carelessly is just plain wrong. IIRC SMM (just like VMEXIT) has a save/
restore field for the NMI mask, so if the SMM code makes proper use
of it, there should be no problem.

> 4) "Fake NMIs" can be caused by hardware with access to the INTR pin
> (very unlikely in modern systems with the LAPIC supporting virtual wire
> mode), or by software executing an `int $0x2`.  This can cause the NMI
> handler to run on the NMI stack, but without the normal hardware NMI
> cessation logic being triggered.
> 
> 5) "Fake MCEs" can be caused by software executing `int $0x18`, and by
> any MSI/IOMMU/IOAPIC programmed to deliver vector 0x18.  Normally, this
> could only be caused by a bug in Xen, although it is also possible on a
> system without interrupt remapping. (Where the host administrator has
> accepted the documented security issue, and decided still to pass through
> a device to a trusted VM, and the VM in question has a buggy driver for
> the passed-through hardware)

Fake exceptions, as others have already said, are a Xen or
hardware bug and hence shouldn't need extra precautions either.

> 9) The NMI handler when returning to ring3 will leave NMIs latched, as
> it uses the sysret path.

This is a little imprecise: the problem arises only when entering the
scheduler on the way out of an NMI, and resuming an unaware
PV vCPU on the given pCPU. Apart from forcing an early IRET in
that case (we can't be on the special NMI stack then, as the
NMI entry path switches to the normal stack when entered
from PV guest context, entry from VMX context happens on the
normal stack anyway, and entry from hypervisor context [which
includes the SVM case] doesn't end up handling softirqs on the
exit path), another option would be to clear the TRAP_syscall
flag when resuming a PV vCPU in the scheduler.

But the early IRET solution has other benefits (keeping the NMI
disabled window short), so would be preferable imo.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel