Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/HVM: correctly deal with benign exceptions when combining two

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Thu, 11 Apr 2019 01:31:21 -0600

Cc: Juergen Gross <jgross@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Thu, 11 Apr 2019 07:33:29 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 10.04.19 at 01:58, <andrew.cooper3@xxxxxxxxxx> wrote: > Across the instruction writeback boundary (priority 10 around to 1), > PENDING_DBG persists, and feeds into exception considerations at > priority 2 and 4. This is why they manifest as traps, but they can be > equally thought of as faults before the fetch of the next instruction. Ah yes, this is perhaps a helpful way of looking at it, in particular wrt behavior of S/G insns running into other than #DB after having successfully completed at least one element. >> That's also the way the XSA-156 advisory describes it. > > XSA-156 was written at a time when we both had far less authority to > comment on the specific details. Certainly as far as I am concerned, > the past couple of years have made a massive difference, not least the > several months spent debugging the STI/singlestep issue with Gil. > > The mistake in XSA-156 is the description of "upon completion of the > delivery of the first exception". > > The infinite loop occurs because delivery of the first #DB is never > considered complete (as #DB is still considered pending once the > exception frame was written, because it was triggered in the process of > delivering the exception), and therefore does not move from priority 4 > to 5, which would allow an NMI to break the cycle. > > Similarly for the #AC case, priority never moves from 10 back to 1, > because delivery of the first #AC is never seen to have completed. To be honest, to me this continues to be a (mis-)implementation detail. A proper architectural specification would not allow for such pathological cases in the first place. >>> For VT-x, we're supposed to fill in the PENDING_DBG control rather than >>> raise #DB directly, explicitly to allow the priority of interrupts to be >>> expressed correctly. There is a very large quantity of untangling >>> working to do, and very clear I'm not going to have time to fix even the >>> STI/Singlestep issue in the 4.12 timeframe. >> Are you saying there need to be any vendor specific adjustments >> to this function then? I would very much hope that the code here >> could remain vendor independent, with vendor specific adjustments >> done in vendor specific code, and independently of the proposed >> change here. > > I think the better course of action is for {vmx,svm}_inject_event() to > gain adjustments to their #DB handling. > > The PENDING_DBG control is specifically an accumulation of various > debugging conditions, which will cause #DB traps to be delivered at the > appropriate point early in the next instruction cycle. Agreed - this would also pave the road to actually support raising data #DB from emulated insns. >>>> Sadly neither AMD nor Intel really define what happens with two benign >>>> exceptions - the term "sequentially" used by both is poisoned by how the >>>> combining of benign and non-benign exceptions is described. Since NMI, >>>> #MC, and hardware interrupts are all benign and (perhaps with the >>>> exception of #MC) can't occur second, favor the first in order to not >>>> lose it. >>> #MC has the highest priority so should only be recognised immediately >>> after an instruction boundary. >> Are you sure? What about an issue with one of the memory >> accesses involved in delivering a previously raised exception? > > #MC is abort, and is imprecise. Mind me correcting this to "may be imprecise": There's a flag after all telling whether in fact it is. > It bears no relationship to the content > of the iret frame it is observed with. This is why details of the issue > are stored in MSRs and not on the stack - the CPU can be hundreds of > instructions further on before an L3 cache/IIO #MC propagates back > through the uncore. > > Despite being classified as an exception, #MC behaves much more like an > NMI from the OS point of view, except that getting two of them > back-to-back is totally fatal. > >> >>> I don't however see a way of stacking #AC, because you can't know that >>> one has occured until later in the instruction cycle than all other >>> sources. What would happen is that you'd raise #AC from previous >>> instruction, and then recognise #MC while starting to execute the #AC >>> entry point. (I think) >> Well - see XSA-156 for what hardware does in that situation. > > The details of XSA-156 are inaccurate. > > #AC can stack, but the problem only manifests when Xen emulates an > injection, which is restricted to SVM for the moment. > > That said, I'm considering moving it back to being common to provide > architectural behaviour despite the silicon issue which causes XSA-170 Would you mind helping me make the connection between #AC delivery (and its emulation) and XSA-170, being about VM entry with non-canonical %rip? >> Irrespective of all of the above - what am I to take from your >> response? I.e. what adjustments (within the scope of this patch) >> do you see necessary for the change to become acceptable? > > By and large, I think the actual code changes are ok. They are a > general improvement, but I am not comfortable with the factual > inaccuracies in the commit message and the comments. > > I will see if I can rephrase it suitably. I appreciate this, first and foremost because despite all of your explanations I'm afraid I still can't see the inaccuracies there. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.